Robert Klemme wrote:
> Just one note: you can use regexp grouping to get the delimiters as well
> as the content. That way you can join with "" and get the original back:
I had the same thought, based on the similar Perl feature.
"abababab".split(/b/)
=> ["a", "a", "a", "a"]
"abababab".split(/(b)/)
=> ["a", "b", "a", "b", "a", "b", "a", "b"]
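A quick way to check the round trip Robert describes, as a minimal sketch (the variable names are mine, nothing here is special API):

```ruby
# Capturing the delimiter keeps it in the result,
# so joining the pieces with "" rebuilds the input string.
original = "abababab"
pieces   = original.split(/(b)/)   # delimiters are kept because of the group
restored = pieces.join("")

puts restored == original
```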
There might be certain border cases though.
a = "aba".split(/(a)/)
=> ["", "a", "b", "a"]
"abcdabcd".split(/(a|b)/)
=> ["", "a", "", "b", "cd", "a", "", "b", "cd"]
The leading "" looks like a bug at first, but it seems the intention is to always put delimited text at even (zero-based) array indices, leaving delimiters at odd indices. This is good to know, but not well documented for Ruby.
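To make the even/odd layout visible, here is a small sketch that separates the two kinds of element by index. It assumes a single capture group wrapped around the whole delimiter, as in the example above. The names `text` and `delims` are just for illustration.

```ruby
parts = "abcdabcd".split(/(a|b)/)

# Pair each element with its index, then divide by index parity.
even, odd = parts.each_with_index.partition { |_, i| i.even? }
text   = even.map(&:first)   # the delimited text, including empty strings
delims = odd.map(&:first)    # the captured delimiters

p text
p delims
```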
The Perl doc goes into this in some depth. If Perl was the source of inspiration, then here is a crack in the resemblance.
"abcdabcd".split(/(a)|(b)/)
=> ["", "a", "", "b", "cd", "a", "", "b", "cd"]
The (admittedly obscure) example above is handled differently in Perl.
@a = split(/(a)|(b)/, "abcdabcd");
print "\"$_\" " for @a;
produces:
"" "a" "" "" "" "b" "cd" "a" "" "" "" "b" "cd"
Sadly, undef and "" look the same when printed in Perl, so they are hard to tell apart here, but this output can be better understood as:
""
"a" undef
""
undef "b"
"cd"
"a" undef
""
undef "b"
"cd"
I won't even try to explain, much less justify, this behavior in Perl.
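For comparison, the Perl-style result can be imitated in Ruby with a rough sketch like the one below. `perl_style_split` is a hypothetical helper of my own, not anything in the standard library. It pushes every capture group for each match, so a group that did not take part shows up as nil (Ruby's counterpart to undef). It ignores split's limit argument, does not drop trailing empty fields, and makes no attempt to handle zero-length matches.

```ruby
# Hypothetical helper: split that keeps ALL capture groups per match,
# including non-participating ones (which appear as nil).
def perl_style_split(str, re)
  out = []
  pos = 0
  str.scan(re) do
    m = Regexp.last_match
    out << str[pos...m.begin(0)]   # text before this delimiter
    out.concat(m.captures)         # every group, nil if it did not match
    pos = m.end(0)
  end
  out << str[pos..-1]              # text after the last delimiter
  out
end

perl_like = perl_style_split("abcdabcd", /(a)|(b)/)
builtin   = "abcdabcd".split(/(a)|(b)/)

p perl_like
p builtin
```

Printing both with `p` shows the nil entries explicitly, which is the distinction the Perl one-liner above cannot show.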
--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>