Hello –
[This comes from some peripheral playing around having to do with the
various String#each threads, but (I promise!) it’s not directly on
that topic.]
This is something I’ve been discussing and investigating on #ruby-lang
with Martin Chase, Holden Glova, and Michael Granger.
According to the docs I’ve seen, String#split can take either a string
or a regex as the separator/delimiter argument. However – very
surprisingly to me – it turns out that if you provide a string:
str.split(aString) …
and if aString is longer than one character, then aString is
automatically converted to a regex. Examples:
One-char strings, treated as strings:
irb(main):001:0> "abc.+def".split("e")
["abc.+d", "f"]
irb(main):002:0> "abc.+def".split(".")
["abc", "+def"]
Strings of >1 char, converted to regexes (!):
irb(main):003:0> "abc.+def".split(".e")
["abc.+", "f"]
irb(main):004:0> "abc.+def".split(".+")
[]
This means also that strings without any regex special characters are
really “splitting on a string” only by coincidence. They’re really
splitting on a regex which happens to provide the results one would
have expected from splitting on a string. Thus, for example:
irb(main):003:0> "here there and everywhere".split("er")
["h", "e th", "e and ev", "ywh", "e"]
is really treating the string arg as a regex, as shown by:
irb(main):005:0> "here there and everywhere".split(".r")
["h", "e th", "e and ev", "ywh", "e"]
producing the same results.
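For what it's worth, here is a minimal sketch of one way to make the intent explicit, so the result does not depend on how #split interprets a multi-character string. The helper name split_literal is my own invention for illustration, not part of Ruby; it escapes the separator with Regexp.escape and passes a real regex, so the separator is always matched literally. Passing a regex literal directly does the opposite and makes the pattern behaviour explicit.

```ruby
# Hypothetical helper (not part of Ruby's String API): always split on the
# literal text of sep, by escaping any regex special characters first.
def split_literal(str, sep)
  str.split(Regexp.new(Regexp.escape(sep)))
end

# Literal split: ".+" is treated as the two characters "." and "+".
literal = split_literal("abc.+def", ".+")

# Explicit regex split: /.+/ matches the whole string, leaving nothing.
pattern = "abc.+def".split(/.+/)

p literal   # ["abc", "def"]
p pattern   # []
```

Whether this is nicer than relying on the string form is a matter of taste, but it at least says which behaviour is wanted.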
Any insights on why #split does this? I found it quite surprising
when I discovered it, and I don’t know of anywhere where it’s
documented as working this way.
David
–
David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav