String#split converts string args to regexes --?

i’m sorry, matz, but even understanding what you told me about how it
works, it just doesn’t add up. take a look. these all do the exact same
thing:

given: s = 'xyz' * 10000000 << 'b' << 'xyz' * 10000000

i = s.index(/b/)
a = [s[0...i], s[i+1..-1]]
~1.25s cumulative

i = s.index('b')
a = [s[0...i], s[i+1..-1]]
~2.30s cumulative

a = s.split(/b/)
~1.58s cumulative

a = s.split('b')
~1.13s cumulative
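
for anyone who wants to rerun the comparison, here is a minimal sketch of a harness for the four variants above. the names `build_input`, `VARIANTS` and `run_variants` are made up for this example, and the repeat count is scaled down from the 10000000 used above so it finishes quickly; timings on a current ruby will likely differ from the figures quoted here.

```ruby
require 'benchmark'

# Build 'xyz' * n, a single 'b', then 'xyz' * n again.
# n is the repeat count on each side; the post above used 10_000_000.
def build_input(n)
  ('xyz' * n) << 'b' << ('xyz' * n)
end

# The four approaches from the post, keyed by a descriptive label.
VARIANTS = {
  'index(/b/) + slice' => ->(s) { i = s.index(/b/); [s[0...i], s[i + 1..-1]] },
  "index('b') + slice" => ->(s) { i = s.index('b'); [s[0...i], s[i + 1..-1]] },
  'split(/b/)'         => ->(s) { s.split(/b/) },
  "split('b')"         => ->(s) { s.split('b') }
}.freeze

# Time each variant once on the same input.
# Returns an array of [label, seconds, result] triples.
def run_variants(n)
  s = build_input(n)
  VARIANTS.map do |label, fn|
    result = nil
    seconds = Benchmark.realtime { result = fn.call(s) }
    [label, seconds, result]
  end
end

run_variants(100_000).each do |label, seconds, _result|
  puts format('%-20s %.4fs', label, seconds)
end
```

a single run per variant is noisy, so repeating each measurement a few times and comparing the minimum is probably the safer way to read the numbers.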

one assumes that split “partakes” of index, so to speak, in that it must
find those matches just as index does before it can split on them. so,
split should naturally be a tad slower than index in a direct race. but
here we have taken the extra step to split the string on the index
ourselves, which one would think would make up for this difference and
then some, being implemented in ruby rather than relying on the
underlying interpreter. surprisingly, our index regexp version is faster
than the split regexp version. a bit odd. but even more surprising, our
string index version isn’t even in the ballpark of the string split
version. thus, something does not jibe. and it’s this deviation i am
wondering about, and how it might point to a way to improve ruby’s
performance.

~transami

On Wed, 2002-07-10 at 12:01, Yukihiro Matsumoto wrote:

Hi,

In message “Re: String#split converts string args to regexes – ?” on 02/07/11, Tom Sawyer transami@transami.net writes:

but i am surprised with my #index results: it is about 4x as fast with a
regexp versus a string.

how is this possible? split is around 4x faster with a string, but
index is 4x as slow?

Remember that String#split is only searching for a one-character string
here, so “split” only needs to search for and match the first byte in
this case.

Regex match uses Boyer-Moore search for exact string matches (where
possible), whereas string match uses a simple linear search.

  					matz.
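
To make the difference between the two search strategies concrete, here is a small Ruby sketch. It is not the interpreter's C code: `naive_index` and `horspool_index` are hypothetical names, and the second one uses the Horspool simplification of Boyer-Moore (bad-character shifts only) rather than the full algorithm. Both work on bytes and return a byte offset, or nil when the pattern is absent.

```ruby
# Simple linear search: try every starting offset in turn and compare
# the pattern byte by byte from left to right.
def naive_index(text, pattern)
  n = text.bytesize
  m = pattern.bytesize
  return 0 if m.zero?
  (0..n - m).each do |i|
    j = 0
    j += 1 while j < m && text.getbyte(i + j) == pattern.getbyte(j)
    return i if j == m
  end
  nil
end

# Boyer-Moore-Horspool: compare the window from right to left, then use
# the byte under the window's last position to decide how far to jump.
def horspool_index(text, pattern)
  n = text.bytesize
  m = pattern.bytesize
  return 0 if m.zero?

  # Default shift is the full pattern length; bytes that occur in the
  # pattern (excluding its final position) get a shorter shift.
  shift = Array.new(256, m)
  (0...m - 1).each { |j| shift[pattern.getbyte(j)] = m - 1 - j }

  i = 0
  while i <= n - m
    j = m - 1
    j -= 1 while j >= 0 && text.getbyte(i + j) == pattern.getbyte(j)
    return i if j < 0
    i += shift[text.getbyte(i + m - 1)]
  end
  nil
end

text = ('xyz' * 50) + 'needle' + ('xyz' * 50)
puts naive_index(text, 'needle')
puts horspool_index(text, 'needle')
```

In this sketch a one-byte pattern makes every shift equal to 1, so the skip table offers no advantage over the linear scan; the jumps only pay off as the pattern gets longer.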


~transami

“They that can give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety.”
– Benjamin Franklin