On a whim, I just decided to try an experiment with regexps, to see how
they perform in two slightly different cases. I wanted to see how using
a single regexp object for many many evaluations performed compared to
using the regexp within the loop.
The scripts I wrote searched through a words file that is 234937 lines
long.
Here are the scripts I wrote, to clarify:
First one:
total = 0
File.open( 'words', 'r' ) { |file|
  file.each_line { |line|
    word = line.chomp
    total += 1 if word =~ /[a-df-h][aeiou]{2}/
  }
}
puts total
Second one:
rexp = /[a-df-h][aeiou]{2}/
total = 0
File.open( 'words', 'r' ) { |file|
  file.each_line { |line|
    word = line.chomp
    total += 1 if word =~ rexp
  }
}
puts total
I expected the second one to be slightly faster, but was surprised to
see that it was actually slightly slower. I ran each one about 10-15
times, and eyeballed an average. The results from each run after the
first were pretty consistent.
It's just a curiosity, but does anyone know what might cause them to be
'backwards' like that?
"If you've got a 5000-line JSP page that has "all in one" support
for three input forms and four follow-up screens, all controlled
by "if" statements in scriptlets, well ... please don't show it
to me :-). Its almost dinner time, and I don't want to lose my
appetite :-)."
- Craig R. McClanahan
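For anyone who wants to repeat the comparison without eyeballing wall-clock times, here is a minimal sketch using Ruby's standard Benchmark module. It is not the original test: the WORDS and LINES constants and the two count_* method names are made up for illustration, and the in-memory list stands in for the 'words' file so that file I/O does not blur the measurement. Which variant comes out ahead may well depend on the Ruby version.

```ruby
require 'benchmark'

# Hypothetical stand-in for the 'words' file: a short list repeated many
# times, so the sketch runs without any external data.
WORDS = %w[beautiful queue sky rhythm gooey dairy faint zebra].freeze
LINES = WORDS * 25_000

# Variant 1: the regexp literal sits inside the loop.
def count_inline(lines)
  total = 0
  lines.each { |word| total += 1 if word =~ /[a-df-h][aeiou]{2}/ }
  total
end

# Variant 2: the regexp is stored in a local variable first.
def count_variable(lines)
  rexp = /[a-df-h][aeiou]{2}/
  total = 0
  lines.each { |word| total += 1 if word =~ rexp }
  total
end

# Prints user, system and real time for each variant.
Benchmark.bm(10) do |bm|
  bm.report('inline:')   { count_inline(LINES) }
  bm.report('variable:') { count_variable(LINES) }
end
```

Running the report several times, or swapping the order of the two report lines, helps rule out warm-up effects.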
I'll wager a guess. In the first version Ruby knows that
'/[a-df-h][aeiou]{2}/' is a regexp. In the second one Ruby doesn't
know if 'rexp' is a variable or method, so it has to do one, maybe two,
lookups on every iteration before it dispatches String#=~.
Also, regexps are immutable, so Ruby doesn't allocate a new regexp on
every iteration; storing the regexp in a variable gains nothing in that
regard.
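The point about allocation can be checked directly. The sketch below is an illustration under the assumption that it runs on the standard C implementation of Ruby; other implementations may behave differently. It collects the objects produced by evaluating a regexp literal several times and counts how many distinct objects there are. The `vowels` variable is hypothetical and exists only to force interpolation.

```ruby
# A literal without interpolation: evaluate it three times and keep the objects.
literal = 3.times.map { /[a-df-h][aeiou]{2}/ }

# A literal with interpolation is rebuilt each time it is evaluated.
vowels  = 'aeiou' # hypothetical variable, only here to force interpolation
dynamic = 3.times.map { /[a-df-h][#{vowels}]{2}/ }

# The /o flag asks Ruby to interpolate once and reuse the resulting object.
once    = 3.times.map { /[a-df-h][#{vowels}]{2}/o }

# Count distinct objects by identity (not by pattern equality).
distinct = ->(objs) { objs.map(&:object_id).uniq.size }

puts "plain literal:        #{distinct.call(literal)} distinct object(s)"
puts "interpolated literal: #{distinct.call(dynamic)} distinct object(s)"
puts "interpolated with /o: #{distinct.call(once)} distinct object(s)"
```

So hoisting a regexp into a variable only saves allocations when the literal contains interpolation, which is not the case in the scripts under discussion.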
First one:
total = 0
File.open( 'words', 'r' ) { |file|
  file.each_line { |line|
    word = line.chomp
    total += 1 if word =~ /[a-df-h][aeiou]{2}/
                          ^^^^ inline regexp (part of the AST)
  }
}
puts total
Second one:
rexp = /[a-df-h][aeiou]{2}/
total = 0
File.open( 'words', 'r' ) { |file|
  file.each_line { |line|
    word = line.chomp
    total += 1 if word =~ rexp
                          ^^^^ variable lookup
  }
}
puts total
Inline regexps are faster than a variable lookup followed by a method call on the Regexp object.
Basically, the inline regexp avoids the lvar lookup and the method call and goes straight into a match3 node. The lvar lookup is probably not _that_ expensive, but method dispatch is not terribly cheap.
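The match3 node belongs to the AST-walking interpreter of the Ruby versions current when this thread was written. Later versions of the C implementation compile to bytecode instead, so the explanation above may not carry over unchanged. As a hedged sketch for looking at what a given interpreter does with each form, the snippet below prints the compiled instructions for both variants. It assumes RubyVM::InstructionSequence is available, which is specific to the C implementation; the two source strings are made up for illustration, and no particular output is claimed.

```ruby
# Two one-line programs, one per variant (illustrative strings only).
inline_src   = 'word = "beautiful"; word =~ /[a-df-h][aeiou]{2}/'
variable_src = 'rexp = /[a-df-h][aeiou]{2}/; word = "beautiful"; word =~ rexp'

if defined?(RubyVM::InstructionSequence)
  [inline_src, variable_src].each do |src|
    puts "== #{src}"
    # disasm returns a human-readable listing of the compiled instructions.
    puts RubyVM::InstructionSequence.compile(src).disasm
  end
else
  puts 'RubyVM::InstructionSequence is not available on this Ruby'
end
```

Comparing the two listings shows where, if anywhere, the interpreter treats the literal and the variable differently.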
On Feb 15, 2005, at 5:16 PM, Derek Lewis wrote:
I expected the second one to be slightly faster, but was surprised to
see that it was actually slightly slower. I ran each one about 10-15
times, and eyeballed an average. The results from each run after the
first were pretty consistent.
It's just a curiosity, but does anyone know what might cause them to be
'backwards' like that?
"Derek Lewis" <lewisd@f00f.net> wrote in message
news:20050216012200.GP23232@f00f.net...
I expected the second one to be slightly faster, but was surprised to
see that it was actually slightly slower. I ran each one about 10-15
times, and eyeballed an average. The results from each run after the
first were pretty consistent.
It's just a curiosity, but does anyone know what might cause them to be
'backwards' like that?
Did you try the same with the matching reversed, i.e., "rexp =~ word"
instead of "word =~ rexp"? Did it make a difference?
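A small sketch for trying the reversed form: `word =~ rexp` goes through String#=~, while `rexp =~ word` goes through Regexp#=~, and both return the same match index. The word list below is a hypothetical stand-in for the 'words' file, and the timings are whatever your interpreter produces; no result is implied here.

```ruby
require 'benchmark'

rexp  = /[a-df-h][aeiou]{2}/
words = %w[beautiful queue gooey zebra] * 50_000 # hypothetical stand-in data

string_first = 0
regexp_first = 0

# String on the left: dispatches to String#=~.
t1 = Benchmark.realtime { words.each { |w| string_first += 1 if w =~ rexp } }

# Regexp on the left: dispatches to Regexp#=~.
t2 = Benchmark.realtime { words.each { |w| regexp_first += 1 if rexp =~ w } }

puts format('word =~ rexp: %d matches in %.3fs', string_first, t1)
puts format('rexp =~ word: %d matches in %.3fs', regexp_first, t2)
```

Both loops must report the same number of matches; only the elapsed times are of interest.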
Like the original poster, I found the behavior counterintuitive. Perhaps
this is because our assumptions come from the C model of the universe,
where using more local variables is typically faster, and method
dispatch is not a cost to worry about.
I wonder what the merits of collecting equivalences like these to form
some kind of post-hoc parse-tree optimization would be. Probably not
great, but it might be fun.