Question on speed

Based on the following snippet:

File.foreach(name) { |line|   # foreach closes the file when iteration ends
   case line
   when /not found/, /^[gBbTi]/
      next
   when /^S/
      # start or stop condition
   when /^I/
      # iteration indicator
   else
      # actual data
   end
}

In the first when, (/not found/, /^[gBbTi]/) is there any benefit in
setting up the statement in either of the following ways?

when /^[gBbTi]/, /not found/
   Put the short circuit one first (expected to happen more often than
'not found')

OR

when /^g/, /^B/, /^b/, /^T/, /^i/, /not found/
   Separate each operation into its own portion of the test (ordered
by expected frequency)

I guess this is really a question of 'how does Ruby handle multiple
arguments to a when clause?'
Are they tested in order, or are they all compiled into a single
expression before anything is evaluated?

Thanks for any insight.

Ruby doesn't compile the separate expressions: it sends ===(subject)
to each argument in turn. They don't have to be regular expressions,
after all. Putting the more frequent match first is indeed quicker, as
expected - but that's an optimisation that can only be made with
foreknowledge of the data set.
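
To make the evaluation order visible, here is a minimal sketch. LoggingMatcher and patterns_tried are hypothetical names invented for this illustration (they are not part of Ruby or of the original snippet); the wrapper simply records every === call it receives before delegating to the real pattern.

```ruby
# Hypothetical helper: wraps a pattern and records each === call it receives.
class LoggingMatcher
  def initialize(pattern, log)
    @pattern = pattern
    @log = log
  end

  # case/when calls this with the subject of the case expression.
  def ===(subject)
    @log << @pattern
    @pattern === subject
  end
end

# Returns the patterns that were actually tested for the given line.
def patterns_tried(line)
  log = []
  case line
  when LoggingMatcher.new(/not found/, log), LoggingMatcher.new(/^[gBbTi]/, log)
    # line would be skipped here
  end
  log
end

p patterns_tried("not found") # only the first pattern is tested
p patterns_tried("goat")      # both patterns are tested, in order
```

Anything that responds to === can stand in a when clause this way, which is why Ruby cannot fold the arguments into one expression up front.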

Separating each operation into its own argument is significantly
slower due to the Ruby method call overhead. Knowing this, however, we
can derive a further optimisation by combining the two regular
expressions together: /^[gBbTi]|not found/
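
As a sanity check that the combined pattern is a drop-in replacement, the sketch below compares it with the two-pattern form on a handful of sample lines. The sample list and variable names are assumptions made up for the example; Regexp.union is shown only as another way to build an equivalent alternation.

```ruby
separate = [/^[gBbTi]/, /not found/]
combined = /^[gBbTi]|not found/
union    = Regexp.union(separate) # builds an alternation of the given patterns

samples = ["goat", "Badger", "not found", "file not found", "Start", "Data goes here"]

# For each line, collect the verdict of all three approaches.
results = samples.map do |line|
  [separate.any? { |re| re === line }, combined === line, union === line]
end

samples.zip(results).each do |line, verdicts|
  status = verdicts.uniq.size == 1 ? "agree" : "DISAGREE"
  puts "#{line.inspect}: #{verdicts.first ? 'skip' : 'keep'} (#{status})"
end
```

The combined form costs one === call per line regardless of which alternative matches, which is where the saving comes from.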

Paul.

PS: I did a bit of quick unscientific profiling to check. Here's the code:

data = [
  "goat",
  "Badger",
  "bear",
  "Tiger",
  "ibis",
  "not found",
  "Start",
  "Data goes here"
] * 10000

t0 = Time.now
data.each do |line|
  case line
  when /not found/, /^[gBbTi]/
    next
  end
end
p Time.now - t0 # 0.236658

t0 = Time.now
data.each do |line|
  case line
  when /^[gBbTi]/, /not found/
    next
  end
end
p Time.now - t0 # 0.176375

t0 = Time.now
data.each do |line|
  case line
  when /^[gBbTi]|not found/
    next
  end
end
p Time.now - t0 # 0.145403

t0 = Time.now
data.each do |line|
  case line
  when /^g/, /^B/, /^b/, /^T/, /^i/, /not found/
    next
  end
end
p Time.now - t0 # 0.299182

···

On 18/08/06, L7 <jesse.r.brown@gmail.com> wrote:

In the first when, (/not found/, /^[gBbTi]/) is there any benefit in
setting up the statement in either of the following ways?

Paul Battley wrote:

···

Thank you for the explanation and example (it's plenty 'un'scientific
for my needs).

Paul Battley wrote:

p Time.now - t0 # 0.236658

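Shh. Don't tell anyone, but there's the Benchmark module in the standard library (require 'benchmark'), with its bm method. The Pickaxe has a good introduction on how to use it, and it prints pwetty result tables. It also comes with a convenient method, bmbm, that does a rehearsal run before the "real" one, so that GC and warm-up effects are less likely to skew the timings.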
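
A rough sketch of how the profiling above might look with Benchmark.bmbm. The data set is the one from the earlier post; the three helper method names are made up for this example, and wrapping each case in a method adds the same call overhead to every variant. Timings will differ by machine and Ruby version, so none are quoted here.

```ruby
require 'benchmark'

data = [
  "goat", "Badger", "bear", "Tiger", "ibis",
  "not found", "Start", "Data goes here"
] * 10_000

# Hypothetical helpers: each returns true when the line would be skipped.
def skip_two_patterns?(line)
  case line
  when /^[gBbTi]/, /not found/ then true
  else false
  end
end

def skip_combined?(line)
  case line
  when /^[gBbTi]|not found/ then true
  else false
  end
end

def skip_six_patterns?(line)
  case line
  when /^g/, /^B/, /^b/, /^T/, /^i/, /not found/ then true
  else false
  end
end

# bmbm runs every report twice: a rehearsal pass, then the measured pass.
Benchmark.bmbm do |x|
  x.report("two patterns:") { data.each { |line| skip_two_patterns?(line) } }
  x.report("combined:")     { data.each { |line| skip_combined?(line) } }
  x.report("six patterns:") { data.each { |line| skip_six_patterns?(line) } }
end
```

The labels passed to report are free-form strings; bmbm works out the column width from them.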

David Vallner