Performance stats of String#scan, strscan and a homemade approach

Because I have recently been messing around with a Ruby syntax colorer,
I needed to know more about the performance of #scan, and whether there
were faster alternatives. String#scan seems to be the fastest.

Maybe this comes in handy for others.


--
Simon Strandgaard

bash-2.05b$ ruby h.rb
                          user     system      total        real
String#scan           0.810000   0.020000   0.830000 (  0.937981)
strscan               1.110000   0.040000   1.150000 (  1.255724)
homemade slicer       2.420000   0.130000   2.550000 (  2.648530)
true
true
bash-2.05b$ expand -t2 h.rb
require 'strscan'
# Tokenize with StringScanner: repeatedly match re at the scan pointer.
def strscan(string, re)
  tokens = []
  ss = StringScanner.new(string)
  until ss.eos?
    m = ss.scan(re)
    break unless m
    tokens << m
  end
  tokens
end
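# Tokenize by matching a \A-anchored regexp and slicing each match off
# the front of the string (slice! mutates it, so callers pass a copy).
def slicer(string, re)
  tokens = []
  until string.empty?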
    m = re.match(string)
    break unless m
    token = string.slice!(0, m.end(0))
    tokens << token
  end
  tokens
end
re_src = '\d+|\s+|.'
n = 10000
require 'benchmark'
Benchmark.bm(20) do |b|
  # Exercise String#scan
  re1 = Regexp.new(re_src)
  lines = IO.readlines(__FILE__)
  result1 = []
  GC.disable
  b.report("String#scan") do
    n.times do |i|
      result1 << lines[i%lines.size].scan(re1)
    end
  end
  GC.enable
  # Exercise strscan
  lines = IO.readlines(__FILE__)
  result2 = []
  GC.disable
  b.report("strscan") do
    n.times do |i|
      result2 << strscan(lines[i%lines.size], re1)
    end
  end
  GC.enable
  # Exercise homemade slicer
  re2 = Regexp.new('\A(?:'+re_src+')')
  lines = IO.readlines(__FILE__)
  result3 = []
  GC.disable
  b.report("homemade slicer") do
    n.times do |i|
      result3 << slicer(lines[i%lines.size].clone, re2)
    end
  end
  GC.enable
  # check that output was correct
  p((result1 == result2), (result1 == result3))
end
bash-2.05b$

Simon Strandgaard wrote:

> String#scan seems to be the fastest.

There's no advantage to using strscan in your case. Try storing the positions of the tokens rather than the scanned strings themselves (the tokens), using StringScanner#skip. That might not fit your use case, though.
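A minimal sketch of the positions-only idea, assuming the same '\d+|\s+|.' pattern as h.rb. The helper name `token_positions` is hypothetical and not from either post. StringScanner#skip advances the scan pointer and returns the match length (or nil), so no token string is allocated until you ask for one. Offsets are bytes, hence String#byteslice to recover text. Whether this beats String#scan was not measured here.

```ruby
require 'strscan'

# Hypothetical helper: record [byte_offset, byte_length] for each token
# instead of copying the matched text.
def token_positions(string, re)
  positions = []
  ss = StringScanner.new(string)
  until ss.eos?
    start = ss.pos          # byte position of the scan pointer
    len = ss.skip(re)       # match length in bytes, or nil on no match
    # Stop on no match; also stop on an empty match to avoid looping forever.
    break if len.nil? || len.zero?
    positions << [start, len]
  end
  positions
end

line = "x = 42\n"
re = Regexp.new('\d+|\s+|.')
positions = token_positions(line, re)
# Token text is sliced out only when actually needed.
tokens = positions.map { |start, len| line.byteslice(start, len) }
p positions                   # => [[0, 1], [1, 1], [2, 1], [3, 1], [4, 2], [6, 1]]
p tokens == line.scan(re)     # => true
```

A colorer could keep just the offset pairs and apply highlighting by position, touching the original string only when rendering.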

Regards,

   Michael