1.9 significantly slower than 1.8 on Mac

I know 1.9.0 is still a work in progress, but I was expecting to see some benefit from YARV already. However, this simple script showing a typical bit of logfile processing takes over 70% LONGER under Ruby 1.9 on a Mac Pro:

$ cat logreport2.rb
require 'benchmark'
include Benchmark

logfile = <<END_OF_STRING
23:59:16 drop 10.14.241.252 >eth2c1 rule: 1015; rule_uid: {6AADF426-0D0C-4C20-A027-06A6DC8C6CE2}; src: 172.25.20.79; dst: 10.14.65.137; proto: tcp; product: VPN-1 & FireWall-1; service: lotus; s_port: 57150;
23:59:15 drop 10.14.241.251 >eth2c1 rule: 1015; rule_uid: {6AADF426-0D0C-4C20-A027-06A6DC8C6CE2}; src: 172.25.40.88; dst: 10.14.65.37; proto: tcp; product: VPN-1 & FireWall-1; service: TCP1801; s_port: 24511;
23:59:15 drop 10.14.241.251 >eth1c0 rule: 1015; rule_uid: {6AADF426-0D0C-4C20-A027-06A6DC8C6CE2}; src: 10.14.163.59; dst: 10.24.0.2; proto: udp; product: VPN-1 & FireWall-1; service: domain-udp; s_port: udp1025-1027;
END_OF_STRING

bm(12) do |test|
  test.report('Simple:') do
    500000.times do
      src = Hash.new(0)                    # per-source-IP counter, defaults to 0
      logfile.each_line do |line|
        if (m = line.match(/src: (.*?);/)) # pull out the source address
          src[m[1]] += 1
        end
      end
    end
  end
end

$ ruby -v

ruby 1.8.6 (2007-09-24 patchlevel 111) [universal-darwin9.0]

$ ruby logreport2.rb

                  user     system      total        real
Simple:       7.960000   0.000000   7.960000 (  7.965567)

$ ruby19 -v

ruby 1.9.0 (2007-12-25 revision 14709) [i686-darwin9.1.0]

$ ruby19 logreport2.rb


                  user     system      total        real
Simple:      13.750000   0.010000  13.760000 ( 13.754742)

Derek Chesterfield writes:

$ ruby logreport2.rb
Simple:       7.960000   0.000000   7.960000 (  7.965567)

$ ruby19 logreport2.rb
Simple:      13.750000   0.010000  13.760000 ( 13.754742)

Ruby 1.9 uses Oniguruma, which is an encoding-agnostic engine.
E.g., /src: (.*?);/ compiles down to:

[exact5:src: ]
[mem-start:1]
[jump:(1)]
[anychar]
[push:(-6)]
[mem-end:1]
[exact1:;]
[end]

A seemingly simple match:

/src: (.*?);/ =~ "src: 172.25.20.79;"

turns into this execution trace:

    0> "src: 17..." [exact5:src: ]
    5> "172.25...." [mem-start:1]
    5> "172.25...." [jump:(1)]
    5> "172.25...." [push:(-6)]
    5> "172.25...." [mem-end:1]
    5> "172.25...." [exact1:;]
    5> "172.25...." [anychar]
    6> "72.25.2..." [push:(-6)]
    6> "72.25.2..." [mem-end:1]
    6> "72.25.2..." [exact1:;]
    6> "72.25.2..." [anychar]
    7> "2.25.20..." [push:(-6)]
    7> "2.25.20..." [mem-end:1]
    7> "2.25.20..." [exact1:;]
    7> "2.25.20..." [anychar]
    8> ".25.20...." [push:(-6)]
    8> ".25.20...." [mem-end:1]
    8> ".25.20...." [exact1:;]
    8> ".25.20...." [anychar]
    9> "25.20.7..." [push:(-6)]
    9> "25.20.7..." [mem-end:1]
    9> "25.20.7..." [exact1:;]
    9> "25.20.7..." [anychar]
   10> "5.20.79..." [push:(-6)]
   10> "5.20.79..." [mem-end:1]
   10> "5.20.79..." [exact1:;]
   10> "5.20.79..." [anychar]
   11> ".20.79;" [push:(-6)]
   11> ".20.79;" [mem-end:1]
   11> ".20.79;" [exact1:;]
   11> ".20.79;" [anychar]
   12> "20.79;" [push:(-6)]
   12> "20.79;" [mem-end:1]
   12> "20.79;" [exact1:;]
   12> "20.79;" [anychar]
   13> "0.79;" [push:(-6)]
   13> "0.79;" [mem-end:1]
   13> "0.79;" [exact1:;]
   13> "0.79;" [anychar]
   14> ".79;" [push:(-6)]
   14> ".79;" [mem-end:1]
   14> ".79;" [exact1:;]
   14> ".79;" [anychar]
   15> "79;" [push:(-6)]
   15> "79;" [mem-end:1]
   15> "79;" [exact1:;]
   15> "79;" [anychar]
   16> "9;" [push:(-6)]
   16> "9;" [mem-end:1]
   16> "9;" [exact1:;]
   16> "9;" [anychar]
   17> ";" [push:(-6)]
   17> ";" [mem-end:1]
   17> ";" [exact1:;]
   18> "" [end]
match at 0

The bottleneck here is the anychar instruction (OP_ANYCHAR in regexec.c), which calls 'enclen' and 'is_mbc_newline' through function pointers, so they cannot be inlined.

Joni goes a bit further and uses a set of specialized single-byte opcodes when it detects that a single-byte encoding is in use.
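
A pattern that says the same thing with a negated character class gives the engine a chance to skip most of that per-character work: instead of retrying ';' before every anychar step, it can consume the non-';' characters in one run. A quick sketch to measure it (the [^;]* rewrite is my suggestion, not from the thread; whether and how much it helps depends on the engine and version):

require 'benchmark'

line = 'src: 172.25.20.79; dst: 10.14.65.137;'

LAZY   = /src: (.*?);/    # lazy dot: tries ';' at every position before each anychar
NEGCLS = /src: ([^;]*);/  # negated class: consumes non-';' characters, then matches ';' once

Benchmark.bm(10) do |bm|
  bm.report('lazy:')   { 500_000.times { LAZY.match(line)   } }
  bm.report('negcls:') { 500_000.times { NEGCLS.match(line) } }
end

Both patterns capture the same text (everything up to the first ';'), so it would be a drop-in change for the log script.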

lopex

Marcin Mielżyński wrote:

Joni goes a bit further and uses a set of specialized single-byte opcodes when it detects that a single-byte encoding is in use.

FYI, for those not following it, Joni is the new regular expression engine for JRuby. It is a port of Oniguruma, the regexp engine for Ruby 1.9.

- Charlie

So Ruby 1.9.0's Regexp is slower than 1.8's?
Damn. All my Ruby scripts make extensive use of Regexp.
Will this always be the case, or might 1.9.1 be worth upgrading to?

···

On 13 Jan 2008, at 02:15, Marcin Mielżyński wrote:

Ruby 1.9 uses Oniguruma, which is an encoding-agnostic engine.

So Ruby 1.9.0's Regexp is slower than 1.8's?

In some cases, probably; it does quite a bit more, not just in encoding
support but in regexp features too. E.g., from a quick glance through the
Oniguruma documentation:

Zero-width lookbehind assertions (/(?<=foo)bar/ =~ 'foobar' # $& => "bar")

Named captures (/(?<foo>bar)/ =~ 'foobar' # $~[:foo] => "bar")

Fancier backreferences (/(?<foo>foo|bar)\k<foo>/ =~ "foobar" # => nil,
                         .. =~ 'foofoo' # $& => "foofoo")

The ability to call subexpressions
      (/(?<foo>foo|bar)\g<foo>/ =~ "foobar" # $& => "foobar", $~[:foo] => "bar")
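
Collected into a runnable snippet (Ruby 1.9 only; every one of these is a syntax error under 1.8):

# Zero-width lookbehind
/(?<=foo)bar/ =~ 'foobar'
p $&                                     # => "bar"

# Named captures
/(?<foo>bar)/ =~ 'foobar'
p $~[:foo]                               # => "bar"

# Named backreferences
p(/(?<foo>foo|bar)\k<foo>/ =~ 'foobar')  # => nil
/(?<foo>foo|bar)\k<foo>/ =~ 'foofoo'
p $&                                     # => "foofoo"

# Subexpression calls
/(?<foo>foo|bar)\g<foo>/ =~ 'foobar'
p $&                                     # => "foobar"
p $~[:foo]                               # => "bar" (the capture reflects the last call)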

You might consider installing a separate Ruby 1.8 with Oniguruma patched
in, or installing the Oniguruma gem, so you can test the performance
without bringing YARV into the equation.
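
Something along these lines under 1.8, for instance (a sketch; it assumes the oniguruma gem's Oniguruma::ORegexp API and a locally installed Oniguruma library):

require 'benchmark'
require 'rubygems'
require 'oniguruma'   # gem install oniguruma

line    = 'src: 172.25.20.79; dst: 10.14.65.137;'
builtin = /src: (.*?);/                         # Ruby 1.8's builtin engine
oni     = Oniguruma::ORegexp.new('src: (.*?);') # same pattern, Oniguruma engine

Benchmark.bm(10) do |bm|
  bm.report('builtin:')   { 500_000.times { builtin.match(line) } }
  bm.report('oniguruma:') { 500_000.times { oni.match(line)     } }
end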

Damn. All my Ruby scripts make extensive use of Regexp. Will this
always be the case, or might 1.9.1 be worth upgrading to?

Oniguruma's pretty mature by now; I wouldn't expect the general
performance profile to change that significantly, although the recent
port to Java might trigger some changes.

I wouldn't be too worried; I've used it for well over a year and not
really noticed any significant performance issues. In the worst case,
alternative engines could always be supported through extensions.

···

* Derek Chesterfield (groups@dezuk.com) wrote:

On 13 Jan 2008, at 02:15, Marcin Mielżyński wrote:

--
Thomas 'Freaky' Hurst
    http://hur.st/

Thomas Hurst writes:

In some cases, probably; it does quite a bit more, not just in encoding
support but in regexp features too.

When those features are not used, they don't affect matching performance at all. The main bottlenecks are the encoding-agnostic functions and, in some cases, execution through the main switch instead of dedicated inner loops.
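
One rough way to see what the encoding machinery costs on a given build (my sketch, Ruby 1.9 only; the assumption is that a single-byte encoding such as ASCII-8BIT exercises cheaper per-encoding functions, which may or may not show up as a measurable difference):

require 'benchmark'

line = 'src: 172.25.20.79; dst: 10.14.65.137;'
utf8 = line.dup.force_encoding('UTF-8')       # multibyte encoding
bin  = line.dup.force_encoding('ASCII-8BIT')  # single-byte encoding
re   = /src: (.*?);/

Benchmark.bm(12) do |bm|
  bm.report('UTF-8:')      { 500_000.times { re.match(utf8) } }
  bm.report('ASCII-8BIT:') { 500_000.times { re.match(bin)  } }
end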

Oniguruma's pretty mature by now; I wouldn't expect the general
performance profile to change that significantly, although the recent
port to Java might trigger some changes.

A JVM bytecode compiler for Joni is on the way (for now it uses an int array for the compiled code); it should greatly surpass Oniguruma's performance once it's finished (it already does for some patterns, even without the asm compiler).

lopex

···


In your usage, maybe. But in my experiments with 1.9, ALL my scripts [all of them using Regexp extensively to parse files] are a *minimum* of 40% slower, and the simple script in my original post takes 70% more time to run.

···

On 14 Jan 2008, at 17:03, Thomas Hurst wrote:

I wouldn't be too worried; I've used it for well over a year and not
really noticed any significant performance issues. In the worst case,
alternative engines could always be supported through extensions.