Regexp-engine: ruby vs. perl

Hello list,

I've got a question about ruby's regexp engine: I'm wondering why it is
so much slower than perl's.

My test file (status.dat from Nagios) looks like this:

<status-type> {
    key=value
    ... ~20 further key=value pairs
}

The file is about 100MB in size.

[perl -- v5.8.8]
time perl -wnl -00 -e 'print if /host_name=monslave\d+/ and /service_description=load/ and /servicestatus\s+{[^}]+}/m' /tmp/status.dat >/dev/null
perl -wnl -00 -e /tmp/status.dat > /dev/null 0.90s user 0.11s system 51% cpu 1.946 total

[ruby19 -- ruby 1.9.1p129 (2009-05-12 revision 23412) [i686-linux]]
time ruby19 -wnl -00 -e 'print if /host_name=monslave\d+/ and /service_description=load/ and /servicestatus\s+{[^}]+}/m' /tmp/status.dat >/dev/null
ruby19 -wnl -00 -e /tmp/status.dat > /dev/null 5.13s user 0.15s system 50% cpu 10.449 total

[ruby18 -- ruby 1.8.7p5000 (2009-02-19) [i686-linux]]
time ruby18 -wnl -00 -e 'print if /host_name=monslave\d+/ and /service_description=load/ and /servicestatus\s+\{[^}]+\}/m' /tmp/status.dat >/dev/null
ruby18 -wnl -00 -e /tmp/status.dat > /dev/null 3.93s user 0.05s system 48% cpu 8.153 total
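All three one-liners apply the same logic: -n loops over the input, -00 switches to paragraph mode so each blank-line-separated block becomes one record, and a record is printed only if all three patterns match. A minimal Ruby sketch of that filter as a plain script is below. It assumes the status.dat blocks are separated by blank lines (which -00 relies on); SAMPLE and matching_records are hypothetical names for illustration, and the code needs a modern Ruby (2.4 or later, for <<~ and Regexp#match?).

```ruby
# Hypothetical sample data in the status.dat layout described above.
# Assumption: blocks are separated by blank lines, as -00 requires.
SAMPLE = <<~DAT
  servicestatus {
      host_name=monslave01
      service_description=load
      current_state=0
  }

  servicestatus {
      host_name=monslave02
      service_description=disk
      current_state=0
  }

  hoststatus {
      host_name=monslave03
      current_state=0
  }
DAT

# The same three tests as the one-liners; a record is kept only if all match.
HOST    = /host_name=monslave\d+/
SERVICE = /service_description=load/
BLOCK   = /servicestatus\s+\{[^}]+\}/m

def matching_records(text)
  # each_line("") splits on runs of blank lines (paragraph mode), like -00.
  text.each_line("").select do |record|
    HOST.match?(record) && SERVICE.match?(record) && BLOCK.match?(record)
  end
end

puts matching_records(SAMPLE)
```

Run against the sample, only the monslave01 block is printed: the second block fails the service_description test and the hoststatus block has no service_description at all.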

So, both versions of ruby are slower than perl and I'm wondering why.

I'd like to integrate ruby into my daily work (it's actually a
wonderful, beautiful language), but it's hard to justify when something
as trivial as the regexp above is about 4-5 times slower than in perl.
And writing and using regexps is part of my daily work.

Thanks

--
Kind regards

Axel Schmalowsky
Platform Engineer


___________________________________

domainfactory GmbH
Oskar-Messter-Str. 33
85737 Ismaning
Germany

Mobile: +49 (0)176 / 10246727
Phone: +49 (0)89 / 55266-356
Fax: +49 (0)89 / 55266-222

E-Mail: aschmalowsky@df.eu
Internet: www.df.eu

Register court: Amtsgericht München
Commercial register no. HRB 150294, Managing directors:
Tobia Sara Marburg, Jochen Tuchbreiter

Just guessing here, but when regexes are slow it's usually because of
backtracking. Since it looks like your little script doesn't need any
backtracking, you might try wrapping your repetitions in atomic groups,
(?> ). (And yes, perl doesn't need this hack to be fast. Perl is
probably applying it for you automatically... perl's regex engine is
smarter than ruby's; what can I say?) HTH
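To try the suggestion, here is a minimal sketch that wraps each repetition in an atomic group and times both variants with Ruby's standard Benchmark library. The synthetic records built by the hypothetical sample_records helper are an assumption standing in for the real 100MB file. Since none of these patterns needs to backtrack in order to succeed, the atomic versions match exactly the same records; whether they are any faster depends on the Ruby version, so measure rather than assume.

```ruby
require "benchmark"

# Patterns from the original one-liners.
PLAIN = [
  /host_name=monslave\d+/,
  /service_description=load/,
  /servicestatus\s+\{[^}]+\}/m
].freeze

# Same patterns with each repetition wrapped in an atomic group (?>...):
# once the group has matched, the engine will not backtrack into it.
ATOMIC = [
  /host_name=monslave(?>\d+)/,
  /service_description=load/,
  /servicestatus(?>\s+)\{(?>[^}]+)\}/m
].freeze

# Hypothetical stand-in for status.dat: every tenth record is a "load" service,
# and each record carries ~20 extra key=value pairs.
def sample_records(n)
  Array.new(n) do |i|
    service = (i % 10).zero? ? "load" : "disk"
    keys = ["host_name=monslave#{i}", "service_description=#{service}"]
    keys += (1..20).map { |k| "key#{k}=value#{k}" }
    "servicestatus {\n" + keys.map { |kv| "    #{kv}\n" }.join + "}\n"
  end
end

# Count records that satisfy every pattern in the list.
def count_matches(records, patterns)
  records.count { |record| patterns.all? { |re| re.match?(record) } }
end

records = sample_records(20_000)
Benchmark.bm(7) do |bm|
  bm.report("plain")  { count_matches(records, PLAIN) }
  bm.report("atomic") { count_matches(records, ATOMIC) }
end
```

Both rows count the same 2,000 matching records; only the timings may differ.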


On 7/6/09, Axel Schmalowsky <aschmalowsky@df.eu> wrote:

I've got a question about ruby's regexp engine: I'm wondering why it is
so much slower than perl's.