Is ruby's regex slower?

Jean_G · 4 January 2010 08:54

Hi,

I'm posting this for no other purpose than to show a result for
comparison.

First I fetched the page to be analyzed (the scripts below extract all
domain names from it):

wget http://www.265.com/Kexue_Jishu/

This saves an index.html page.

Then I ran this Ruby script:

#!/usr/bin/ruby

f = File.open("index.html")

f.each_line do |c|
puts $1 if /href="http:\/\/(.*?)\/.*" target="_blank"/ =~ c
end

f.close

And this perl script:

#!/usr/bin/perl

open HD,"index.html" or die $!;
while(<HD>) {
print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
}
close HD;

Using the "time" command to measure the running time, I saw that Ruby is
slower than Perl (maybe due to the regex?).

Ruby's:

real 0m0.013s
user 0m0.012s
sys 0m0.000s

Perl's:

real 0m0.004s
user 0m0.004s
sys 0m0.000s

Both versions:

# ruby -v
ruby 1.9.1p243 (2009-07-16 revision 24175) [i686-linux]

# perl -v
This is perl, v5.8.8 built for i486-linux-thread-multi

That's the result, but it doesn't change my love for Ruby.

Thanks.
Jenn.

Ayumu_Aizawa · 4 January 2010 09:07

Hi Jenn.

That's interesting.

How about this?

#!/usr/bin/ruby

regex = /href="http:\/\/(.*?)\/.*" target="_blank"/

File.open("index.html") do |f|
  f.each_line do |c|
    puts $1 if c =~ regex
  end
end

Josh_Cheek · 4 January 2010 09:20

It seems like most of the time would be spent loading the environment and
printing the output, making it difficult to compare regexp speeds.

Anyway, just wanted to say the Ruby one can be done in a more succinct
syntax:

File.open("index.html").each do |c|
puts $1 if /href="http:\/\/(.*?)\/.*" target="_blank"/ =~ c
end
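
One way to take startup and output out of the picture is to time only the matching inside the process. This is a minimal sketch, not a rigorous benchmark: the sample lines are made up to stand in for index.html, and SAMPLE_LINES, LINK_REGEX and extract_hosts are illustrative names that do not come from the scripts above.

```ruby
require "benchmark"

# Hypothetical sample data standing in for the lines of index.html.
SAMPLE_LINES = Array.new(1_000) do |i|
  %(<a href="http://host#{i}.example.com/page.html" target="_blank">link</a>\n)
end

# Same pattern as in the original script.
LINK_REGEX = /href="http:\/\/(.*?)\/.*" target="_blank"/

# Collect the captured host names without printing anything,
# so that only the matching work is measured.
def extract_hosts(lines, regex)
  lines.map { |line| line[regex, 1] }.compact
end

# Time 100 passes over the in-memory lines (no file I/O, no output).
elapsed = Benchmark.realtime do
  100.times { extract_hosts(SAMPLE_LINES, LINK_REGEX) }
end

puts format("100 passes over %d lines: %.3f s", SAMPLE_LINES.size, elapsed)
```

The number this prints depends on the machine and the Ruby version, so it is only useful for comparing variants run the same way.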

Jean_G · 4 January 2010 09:22

Thanks for the reminder, I see what you mean.
This time I used a precompiled regex for both Ruby and Perl, and the speed is
still different:

# cat regex_compile.rb
#!/usr/bin/ruby

regex = /href="http:\/\/(.*?)\/.*" target="_blank"/

File.open("index.html") do |f|
  f.each_line do |c|
    puts $1 if c =~ regex
  end
end

# cat regex_compile.pl
#!/usr/bin/perl

open HD,"index.html" or die $!;
while(<HD>) {
print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/o;
}
close HD;

# time ruby regex_compile.rb > /dev/null

real 0m0.011s
user 0m0.008s
sys 0m0.004s

# time perl regex_compile.pl > /dev/null

real 0m0.003s
user 0m0.000s
sys 0m0.000s

Wybo_Dekker · 4 January 2010 09:44

It seems like most of the time would be spent loading the environment and
printing the output, making it difficult to compare regexp speeds.

Sure; so why not do it 1000 times:

#!/usr/bin/ruby
1000.times do
   File.open("index.html").each do |c|
     puts $1 if /href="http:\/\/(.*?)\/.*" target="_blank"/ =~ c
   end
end

time ./test.rb >/tmp/t
elap 6.511 user 6.336 syst 0.136 CPU 99.40%

#!/usr/bin/perl
for ($i=0; $i<1000; $i+=1) {
   open HD,"index.html" or die $!;
   while(<HD>) {
     print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
   }
   close HD;
}

time ./test.pl >/tmp/t
elap 0.864 user 0.844 syst 0.020 CPU 100.04%

So perl is 7 or 8 times faster here.
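
For the record, the ratio can be worked out directly from the two timings above. This is plain arithmetic on the reported numbers, nothing is re-measured; RUBY_TIMES, PERL_TIMES and slowdown are just illustrative names.

```ruby
# Timings copied from the two runs reported above (seconds).
RUBY_TIMES = { elapsed: 6.511, user: 6.336 }
PERL_TIMES = { elapsed: 0.864, user: 0.844 }

# How many times longer the Ruby run took, rounded to one decimal.
def slowdown(ruby_seconds, perl_seconds)
  (ruby_seconds / perl_seconds).round(1)
end

RUBY_TIMES.each_key do |kind|
  factor = slowdown(RUBY_TIMES[kind], PERL_TIMES[kind])
  puts "#{kind}: Ruby took #{factor}x as long as Perl"
end
# elapsed: Ruby took 7.5x as long as Perl
# user: Ruby took 7.5x as long as Perl
```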

--
Wybo

W_James · 4 January 2010 10:50

Ayumu Aizawa wrote:

regex = /href="http:\/\/(.*?)\/.*" target="_blank"/

File.open("index.html") do |f|
  f.each_line do |c|
    puts $1 if c =~ regex
  end
end

regex = /href="http:\/\/(.*?)\/.*" target="_blank"/
IO.foreach("index.html"){|line| puts $1 if line =~ regex }

With no looping:

regex = /href="http:\/\/(.*?)\/.*" target="_blank"/
puts IO.readlines("index.html").map{|s| s[ regex, 1 ] }.compact
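
The last version relies on String#[] returning nil for lines that do not match, which is why the compact is needed. A small sketch with made-up lines in place of index.html (LINES and hosts_from are illustrative names only):

```ruby
REGEX = /href="http:\/\/(.*?)\/.*" target="_blank"/

# Made-up lines standing in for the contents of index.html.
LINES = [
  %(<a href="http://www.example.com/a.html" target="_blank">A</a>\n),
  %(<p>no link on this line</p>\n),
  %(<a href="http://www.example.org/b.html" target="_blank">B</a>\n)
]

# s[regex, 1] returns capture group 1, or nil when the line does not match.
def hosts_from(lines, regex)
  lines.map { |s| s[regex, 1] }.compact
end

p LINES.map { |s| s[REGEX, 1] }  # ["www.example.com", nil, "www.example.org"]
p hosts_from(LINES, REGEX)       # ["www.example.com", "www.example.org"]
```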

Robert_K1 · 4 January 2010 13:15

"Compiling" regular expression does not bring any advantages. In fact, usually it's slower than using the regular expression inline as you did in your first example. The Ruby interpreter optimizes this already.

If the speed difference does not bother you, why bother discussing it?
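
On the point about inline literals: as far as I know, a regexp literal without interpolation evaluates to the same Regexp object every time in MRI, while a literal with interpolation is rebuilt on each evaluation. A quick sketch to check this on your own interpreter (literal_regex and interpolated_regex are hypothetical helper names, and the behaviour is an assumption worth verifying for your Ruby version):

```ruby
# A regexp literal without interpolation.
def literal_regex
  /href="http:\/\/(.*?)\//
end

# A regexp literal with interpolation (and without the /o flag).
def interpolated_regex(scheme)
  /href="#{scheme}:\/\/(.*?)\//
end

# equal? tests object identity, not just an equal pattern.
puts literal_regex.equal?(literal_regex)
puts interpolated_regex("http").equal?(interpolated_regex("http"))
```

If the first line prints true and the second false, assigning a plain literal to a variable first saves no compilation work.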

Btw, I'd probably formulate the regexp differently in order to avoid ".*?" which could be slow. Also, if you have a lot of slashes in the regexp the %r form comes in handy because you do not need all the escapes:

File.foreach "index.html" do |line|
  puts $1 if %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"} =~ line
end
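
To illustrate the difference, here is a small comparison of the original lazy pattern with a negated-character-class form (written out here as `[^"/]*` and `[^"]*`). The sample strings are invented, and LAZY, NEGATED, ONE_LINK and TWO_LINKS are illustrative names only.

```ruby
LAZY    = /href="http:\/\/(.*?)\/.*" target="_blank"/
NEGATED = %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"}

# Invented sample input: one link, and two links on the same line.
ONE_LINK  = %(<a href="http://www.example.com/a.html" target="_blank">A</a>)
TWO_LINKS = %(<a href="http://a.example.com/x" target="_blank">A</a> ) +
            %(<a href="http://b.example.com/y" target="_blank">B</a>)

# With a single link per line both patterns capture the same host.
p ONE_LINK[LAZY, 1]     # "www.example.com"
p ONE_LINK[NEGATED, 1]  # "www.example.com"

# With two links on one line, the greedy ".*" in the lazy pattern runs on
# to the last target="_blank", so the second host is swallowed.
p TWO_LINKS.scan(LAZY).flatten     # ["a.example.com"]
p TWO_LINKS.scan(NEGATED).flatten  # ["a.example.com", "b.example.com"]
```

So apart from any speed difference, the two patterns are not strictly equivalent once a line holds more than one link.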

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Rilindo_Foster · 4 January 2010 16:50

Wait, you are parsing HTML with regex?

I need to post this, then:

http://www.codinghorror.com/blog/archives/001311.html

Roger_Pack4 · 4 January 2010 20:59

time ./test.rb >/tmp/t
elap 6.511 user 6.336 syst 0.136 CPU 99.40%

time ./test.pl >/tmp/t
elap 0.864 user 0.844 syst 0.020 CPU 100.04%

So perl is 7 or 8 times faster here.

You could try ruby 1.9 and see if it helps the speed.

If slow regex is a big problem for you I could probably hack up a gem
that wraps PCRE or what not.
-r

--
Posted via http://www.ruby-forum.com/.

Marnen_Laibow-Koser · 4 January 2010 21:16

Roger Pack wrote:

time ./test.rb >/tmp/t
elap 6.511 user 6.336 syst 0.136 CPU 99.40%

time ./test.pl >/tmp/t
elap 0.864 user 0.844 syst 0.020 CPU 100.04%

So perl is 7 or 8 times faster here.

You could try ruby 1.9 and see if it helps the speed.

If slow regex is a big problem for you I could probably hack up a gem
that wraps PCRE or what not.

Better yet, there's Oniguruma.

-r

Best,

--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org

Kyle_Schmitt · 4 January 2010 22:14

Thank you. It's been far too long since I've read Coding Horror.

Although it reminds me, I should bug one of my PhD candidate friends
for some Perl code I counseled him to fix. He was parsing a 500MB+
CSV file with getlines and string compares and splits in Perl... I
think he literally banged his head on the table when I introduced him
to CPAN and showed him the CSV libraries.

Kornelius_Kalnbach · 5 January 2010 05:26

Roger Pack wrote:

So perl is 7 or 8 times faster here.

You could try ruby 1.9 and see if it helps the speed.

not very.

I get best results in Ruby with:

regexp = %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"}
1000.times do
  puts File.read('index.html').scan(regexp)
end

~/ruby/bench time ruby19 regex.rb > /dev/null
real 0m1.428s
user 0m1.359s
sys 0m0.056s

~/ruby/bench time perl5.10.0 regex.pl > /dev/null
real 0m1.189s
user 0m1.095s
sys 0m0.084s

It's still slower. Perl has regular expression magic beyond my
imagination, though. I heard they take the most "rare" character in the
literal part of the regex (let's say, the colon) and search for it using
machine code, and then work their way backwards to the beginning of the
regexp...
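
The general idea, checking for a fixed piece of text before running the full regex, can be imitated by hand in Ruby. This only illustrates the technique as I understand it; it is not a description of what Perl's engine actually does, and LITERAL and hosts_with_prefilter are made-up names. Whether it helps at all would have to be measured.

```ruby
REGEX   = %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"}
LITERAL = 'target="_blank"' # fixed text that every match must contain

# Skip the regex entirely for lines that cannot possibly match.
def hosts_with_prefilter(lines)
  lines.each_with_object([]) do |line, hosts|
    next unless line.include?(LITERAL) # cheap substring search first
    host = line[REGEX, 1]              # full regex only on candidate lines
    hosts << host if host
  end
end

# Invented sample lines standing in for index.html.
sample = [
  %(<a href="http://www.example.com/a.html" target="_blank">A</a>\n),
  %(<p>plain text without any link</p>\n),
  %(<a href="http://www.example.org/b.html">no target attribute</a>\n)
]

p hosts_with_prefilter(sample) # ["www.example.com"]
```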

Say what you want, but Perl rocks when it comes to text processing
speed.

Python is even faster:

import re
regexp = re.compile(r'href="http://([^"/]*)/[^"]*"\s+target="_blank"')
for i in xrange(1000):
     with open("index.html") as f:
         for m in regexp.finditer(f.read()):
             print m.group(1)

time python2.6 regex.py > /dev/null
real 0m0.943s
user 0m0.880s
sys 0m0.053s

Roger_Pack4 · 7 January 2010 15:01

So perl is 7 or 8 times faster here.

You could try ruby 1.9 and see if it helps the speed.

You could also try
jruby --fast [+= --1.9]
-r

Marnen_Laibow-Koser · 5 January 2010 11:37

Kornelius Kalnbach wrote:
[...]

It's still slower. Perl has regular expression magic beyond my
imagination, though. I heard they take the most "rare" character in the
literal part of the regex (let's say, the colon) and search for it using
machine code, and then work their way backwards to the beginning of the
regexp...

I think that's only done when study is called, but I could be wrong.

Say what you want, but Perl rocks when it comes to text processing
speed.

Python is even faster:

import re
regexp = re.compile(r'href="http://([^"/]*)/[^"]*"\s+target="_blank"')
for i in xrange(1000):
     with open("index.html") as f:
         for m in regexp.finditer(f.read()):
             print m.group(1)

time python2.6 regex.py > /dev/null
real 0m0.943s
user 0m0.880s
sys 0m0.053s

Yeah. I love Ruby, but I'm getting a bit annoyed by the fact that it's
so much slower than Python...

Best,

Charles_Nutter · 9 January 2010 00:43

And --server, and as recent a JVM version as you can get

- Charlie

On Thu, Jan 7, 2010 at 9:01 AM, Roger Pack <rogerpack2005@gmail.com> wrote:

You could also try
jruby --fast [+= --1.9]

Kornelius_Kalnbach · 5 January 2010 11:50

Marnen Laibow-Koser wrote:

Yeah. I love Ruby, but I'm getting a bit annoyed by the fact that it's
so much slower than Python...

You can improve it then

[murphy]

Robert_K1 · 5 January 2010 12:55

The question is: does it matter for most practical purposes - and: do you want to sacrifice a clean and simple program and the fun of creating it for a few cycles of CPU time? I wouldn't - especially since 1.9 is so much faster than 1.8 was. My 0.02EUR.

Kind regards

robert

On 01/05/2010 12:37 PM, Marnen Laibow-Koser wrote:

Yeah. I love Ruby, but I'm getting a bit annoyed by the fact that it's so much slower than Python...

Marnen_Laibow-Koser · 5 January 2010 16:32

Robert Klemme wrote:

Yeah. I love Ruby, but I'm getting a bit annoyed by the fact that it's
so much slower than Python...

The question is: does it matter for most practical purposes - and: do
you want to sacrifice a clean and simple program and the fun of creating
it for a few cycles of CPU time?

No. That's why I haven't learned Python yet, although between the speed
increase and GAE, it's sometimes tempting. But I'd really miss the
beautiful design of Ruby.

But my point was a bit different. Python and Ruby are basically similar
languages, and what annoys me is that there seems not to have been the
will in the Ruby community to steal some speed tricks from Python. (I'd
be working on this if I knew anything practical about language
implementation, but I don't.)

I wouldn't - especially since 1.9 is
so much faster than 1.8 was. My 0.02EUR.

Unfortunately, I don't quite trust 1.9 for use with Rails yet...

Kind regards

robert

Best,

Albert_Schlef · 6 January 2010 04:47

Robert Klemme wrote:

On 01/05/2010 12:37 PM, Marnen Laibow-Koser wrote:

Yeah. I love Ruby, but I'm getting a bit annoyed by the fact that it's
so much slower than Python...

The question is: does it matter for most practical purposes - and: do
you want to sacrifice a clean and simple program and the fun of creating
it for a few cycles of CPU time? I wouldn't - especially since 1.9 is
so much faster than 1.8 was. My 0.02EUR.

Why does everybody say that CPUs are fast nowadays and that "it doesn't
matter that language XYZ is slow"?

It does matter: web applications. If your applications can't serve all
the visitors, then you're going to lose your customer or you'll have to
learn some other language with better performance.

Roger_Pack4 · 5 January 2010 16:35

But my point was a bit different. Python and Ruby are basically similar
languages, and what annoys me is that there seems not to have been the
will in the Ruby community to steal some speed tricks from Python. (I'd
be working on this if I knew anything practical about language
implementation, but I don't.)

Yeah no kidding. Somehow speed just hasn't "felt" like the ruby
community's thing, until 1.9 at least.

I am working on a few projects to make it faster [and I suppose the
macruby, rubinius and jruby guys, are, as well].

Unfortunately, I don't quite trust 1.9 for use with Rails yet...

Come to the dark side...

-r
