Slow regular-expression engine

W_James · 31 July 2009 16:30

This Awk code takes less than one hundredth of a second to run:

BEGIN {

  regex = \
    "o?o?o?o?o?o?o?o?o?o?o?o?o?o?" \
    "o?o?o?o?o?o?o?o?o?o?o?o?o?" \
    "ooooooooooooooooooooooooooooo"

print "ooooooooooooooooooooooooooooo" ~ regex

}

This Ruby code takes 27.5 seconds:

t = Time.now

regex = Regexp.new(
  "o?o?o?o?o?o?o?o?o?o?o?o?o?o?" +
  "o?o?o?o?o?o?o?o?o?o?o?o?o?" +
  "ooooooooooooooooooooooooooooo" )

p "ooooooooooooooooooooooooooooo" =~ regex

puts "#{ Time.now - t } seconds"

See http://swtch.com/~rsc/regexp/regexp1.html

Dominik_Honnef1 · 31 July 2009 16:37

w_a_x_man <w_a_x_man@yahoo.com> writes:

This Awk code takes less than one hundredth of a second to run:

BEGIN {

  regex = \
    "o?o?o?o?o?o?o?o?o?o?o?o?o?o?" \
    "o?o?o?o?o?o?o?o?o?o?o?o?o?" \
    "ooooooooooooooooooooooooooooo"

  print "ooooooooooooooooooooooooooooo" ~ regex

}

This Ruby code takes 27.5 seconds:

t = Time.now

regex = Regexp.new(
  "o?o?o?o?o?o?o?o?o?o?o?o?o?o?" +
  "o?o?o?o?o?o?o?o?o?o?o?o?o?" +
  "ooooooooooooooooooooooooooooo" )

p "ooooooooooooooooooooooooooooo" =~ regex

puts "#{ Time.now - t } seconds"

See Regular Expression Matching Can Be Simple And Fast

In Ruby 1.9 its 1/4 of the time (still slow).

But: One might ask if that is a problem at all, considering that this
Regexp looks and is awful and could be written in a way better way which
takes times of 0.000139965 seconds.

t = Time.now
regex = Regexp.new(/o{0,27}ooooooooooooooooooooooooooooo/ )
p "ooooooooooooooooooooooooooooo" =~ regex
puts "#{ Time.now - t } seconds"

···

--
Dominik Honnef
dominikho@gmx.net

W_James · 31 July 2009 16:50

Quoting the article:

  ... it is possible to write so-called "pathological" regular
  expressions that Perl matches very very slowly. In contrast,
  there are no regular expressions that are pathological for
  the Thompson NFA implementation. Seeing the two graphs side
  by side prompts the question, "why doesn't Perl use the
  Thompson NFA approach?" It can, it should ...

···

On Jul 31, 11:37 am, domini...@gmx.net wrote:

w_a_x_man <w_a_x_...@yahoo.com> writes:
> This Awk code takes less than one hundredth of a second to run:

> BEGIN {

> regex = \
> "o?o?o?o?o?o?o?o?o?o?o?o?o?o?" \
> "o?o?o?o?o?o?o?o?o?o?o?o?o?" \
> "ooooooooooooooooooooooooooooo"

> print "ooooooooooooooooooooooooooooo" ~ regex

> }

> This Ruby code takes 27.5 seconds:

> t = Time.now

> regex = Regexp.new(
> "o?o?o?o?o?o?o?o?o?o?o?o?o?o?" +
> "o?o?o?o?o?o?o?o?o?o?o?o?o?" +
> "ooooooooooooooooooooooooooooo" )

> p "ooooooooooooooooooooooooooooo" =~ regex

> puts "#{ Time.now - t } seconds"

> Seehttp://swtch.com/~rsc/regexp/regexp1.html

In Ruby 1.9 its 1/4 of the time (still slow).

But: One might ask if that is a problem at all, considering that this
Regexp looks and is awful and could be written in a way better way which
takes times of 0.000139965 seconds.

t = Time.now
regex = Regexp.new(/o{0,27}ooooooooooooooooooooooooooooo/ )
p "ooooooooooooooooooooooooooooo" =~ regex
puts "#{ Time.now - t } seconds"

--
Dominik Honnef
domini...@gmx.net

Ben_Bleything1 · 31 July 2009 17:01

So what? I'm sure Ruby core would be happy to consider a patch.

Ben

···

On Fri, Jul 31, 2009 at 9:50 AM, w_a_x_man<w_a_x_man@yahoo.com> wrote:

Quoting the article:

... it is possible to write so-called "pathological" regular
expressions that Perl matches very very slowly. In contrast,
there are no regular expressions that are pathological for
the Thompson NFA implementation. Seeing the two graphs side
by side prompts the question, "why doesn't Perl use the
Thompson NFA approach?" It can, it should ...

Robert_K1 · 1 August 2009 12:45

I am not sure as to what exactly your point is. The awk you have been using might have used a DFA implementation (sed does so AFAIK). Nowadays most regular expression engines are NFA because of various reasons (see "Mastering Regular Expressions").

Usually NFA's can be tricked into bad runtime performance with expressions like the one you wrote. I do not consider this a disadvantage because on the other hand you can optimize your expression when working with a NFA, for example by deliberately selecting order of sub expressions in the expression. And, the expression is really pathologic, i.e. you would not use something like that in practice.

Btw, why don't you declare the regular expression directly, i.e.

regex = %r(
   o?o?o?o?o?o?o?o?o?o?o?o?o?o?
   o?o?o?o?o?o?o?o?o?o?o?o?o?
   ooooooooooooooooooooooooooooo
)x

?

Kind regards

robert

···

On 31.07.2009 18:49, w_a_x_man wrote:

On Jul 31, 11:37 am, domini...@gmx.net wrote:

w_a_x_man <w_a_x_...@yahoo.com> writes:

This Awk code takes less than one hundredth of a second to run:
BEGIN {
  regex = \
    "o?o?o?o?o?o?o?o?o?o?o?o?o?o?" \
    "o?o?o?o?o?o?o?o?o?o?o?o?o?" \
    "ooooooooooooooooooooooooooooo"
  print "ooooooooooooooooooooooooooooo" ~ regex
}
This Ruby code takes 27.5 seconds:
t = Time.now
regex = Regexp.new(
  "o?o?o?o?o?o?o?o?o?o?o?o?o?o?" +
  "o?o?o?o?o?o?o?o?o?o?o?o?o?" +
  "ooooooooooooooooooooooooooooo" )
p "ooooooooooooooooooooooooooooo" =~ regex
puts "#{ Time.now - t } seconds"
Seehttp://swtch.com/~rsc/regexp/regexp1.html

In Ruby 1.9 its 1/4 of the time (still slow).

But: One might ask if that is a problem at all, considering that this
Regexp looks and is awful and could be written in a way better way which
takes times of 0.000139965 seconds.

t = Time.now
regex = Regexp.new(/o{0,27}ooooooooooooooooooooooooooooo/ )
p "ooooooooooooooooooooooooooooo" =~ regex
puts "#{ Time.now - t } seconds"

--
Dominik Honnef
domini...@gmx.net

Quoting the article:

  ... it is possible to write so-called "pathological" regular
  expressions that Perl matches very very slowly. In contrast,
  there are no regular expressions that are pathological for
  the Thompson NFA implementation. Seeing the two graphs side
  by side prompts the question, "why doesn't Perl use the
  Thompson NFA approach?" It can, it should ...

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Urabe_Shyouhei1 · 1 August 2009 05:49

Ben Bleything wrote:

···

On Fri, Jul 31, 2009 at 9:50 AM, w_a_x_man<w_a_x_man@yahoo.com> wrote:

Quoting the article:

... it is possible to write so-called "pathological" regular
expressions that Perl matches very very slowly. In contrast,
there are no regular expressions that are pathological for
the Thompson NFA implementation. Seeing the two graphs side
by side prompts the question, "why doesn't Perl use the
Thompson NFA approach?" It can, it should ...

So what? I'm sure Ruby core would be happy to consider a patch.

Yes we do. And if you could write a time-efficient NFA engine with back
references implemented, please also send that to some scientific journal,
because that should solve the P=NP problem.

Forum · 1 August 2009 06:40

You should prepare a mail template as inspired by this historic example

--------------------- 8< -----------------------
Dear reader

we have received your proof of Fermat's Last Theorem. We have to
regret to inform you that there is an error in line ###

--------------------- <8 ----------------------

Cheers
Robert

···

On Sat, Aug 1, 2009 at 7:49 AM, Urabe Shyouhei<shyouhei@ruby-lang.org> wrote:

--
module Kernel
alias_method :λ, :lambda
end

Topic		Replies	Views
Slow regular expressions :( ruby-talk	28	135	28 July 2006
Optimization anyone ruby-talk	5	67	22 November 2005
Slow regular expressions ruby-talk	8	81	29 July 2006
Is ruby's regex slower? ruby-talk	32	171	9 January 2010
Regexp-engine: ruby vs. perl ruby-talk	1	123	6 July 2009

Slow regular-expression engine

Related topics