Regexp and string halts ruby

I've tried the following regexp/string match:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

On a Debian/Linux system, with ruby version 1.8.4(-7sarge5) and even on
the more recent 1.8.5(-4) ruby enters an infinite active loop, and never
returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.

···

--
Posted via http://www.ruby-forum.com/.

it's just a bad regex.

   harp:~ > cat a.rb
   #! /usr/bin/env ruby

   re = /^((?:\s*[^,;=]+[;=][^,;=]+,?)+)$/

   s = " pid=934, time=478611, command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i
   4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q
   AW02,AW02S TMQFORWARD -C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
   /home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02, "

   m = re.match s

   puts m.to_a.first

   harp:~ > time ruby a.rb
    pid=934, time=478611, command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i
   4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q
   AW02,AW02S TMQFORWARD -C dom=ccaApp -g 650 -i 4278 -u pthp68 -U

   real 0m0.008s
   user 0m0.010s
   sys 0m0.000s

google 'exponential backtracking'

regards.

-a

···

On Fri, 26 Jan 2007, Ricardo Ramalho wrote:

I've tried the following regexp/string match:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

On a Debian/Linux system, with ruby version 1.8.4(-7sarge5) and even on
the more recent 1.8.5(-4) ruby enters an infinite active loop, and never
returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.

--
we can deny everything, except that we have the possibility of being better.
simply reflect on that.
- the dalai lama

Are you *sure* that it enters an infinite loop? I see you nest two star operators and a ".+" inside a + operator - this calls for trouble (i.e. exponential run time).

I don't know what you are actually trying to achieve with your regexp but you should consider changing it. What strikes me odd:

- you have a group that covers the whole RX, this is superfluous as you can get the same more easily

- You are grouping but not extracting, maybe you can show more code and more of the text you match against (if this is just a small part).

Kind regards

  robert

···

On 25.01.2007 19:20, Ricardo Ramalho wrote:

I've tried the following regexp/string match:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

On a Debian/Linux system, with ruby version 1.8.4(-7sarge5) and even on
the more recent 1.8.5(-4) ruby enters an infinite active loop, and never
returns.

Ricardo Ramalho schrieb:

I've tried the following regexp/string match:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

On a Debian/Linux system, with ruby version 1.8.4(-7sarge5) and even on
the more recent 1.8.5(-4) ruby enters an infinite active loop, and never
returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.

There is no infinite loop - it it takes a long time, because your regular expression is possibly too complex. These "(...*...+...*...)+" constructs are dangerous. See "Mastering Regular Expressions - Friedl" for details.

An example with your regular expression and your string, starting with a short part an taken longer ones each step. The whole string may take some hundred years on my small machine :wink:

>>>>> Code >>>>>

regex = /^((?:\s*.+[;=][^,]*,?)+)$/

string = " pid=934, time=478611,command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68, "
t = Time.new
string.match(regex)
puts "Match need appr. #{Time.now-t} seconds"
STDOUT.flush

string = " pid=934, time=478611,command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG, "
t = Time.new
string.match(regex)
puts "Match need appr. #{Time.now-t} seconds"
STDOUT.flush

string = " pid=934, time=478611,command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD, "
t = Time.new
string.match(regex)
puts "Match need appr. #{Time.now-t} seconds"

>>>>> Output >>>>>

Match need appr. 7.0 seconds
Match need appr. 21.38 seconds
Match need appr. 60.658 seconds

>>>>> EoE >>>>>

Wolfgang Nádasi-Donner

Ricardo Ramalho <ricardo-g-ramalho@telecom.pt> writes:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

It is possible to write a regular expression that takes exponential
time to confirm that there's no match. This is not really anything
too new. You can do similar things to Python and to Perl too,
although with Perl you have to try significantly harder since 5.6.
(There's some code in there that detects certain exponential-time
matching patterns and adjusts behavior)

You can simplify your example to:

  /(?:.+=[^,]*,?)+$/ =~ "x=x,x" * 12

Which will complete on my machine, albeit very slowly. Increasing the
number much more will get you what appears to be a hang.

Note that if the regular expression you gave were to match the string,
then this regular expression should also match:

/(?:\s*.+[;=][^,]*,?)$/

Consider this a "prerequisite" to your match - if this doesn't match,
then there's no way your full expression could match your input. In
your code, one thing you could do is check against this short
expression first, and only if that matches then check against the long
expression.

However, it may be much more productive simply to rewrite your
expression as:

/^((?:\s*.+[;=][^,]*,)*(?:\s*.+[;=][^,]*,?))$/

I've expanded out your (?:blah)+ into a (?:blah)*(?:blah), and (most
importantly!) I've removed the "?" after the "," in the first clause,
on the assumption that the comma is optional only on the last piece in
the line. With that change, ruby no longer goes nuts trying to match
this regular expression.

···

--
s=%q( Daniel Martin -- martin@snowplow.org
       puts "s=%q(#{s})",s.map{|i|i}[1] )
       puts "s=%q(#{s})",s.map{|i|i}[1]

Perhaps you should have done it this way:

s = " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S
TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

re = /[^\s;=]+[;=][^;=,\s]+/

p s.scan( re )

--- output -----
["pid=934", "time=478611", "command=TMQFORWARD",
"args=-C", "dom=ccaApp", "dom=ccaApp"]

···

On Jan 25, 12:20 pm, Ricardo Ramalho <ricardo-g-rama...@telecom.pt> wrote:

I've tried the following regexp/string match:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

On a Debian/Linux system, with ruby version 1.8.4(-7sarge5) and even on
the more recent 1.8.5(-4) ruby enters an infinite active loop, and never
returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.

--
Posted viahttp://www.ruby-forum.com/.

Thanks guys, i guess that was it, the regexp was too complex. Daniel
Martin's simplification (?:blah,?)+ into a (?:blah,)*(?:blah,?) really
did the trick! What (mis)led me to believe it was a bug was the fact
that i tried the original regexp/string in perl, and it returned rather
quickly. I didn't know that, also according to Martin, perl has code in
that detects certain exponential-time
matching patterns and adjusts behavior.

I think i'll go and get Friedl's Mastering Regular Expressions - i'm
curious and i like O'Reilly's books anyway... :slight_smile:

Best regards,
Ricardo Ramalho

···

--
Posted via http://www.ruby-forum.com/.

You wont be disappointed. It is a tome of black magic. :wink:

James Edward Gray II

···

On Jan 26, 2007, at 3:58 AM, Ricardo Ramalho wrote:

I think i'll go and get Friedl's Mastering Regular Expressions - i'm
curious and i like O'Reilly's books anyway... :slight_smile: