Regular expression bug, or feature?

I’ve got a strange feeling that this is one of those little bits of
knowledge known to everybody except people who think they know
something about regular expressions… but here goes:

I have a regular expression and a string which cause Ruby to hang.
Try this:

src='<hosted class="com.firsthop.mg.gui.Servlet" product="TestSuite"
classpath'
src =~ /^<((?:[\w:][-\w\d.]*:)?[\w:][-\w\d.]*)\s*((((["']).*?\5)|[^\/'">]*)*?)(\/)?>/mu

Notice that there’s a CR between %q{"TestSuite"} and %q{classpath}.

This hangs on at least Ruby 1.6.5 and 1.6.7 – one of those 100% CPU,
never return hangs.

Is this a known problem with Ruby, or should I file a bug report?

I’ve checked this in Perl; it doesn’t match, of course, but neither
does it hang.

"Sean Russell" <ser@germane-software.com> wrote in message
news:83173408.0209171521.7e752f4f@posting.google.com

> Notice that there’s a CR between %q{"TestSuite"} and %q{classpath}.

On ruby 1.7.2 (2002-07-02) [i386-mswin32] it hung even *without* the CR
and the /mu, like so:

src = '<hosted class="com.firsthop.mg.gui.Servlet" product="TestSuite" classpath'
src =~ /^<((?:[\w:][-\w\d.]*:)?[\w:][-\w\d.]*)\s*((((["']).*?\5)|[^\/'">]*)*?)(\/)?>/

Weird!

> I’ve checked this in Perl; it doesn’t match, of course, but neither
> does it hang.

Sorry, I did not register this until after my previous post. So the fact
that it doesn’t match is expected, but the fact that it hangs is not!

Right, Sean?

-- shanko

> I’ve got a strange feeling that this is one of those little bits of
> knowledge known to everybody except people who think they know
> something about regular expressions… but here goes:
>
> I have a regular expression and a string which cause Ruby to hang.
> Try this:
>
> src='<hosted class="com.firsthop.mg.gui.Servlet" product="TestSuite"
> classpath'
> src =~ /^<((?:[\w:][-\w\d.]*:)?[\w:][-\w\d.]*)\s*((((["']).*?\5)|[^\/'">]*)*?)(\/)?>/mu
>
> Notice that there’s a CR between %q{"TestSuite"} and %q{classpath}.
>
> This hangs on at least Ruby 1.6.5 and 1.6.7 – one of those 100% CPU,
> never return hangs.

Try: src=%Q{<h c="c" p="T"\nc}

Ruby should process it almost instantly…

Then try: src=%Q{<h c="c" p="T"\nccccccccccccccccc}

It may take several seconds…

Append a few more characters and it’ll take a loooooooonng time ;)
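
The growth described above can be measured with a small script. This is only a minimal sketch, not something posted in the thread: the lambdas `unterminated_tag` and `time_match` are hypothetical helpers, the pattern is copied from the irb transcript further down, and the tail lengths are kept deliberately small. Timings depend heavily on the Ruby version and its regex engine, so no numbers are shown here.

```ruby
require 'benchmark'

# Pattern from the thread, taken from the intact irb transcript.
tag_re = /^<((?:[\w:][-\w\d.]*:)?[\w:][-\w\d.]*)\s*((((["']).*?\5)|[^\/'">]*)*?)(\/)?>/mu

# Hypothetical helper: the short test string with a tail of n "c" characters
# and no closing ">", so the match can never succeed.
unterminated_tag = ->(n) { %Q{<h c="c" p="T"\n} + "c" * n }

# Hypothetical helper: returns [match_result, elapsed_seconds] for one attempt.
time_match = lambda do |n|
  result = nil
  seconds = Benchmark.realtime { result = (unterminated_tag.call(n) =~ tag_re) }
  [result, seconds]
end

# Tail lengths are small on purpose; raise them one step at a time.
[1, 4, 8, 12].each do |n|
  result, seconds = time_match.call(n)
  printf("tail=%2d  result=%-5s  %.4fs\n", n, result.inspect, seconds)
end
```

If the elapsed time roughly doubles for each character added to the tail, that is the signature of exponential backtracking rather than a true hang.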

> Is this a known problem with Ruby, or should I file a bug report?
>
> I’ve checked this in Perl; it doesn’t match, of course, but neither
> does it hang.

I think your nested (…)'s are causing exponential backtracking
trying to find a match. (Or, in the case of the non-greedy one,
“forward-tracking” :)

Perl, I’ve read, incorporates a number of optimizations to try to
recognize and shortcut patterns like that… But one can outwit its
shortcuts presumably pretty easily.

According to the Pickaxe, Ruby supports the (?>...) extension… It’s useful
when constructing expressions where you want to prevent backtracking from
occurring when parts of the expression don’t match.
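
One way to apply `(?>...)` to the pattern from this thread is sketched below. It rests on an assumption about where the backtracking comes from and is not a fix anyone in the thread posted: the run of plain attribute characters, `[^\/'">]*`, is changed to `+` and wrapped in an atomic group, so a run that has been matched is never split up again on failure. The name `atomic_tag_re` is hypothetical, and behaviour on Ruby 1.6 specifically has not been checked.

```ruby
# Hypothetical rewrite of the thread's pattern: identical except that the run
# of plain attribute characters is (?>[^\/'">]+) instead of [^\/'">]*.
# Capture group numbering is unchanged because (?>...) does not capture.
atomic_tag_re = /^<((?:[\w:][-\w\d.]*:)?[\w:][-\w\d.]*)\s*((((["']).*?\5)|(?>[^\/'">]+))*?)(\/)?>/mu

# The string from the original post, with a newline before "classpath".
src = %Q{<hosted class="com.firsthop.mg.gui.Servlet" product="TestSuite"\nclasspath}

unterminated = (src =~ atomic_tag_re)          # no closing ">" anywhere
closed       = atomic_tag_re.match(src + ">")  # same text with ">" appended

p unterminated            # prints nil
p closed && closed[1]     # prints "hosted"
```

The quoted-value branch `.*?\5` can still backtrack across later quote characters, so this sketch only removes the nested-quantifier blow-up, not every source of retrying.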

HTH,

Bill

> This hangs on at least Ruby 1.6.5 and 1.6.7 -- one of those 100% CPU,
> never return hangs.

Not really a "hang", but it does take a while to run. Mine eventually
came back (sorry, I didn't time it, but my guess is 30 minutes):

irb(main):025:0> src='<hosted class="com.firsthop.mg.gui.Servlet" product="TestSuite"
irb(main):026:0' classpath'
"<hosted class=\"com.firsthop.mg.gui.Servlet\" product=\"TestSuite\"\nclasspath"
irb(main):027:0> src =~ /^<((?:[\w:][-\w\d.]*:)?[\w:][-\w\d.]*)\s*((((["']).*?\5)|[^\/'">]*)*?)(\/)?>/mu
nil
irb(main):028:0>

-joe

But it worked with the following change:

src = '<hosted class="com.firsthop.mg.gui.Servlet" product="TestSuite">'
                                                                      ^
notice the last '>'

src =~ /^<((?:[\w:][-\w\d.]*:)?[\w:][-\w\d.]*)\s*((((["']).*?\5)|[^\/'">]*)*?)(\/)?>/mu