Error in regex parser of 1.6.3 and 1.8.1?

I would assume this to be an error:

"<META http-equiv=\"Content-Type content=\"text/html;
charset=iso-8859-1\">".gsub(/<(?:[^">]+|"[^"]*")+>/) { |sMatch| puts sMatch;
''}

stops processing of the script: it hangs. I know the HTML is malformed (the
Content-Type value is missing its closing quote), but the text is not from me.

Any idea what regexp I can use as a workaround so that this does not
happen?

Christian

it's called an "eternal" match (more commonly known as catastrophic backtracking)

"<META http-equiv=\"Content-Type content=\"text/html;
charset=iso-8859-1\">".gsub(/<(?:[^">]+|"[^"]*")+>/) { |sMatch| puts sMatch;
''}

Don't give the engine the possibility to backtrack, with something like

   gsub(/<(?>[^">]+|"[^"]*")+>/)

probably there is a better way to write it
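A minimal Ruby sketch comparing the atomic-group pattern above with one possible alternative (our suggestion, not from the thread) that removes the nested quantifier by matching a single non-quote character per iteration. The constant names ATOMIC, SINGLE, BAD and GOOD are hypothetical; BAD is the malformed string from the question and GOOD is a hand-corrected version of it.

```ruby
# Two ways to keep the tag regexp from backtracking exponentially when a
# quote is left unterminated. Constant names are illustrative only.

# Atomic group (?>...): once an iteration has matched, the engine may not
# backtrack into it, so a [^">]+ run is never re-split into shorter runs.
ATOMIC = /<(?>[^">]+|"[^"]*")+>/

# Alternative: consume ONE non-quote character per iteration. Each character
# can then be matched by only one alternative, so there is nothing to re-split.
SINGLE = /<(?:[^">]|"[^"]*")+>/

# Malformed input from the question (unbalanced quotes) and a fixed version.
BAD  = "<META http-equiv=\"Content-Type content=\"text/html;\ncharset=iso-8859-1\">"
GOOD = "<META http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">"

[ATOMIC, SINGLE].each do |re|
  p BAD.scan(re)   # no tag matches, but the failure is found quickly
  p GOOD.scan(re)  # the whole tag is matched
  # Same usage as in the question: print each tag and delete it.
  p GOOD.gsub(re) { |tag| puts tag; '' }
end
```

On the malformed input neither pattern finds a tag, so gsub leaves that text unchanged; the difference from the original pattern is that the engine gives up after a roughly linear number of steps instead of trying every way to split the unquoted runs.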

Guy Decoux

Thank you. I regularly use regular expressions, but have never had to deal
with that.

Maybe I need some good literature on that to find out where and when such a
thing could happen.

Christian

Well, you have perhaps the best example: what you must see is that the
regexp engine really, really wants to match, and to do so it will try all
possible combinations.

In some cases (a regexp with nested quantifiers, like the + inside (...)+
here) it will take exponential time to try all the possibilities, and you
just need to be patient :-)
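To make the "exponential time" concrete, here is a rough count (a sketch, not a benchmark of any regexp engine) of how many ways a nested quantifier such as (?:[^">]+)+ can divide a run of n characters into consecutive pieces. The method name `ways` is hypothetical. When the overall match fails, a plain backtracking engine may try each of these divisions before giving up, though real engines can prune some of them.

```ruby
# Count the ways a nested quantifier like (?:[^">]+)+ can carve a run of
# n matchable characters into consecutive, non-empty pieces.
def ways(n, memo = {})
  return 1 if n.zero?  # nothing left to split: exactly one way
  # The first piece has length k (1..n); the rest is split recursively.
  memo[n] ||= (1..n).sum { |k| ways(n - k, memo) }
end

(1..10).each do |n|
  # Each of the n-1 gaps between characters is either a cut or not,
  # so the count doubles with every extra character: 2**(n-1).
  raise "mismatch at #{n}" unless ways(n) == 2**(n - 1)
end

puts ways(30)  # => 536870912
```

A 30-character run already allows over half a billion splits, and when several such runs sit in one failing match the counts multiply, which is why the script in the question appears to hang.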

Guy Decoux