Has anyone ever see a call to String.scan (the "regular" form, not the
block form) just hang when processing a string?
If so, any insight into how that might happen?
Thanks,
Wes
···
--
Posted via http://www.ruby-forum.com/.
Has anyone ever see a call to String.scan (the "regular" form, not the
block form) just hang when processing a string?
If so, any insight into how that might happen?
Thanks,
Wes
--
Posted via http://www.ruby-forum.com/.
Wes Gamble wrote:
Has anyone ever see a call to String.scan (the "regular" form, not the block form) just hang when processing a string?
If so, any insight into how that might happen?
Thanks,
Wes
Got an example you can share that demonstrates the problem?
Maybe the string is huge? Maybe there's a recursive lookup happening of some kind? Dunno without more detail.
Regards,
Dan
Wes Gamble wrote:
Has anyone ever see a call to String.scan (the "regular" form, not the block form) just hang when processing a string?
If so, any insight into how that might happen?
Thanks,
WesGot an example you can share that demonstrates the problem?
Maybe the string is huge? Maybe there's a recursive lookup happening of some kind? Dunno without more detail.
I'm guessing exponential backtracking when the expression doesn't match.
For ex:
x = "x" * 1000
x << "ab"
x =~ /x*x*a/ # matches quickly
x =~ /x*x*b/ # "hangs" trying exponential combinations with backtracking
Regards,
Bill
From: "Daniel Berger" <djberg96@gmail.com>
Yes, that's a property of NFA based regular expression engines. Some
patterns have awful runtime characteristics because of the
backtracking. Without looking too close at your patter I guess in
this case it's caused by having "x*" twice in your RX - not very
useful anyway. If you are interested in more detail I recommend
reading "Mastering Regular Expressions":
http://www.amazon.com/gp/product/0596002890/
Kind regards
robert
2006/5/24, Bill Kelly <billk@cts.com>:
From: "Daniel Berger" <djberg96@gmail.com>
> Wes Gamble wrote:
>> Has anyone ever see a call to String.scan (the "regular" form, not the
>> block form) just hang when processing a string?
>>
>> If so, any insight into how that might happen?
>>
>> Thanks,
>> Wes
>>
>
> Got an example you can share that demonstrates the problem?
>
> Maybe the string is huge? Maybe there's a recursive lookup happening of
> some kind? Dunno without more detail.I'm guessing exponential backtracking when the expression doesn't match.
For ex:
x = "x" * 1000
x << "ab"
x =~ /x*x*a/ # matches quickly
x =~ /x*x*b/ # "hangs" trying exponential combinations with backtracking
--
Have a look: http://www.flickr.com/photos/fussel-foto/
>
> I'm guessing exponential backtracking when the expression doesn't match.
>
> For ex:
>
> x = "x" * 1000
> x << "ab"
> x =~ /x*x*a/ # matches quickly
> x =~ /x*x*b/ # "hangs" trying exponential combinations with backtrackingYes, that's a property of NFA based regular expression engines. Some
patterns have awful runtime characteristics because of the
backtracking. Without looking too close at your patter I guess in
this case it's caused by having "x*" twice in your RX - not very
useful anyway.
Thanks We haven't seen the OP's regexp yet; I was just
offering the above as an example of what can cause the
hanging he described.
Regards,
Bill
From: "Robert Klemme" <shortcutter@googlemail.com>
2006/5/24, Bill Kelly <billk@cts.com>:
Bill Kelly wrote:
From: "Robert Klemme" <shortcutter@googlemail.com>
Yes, that's a property of NFA based regular expression engines. Some
patterns have awful runtime characteristics because of the
backtracking. Without looking too close at your patter I guess in
this case it's caused by having "x*" twice in your RX - not very
useful anyway.Thanks
We haven't seen the OP's regexp yet; I was just
offering the above as an example of what can cause the
hanging he described.Regards,
Bill
All,
Thanks for giving me some idea of what might be wrong. For now, I've
gone with a simpler regex that I think will work fine. My original
pattern was attempting to allow several portions of the pattern to be
optional and the more that I think about it, the more it makes sense
that this might occur.
I will keep (?>pattern) in mind as a way to inhibit backtracking.
Thanks,
Wes
--
Posted via http://www.ruby-forum.com/\.
Just for kicks, here was the original regex - attempting to match
against a string containing an entire HTML page OR a style sheet.
/(?=<style)*?background.*?(?=-image)*?:.*?url\((.*?)\).*?(?=<\/style>)*?/
I was attempting to make the <style and </style> parts optional.
As it turns out, there are so many different ways that a style can be
specified in a HTML page, that I just went with the much simpler:
/background.*?:.*?url\((.*?)\)/i
Wes
--
Posted via http://www.ruby-forum.com/.