Hello,
I have a regex infinite-loop kind of problem. I use Ruby 1.8.2. The regular expression I used was:
[tT]he\s+(([\w\d\_]+(?:[a-zA-Z][\dA-Z]|[\dA-Z][a-zA-Z])[\w\d\_]*((\s*(\,|and|or)\s*)*[\w\d\_]+(?:[a-zA-Z][\dA-Z]|[\dA-Z][a-zA-Z])[\w\d\_]*)*))\s+((\w+\s+){0,3}\s*(proteins|genes|protein|gene))
I try to match this regex against the following string, without the quotes (it should be one single line):
"to this end , NK_CTL clones derived from four donors ( KK , GG , GF , and DP ) were tested for their ability to lyse the TAP2_deficient RMA_S\HLA_E cell_line incubated with serial_dilutions of the VMAPRTLIL , VMAPRTLVL , VMAPRTLLL , and VMAPRALLL peptides ."
Normally the regex is not supposed to match this particular string. Instead of returning, the match hangs with 100% CPU consumption, while for lots of other lines of text it works fine. Am I doing something wrong?
I tried the same regex in Perl; it complained about the string containing the unknown control sequence \H (which does indeed appear in the text at some point). If I replace the "\H" with, let's say, "-H", the regular expression runs through in Perl without finding anything, which is expected, as I said. Ruby hangs even if I change the "\" to "-".
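For reference, here is a minimal sketch of how the problem can be reproduced. It is not necessarily identical to the attached script: the helper name try_match and the 2-second limit are illustrative choices of mine, and I am assuming an interpreter where the Timeout module is able to interrupt a running match (on 1.8.2 the call may simply hang, as described above).

```ruby
require 'timeout'

# The regular expression from this post, unchanged.
PATTERN = /[tT]he\s+(([\w\d\_]+(?:[a-zA-Z][\dA-Z]|[\dA-Z][a-zA-Z])[\w\d\_]*((\s*(\,|and|or)\s*)*[\w\d\_]+(?:[a-zA-Z][\dA-Z]|[\dA-Z][a-zA-Z])[\w\d\_]*)*))\s+((\w+\s+){0,3}\s*(proteins|genes|protein|gene))/

# The problematic input line. Single quotes keep "\H" as a literal
# backslash followed by "H".
LINE = 'to this end , NK_CTL clones derived from four donors ( KK , GG , GF , and DP ) were tested for their ability to lyse the TAP2_deficient RMA_S\HLA_E cell_line incubated with serial_dilutions of the VMAPRTLIL , VMAPRTLVL , VMAPRTLLL , and VMAPRALLL peptides .'

# Hypothetical helper: run the match, but give up after `seconds`.
# Returns the MatchData, nil when there is no match, or :timeout
# when the matcher did not finish in time.
def try_match(pattern, text, seconds)
  Timeout.timeout(seconds) { pattern.match(text) }
rescue Timeout::Error
  :timeout
end

# Expected to print either nil (no match) or :timeout, never a match.
puts try_match(PATTERN, LINE, 2).inspect
```

A short line such as 'the ABC1 gene' goes through the same helper and matches immediately, so the slowdown seems to be tied to this particular input rather than to the regex in general.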
Needless to say, I would much appreciate some help on this one. Feel free to ask for an explanation of that complicated regex, or for any other information, if needed.
Attached is the Ruby script that hangs.
Best regards,
Adrian Dimulescu.
regex-hangs.rb (517 Bytes)