The GNU regexp engine has a few oddities.
Here is another example that triggers an endless loop:
r='<META http-equiv="Content-Type content="text/html; charset=iso-8859-1">'
r.scan /<(?:[^">]+|"[^"]*")+>/
To clarify, it's probably not an endless loop; it just may or
may not finish in our lifetimes.
If you make the string shorter, like:
r='<M h="C-T c="t/h; c=i-8-1">'
...you'll see it finishes quickly.
Add a few characters:
r='<M h="C-T c="t/h; charset=i-8-1">'
...and there's a slight delay before it finishes.
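If you want to watch the growth yourself, here is a rough sketch that times
the greedy pattern against inputs of increasing length. The helper name
unbalanced_tag and the filler lengths are made up for illustration, and the
numbers depend heavily on your Ruby version and regexp engine (some engines
may show no slowdown at all), so treat it as an experiment rather than a
benchmark.

```ruby
require 'benchmark'

# Greedy pattern from the example above.
GREEDY = /<(?:[^">]+|"[^"]*")+>/

# Hypothetical helper (not from the original post): builds a tag with
# an odd number of quotes and `n` filler characters before the last one.
def unbalanced_tag(n)
  '<M h="C-T c="' + ('x' * n) + '">'
end

# Lengths are kept small on purpose so this finishes quickly even on
# an engine that backtracks heavily.
[4, 8, 12, 16].each do |n|
  s = unbalanced_tag(n)
  secs = Benchmark.realtime { s.scan(GREEDY) }
  printf("%2d filler chars: %.6f s\n", n, secs)
end
```

On an engine that backtracks here, each extra filler character should make
the failing match noticeably more expensive.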
I found that making your outer one-or-more match non-greedy
(changing + to +?) sped it up a lot:
r.scan /<(?:[^">]+|"[^"]*")+?>/
Now the match on your full string finishes in a few seconds
on my system. Still slow... just a lot faster than the
greedy version.
Incidentally, it's the mismatched quotes in the attribute
value (there is no closing quote after Content-Type) that are
causing the backtracking.
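A quick way to confirm the mismatch is to count the double quotes in the
string. This is just a sanity check on the input, not part of the regexp:

```ruby
r = '<META http-equiv="Content-Type content="text/html; charset=iso-8859-1">'

# An odd number of double quotes means at least one is unpaired,
# so the "[^"]*" alternative can never consume all of them.
quotes = r.count('"')
puts "double quotes: #{quotes}"
puts "balanced? #{quotes.even?}"
```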
If we allow the regex to fail "gracefully" on mismatched
quotes, we can prevent the backtracking:
r.scan /<(?:[^">]+|"[^"]*"|")+?>/
...i.e. the thinking is, if all else fails, just gobble a
single " and keep going.
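As a small check of that idea, the sketch below runs the tolerant pattern
against the full string and prints what it matched. The variable names are
only for illustration:

```ruby
r = '<META http-equiv="Content-Type content="text/html; charset=iso-8859-1">'

# Same pattern as above: the extra |" alternative lets a lone quote
# be consumed on its own instead of forcing the whole match to fail.
tolerant = /<(?:[^">]+|"[^"]*"|")+?>/

matches = r.scan(tolerant)
p matches
puts "matched whole tag? #{matches == [r]}"
```

Since the pattern has no capture groups, scan returns the full matched
substrings, so a successful run gives back the tag itself.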
Regards,
Bill
From: "Simon Strandgaard" <neoneye@gmail.com>