While this is true in theory there is a number of modern RX features that are extremely difficult or impossible to do with a DFA. So as it stands now for pratical purposes most modern RX engines are NFA or hybrids, sometimes (Perl) with heavy optimizations to prevent certain ugly effects (see "Mastering Regular Expressions" for details).
Kind regards
robert
···
On 11.04.2007 22:51, MenTaLguY wrote:
On Thu, 12 Apr 2007 03:27:04 +0900, "Robert Dober" <robert.dober@gmail.com> wrote:
No it must be me who is confused, I thought that backtracking cannot
be avoided in regexs like e.g. %r{[a]*.*a} when trying for a greedy
matchusing a Ragel-ish notation, the NFA for /^a*.*a/ should look like this (after collapsing most of the epsilon transitions introduced by Thompson's Algorithm):
start: ('' ->a_star | '' ->any_star),
a_star: ('a' ->a_star | any ->any_star),
any_star: (any ->any_star | 'a' =>final),
final: (any ->final)Here, -> is a regular transition, and => is a labeled transition which sets a counter (initialized to -1) to the current input position if that is larger than the counter's current value. When we have explored all possible matches, the value of the counter will be the position of the last character in the (greedy) match.
The DFA of course could never backtrack but is there a DFA for this regex?
Every NFA can be rewritten as a DFA. Each state in the resulting DFA corresponds to a superposition of states in the original NFA (again in pseudo-Ragel, where {a,b,c} denotes the superposition of NFA states a, b, and c):