Building the finite state machine of a Ruby regexp

Does anyone know if there is a nice way to build the Finite State Machine
(FSM) of a Ruby regexp ?

Assuming the native ruby regexp parser used a (more or less explicit) FSM
I first thought that there was some builtin way to access or re-build it
but, after reading some reference docs (Regexp class) and posts I can't
see any useful methods to do this. But I would be happy to be wrong about
this...

As an alternative solution, I wonder if someone knows about a module that
could parse Ruby regexps (accepting exactly the same regexp grammar as
Ruby) and let me read the associated FSM (or whatever graph-like structure
representing the regexp).

Any help will be greatly appreciated. Thanks sincerely.

Does anyone know if there is a nice way to build the Finite State Machine
(FSM) of a Ruby regexp ?

I don't know of anything specifically oriented to Ruby Regexps that
does this, but surely such tools exist for (say) Perl...?

As an alternative solution, I wonder if someone knows about a module that
could parse Ruby regexps (accepting exactly the same regexp grammar as
Ruby) and let me read the associated FSM (or whatever graph-like structure
representing the regexp).

Given how notoriously opaque they are, I've actually found regular
expressions to be pretty easy to parse. I can send you a partial
Regexp parser from one of my projects that just extracted the
information I needed to know at the time.... but probably what you
really want is Simon Strandgaard's regexp-engine. You can find it
here:

http://rubyforge.org/projects/aeditor/

···

On 8/1/06, Gilles Lacroix <lacroix7@free.fr> wrote:

Does anyone know if there is a nice way to build the Finite State Machine
(FSM) of a Ruby regexp ?

I don't know of anything specifically oriented to Ruby Regexps that
does this, but surely such tools exist for (say) Perl...?

Strictly speaking, there isn't anything that does this, since modern 'extended' regexes (a set which includes Ruby's regexes) aren't always representable as finite state automata (i.e., they can recognise more than the class of regular languages).

As an alternative solution, I wonder if someone knows about a module that
could parse Ruby regexps (accepting exactly the same regexp grammar as
Ruby) and let me read the associated FSM (or whatever graph-like structure
representing the regexp).

Sorry, can't help you there either - I don't know and couldn't dig up anything that exposes Ruby's regex internals (or Oniguruma's) in the way you're asking for, which strikes me as the only way to use *exactly* the same grammar as Ruby does. Like Caleb Clausen suggested, though, there are probably partial solutions out there, depending on how flexible your criteria are and exactly what information you need - are you looking for an aid in building regexes? just visualising them? some other sort of analysis?

matthew smillie.

···

On Aug 2, 2006, at 22:44, Caleb Clausen wrote:

On 8/1/06, Gilles Lacroix <lacroix7@free.fr> wrote:

What I'd like to do is, given any regexp :
1. To test *quickly* if a particular string matches this regexp : that's
why I'd prefer to use native Ruby regexes. They are the natural choice
when programming in Ruby and offer very good performance (because the
parser is compiled into the Ruby interpreter from a well-established C
code base (GNU regexps) that should be reasonably well optimized after
decades of public exposure).
2. To generate "random" strings that match the same regexp : that's why I
was thinking about using a Finite State Machine (or kind of it since, as
you pointed out 'modern' regexp are not alway representable as FSM) :
starting from the initial state and choosing randomly an outbound edge,
I can add a (first) character to the string. Repeating the process from
the node I arrived on, I can add a second character, and so on... (then
I will also have to make sure that I finally reach the final state in a
reasonable time but that's another problem).

Gilles Lacroix.

···

On Thu, 03 Aug 2006 09:22:14 +0900, Matthew Smillie wrote :

(...) Like Caleb Clausen
suggested, though, there are probably partial solutions out there,
depending on how flexible your criteria are and exactly what
information you need - are you looking for an aid in building
regexes? just visualising them? some other sort of analysis?