Matches -> regexp?

Hello,

this is not exactly a ruby problem, i hope people will not consider
this spamming.

does anyone know of any reference or an algorithm to build a regular
expression (a finite automaton, in general) that would match a given
set of strings? i of course mean a regexp that would match just the
given strings, and not any other. so solutions like /^.*$/ are not
acceptable :wink:

thanks
konstantin

ako... schrieb:

Hello,

this is not exactly a ruby problem, i hope people will not consider
this spamming.

does anyone know of any reference or an algorithm to build a regular
expression (a finite automaton, in general) that would match a given
set of strings? i of course mean a regexp that would match just the
given strings, and not any other. so solutions like /^.*$/ are not
acceptable :wink:

thanks
konstantin

nice challenge, reminds me a bit of the 4th rubyquiz :>

ako... schrieb:
>does anyone know of any reference or an algorithm to build a regular
>expression (a finite automaton, in general) that would match a given
>set of strings? i of course mean a regexp that would match just the
>given strings, and not any other. so solutions like /^.*$/ are not
>acceptable :wink:

This is essentially an information compression problem. There's
always a trivial solution:

s = %w{find only these strings}
r = Regexp.new(s.map {|w| '\A' + w + '\Z'}.join('|'))

The _real_ problem is finding the shortest possible regexp. :wink:

well, right, that is what i mean.

i suspect that from your trivial solution we can arrive at the final
solution by just performing a DFA minimization. the same thing that lex
and yacc do when they generate code. so i guess i have answered my own
question. thanks for helping me. your trivial solution just made my
brain work.

konstantin

ako... wrote:

i suspect that from your trivial solution we can arrive at the final
solution by just performing a DFA minimization. the same thing that
lex and yacc do when they generate code. so i guess i have answered
my own question. thanks for helping me. your trivial solution just
made my brain work.

Another option is to build up a tree and create the RX from that. I once
wrote that:

14:12:23 [c]: ( echo foo; echo bar; echo band ) | create-rx.rb
(?:ba(?:nd|r)|foo)

Kind regards

    robert