Regular Expression question

All,

Anyone know of anything that exists to create regular expressions from
sample data? Information, code, references, etc.

If nothing exists, I would like to create something in Ruby that does just
this. I don’t presume it will be easy, but it could be very helpful
if it worked.

Thanks,

John.

All,

Anyone know of anything that exists to create regular expressions from
sample data? Information, code, references, etc.

Do you mean, for example, given
data = ["boat", "butt", "brat", "bait"]

some method produces:

/b\w\wt/

or

/^b\w+$/

James
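As a rough illustration of the first pattern James suggests, here is a minimal Ruby sketch. It assumes all sample words have the same length and contain only word characters; `positional_regex` is a hypothetical helper name, not something from an existing library. It keeps a character where every sample agrees at that position and falls back to `\w` elsewhere.

```ruby
# Hypothetical helper: build a pattern from equal-length sample words.
# Positions where all samples agree keep the literal character;
# all other positions become \w (assumes samples are word characters only).
def positional_regex(words)
  lengths = words.map(&:length).uniq
  raise ArgumentError, "words must share one length" unless lengths.size == 1

  columns = words.map(&:chars).transpose
  body = columns.map do |column|
    column.uniq.size == 1 ? Regexp.escape(column.first) : '\w'
  end.join
  Regexp.new("\\A#{body}\\z")
end

data = ["boat", "butt", "brat", "bait"]
pattern = positional_regex(data)
puts pattern.source   # prints \Ab\w\wt\z
```

The result is anchored with `\A` and `\z` so that it matches whole strings only; whether that is desirable depends on how the pattern will be used.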

···

All,

Anyone know of anything that exists to create regular expressions
from sample data? Information, code, references, etc.

There is a Perl module that generates regexes given a word list.

I’ve seen Java code that generates regexes given a set of numbers.

I doubt that it’s easy to do a good job of dealing with arbitrary
unstructured text.

If nothing exists, I would like to create something in Ruby that
does just this. I don’t presume it will be easy, but it
could be very helpful if it worked.

Since for a given text there are likely to be a large number of
regexes that match, how would you know which one was best for your
uses?

If there were some defining structure: field delimiters, fixed-width
fields, etc., it would be much easier.
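One way to make the "which regex is best" question concrete is to supply strings that must not match in addition to the samples. The sketch below is only an illustration: the negative examples and the candidate list are made up for this purpose, and `consistent?` is a hypothetical helper.

```ruby
positives = ["boat", "butt", "brat", "bait"]
negatives = ["bat", "boats"]   # assumed counter-examples, not from the thread

candidates = [
  /.*/,
  /\Ab\w+\z/,
  /\Ab\w\wt\z/,
  /\A(?:boat|butt|brat|bait)\z/
]

# A candidate is consistent if it accepts every positive sample
# and rejects every negative one.
def consistent?(regex, positives, negatives)
  positives.all? { |s| regex.match?(s) } &&
    negatives.none? { |s| regex.match?(s) }
end

survivors = candidates.select { |re| consistent?(re, positives, negatives) }
survivors.each { |re| puts re.source }
```

Here two candidates survive, so negative examples narrow the field but do not settle it. Some further preference (shortest pattern, most specific, most general) still has to come from the person using the tool.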

···

On Sunday 28 July 2002 02:54 pm, John wrote:


Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE

Ok, information and code :)
http://www.toolbox-mag.de/data/makeregex.html

if you want a ‘wizard’
http://txt2regex.sourceforge.net/

's
Guaracy

···

----- Original Message -----
From: “John” nojgoalbyspam@hotmail.com

All,

Anyone know of anything that exists to create regular expressions from
sample data? Information, code, references, etc.

===================
HP: www.guaracy.cjb.net

Yeah. Here is my program that matches a list of words.

def find_regex(str)
  /.*/
end

:)

I bet it would pass a few test cases. ;)

Jim

···

On Mon, Jul 29, 2002 at 08:39:56AM +0900, Ned Konz wrote:

Since for a given text there are likely to be a large number of
regexes that match, how would you know which one was best for your
uses?


Jim Freeze
If only I had something clever to say for my comment…
~

There’s some code in emacs elisp to do this too. (The info for the function
listed below)

regexp-opt is a compiled Lisp function in `regexp-opt'.
(regexp-opt STRINGS &optional PAREN)

Return a regexp to match a string in STRINGS.
Each string should be unique in STRINGS and should not contain any regexps,
quoted or not. If optional PAREN is non-nil, ensure that the returned regexp
is enclosed by at least one regexp grouping construct.
The returned regexp is typically more efficient than the equivalent regexp:

(let ((open (if PAREN "\\(" "")) (close (if PAREN "\\)" "")))
  (concat open (mapconcat 'regexp-quote STRINGS "\\|") close))

If PAREN is `words', then the resulting regexp is additionally surrounded
by \< and \>.
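For comparison, the unoptimized form described above (quote each string, join with alternation) has a direct counterpart in current Ruby: `Regexp.union` escapes each string and joins them with `|`. A minimal sketch, with the anchoring added here as an assumption about how the pattern would be used:

```ruby
words = ["boat", "butt", "brat", "bait"]

# Regexp.union escapes each string and joins them with "|".
naive = Regexp.union(words)
puts naive.source      # prints boat|butt|brat|bait

# Anchor the alternation so only whole strings match.
anchored = /\A(?:#{naive.source})\z/
puts anchored.match?("brat")    # prints true
puts anchored.match?("brats")   # prints false
```

This does none of the prefix merging that regexp-opt performs; it only gives the plain alternation.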

···

========

Here’s some other info from another function that might provide a hint for
someone trying to hack one out on his own:

regexp-opt-group is a compiled Lisp function in `regexp-opt'.
(regexp-opt-group STRINGS &optional PAREN LAX)

Return a regexp to match a string in STRINGS.
If PAREN non-nil, output regexp parentheses around returned regexp.
If LAX non-nil, don’t output parentheses if it doesn’t require them.
Merges keywords to avoid backtracking in Emacs’ regexp matcher.

The basic idea is to find the shortest common prefix or suffix, remove it
and recurse. If there is no prefix, we divide the list into two so that
(at least) one half will have at least a one-character common prefix.

Also we delay the addition of grouping parenthesis as long as possible
until we’re sure we need them, and try to remove one-character sequences
so we can use character sets rather than grouping parenthesis.
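The prefix-and-recurse idea can be sketched in Ruby as follows. This is a simplified illustration, not a port of the Emacs code: it only factors common prefixes (no suffixes, no character sets), it splits by first character rather than halving the list, and it always emits a group where Emacs would delay parentheses. The names `common_prefix` and `factor` are hypothetical.

```ruby
# Longest prefix shared by every string in the list. The prefix common to
# the lexicographically smallest and largest strings is common to all.
def common_prefix(words)
  first, last = words.minmax
  i = 0
  i += 1 while i < first.length && first[i] == last[i]
  first[0, i]
end

# Returns regex source (a String) that matches exactly the given words.
def factor(words)
  words = words.uniq.sort
  has_empty = words.include?("")
  words -= [""]
  return "" if words.empty?

  prefix = common_prefix(words)
  body =
    if words.size == 1
      Regexp.escape(words.first)
    elsif !prefix.empty?
      # Strip the shared prefix and recurse on what is left.
      tails = words.map { |w| w[prefix.length..-1] }
      "#{Regexp.escape(prefix)}(?:#{factor(tails)})"
    else
      # No shared prefix: bucket by first character so each bucket has one.
      words.group_by { |w| w[0] }.values.map { |bucket| factor(bucket) }.join("|")
    end

  # An empty string among the inputs makes the rest optional.
  has_empty ? "(?:#{body})?" : body
end

data = ["boat", "butt", "brat", "bait"]
puts factor(data)   # prints b(?:ait|oat|rat|utt)
```

To use the result, wrap it in anchors, for example `Regexp.new("\\A(?:#{factor(data)})\\z")`. The output matches the same set of strings as the plain alternation but shares the leading `b`, which is the kind of merging the docstring describes.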

“Guaracy Monteiro” guaracybm@ig.com.br wrote in message
news:005101c236b3$91b99620$7856cbc8@guara…

Wow Thanks!
John.

…and here is my program that matches James’ list of words:

def find_regex(str)
  /boat|butt|brat|bait/
end

Seriously, you need to decide what you want to achieve.

···


Clifford Heath