Regular expressions

Hi –

And at what point would you be
matching something? Could you show a mock-up example of a whole real
case?

Now you’re asking me to invent syntax on the spur of
the moment. I’m not good at that. :)

Here’s a very rough first effort. (Too simple, really.)

phone = RegexLang.new(<<EOF)
string "("
digits(3,:area_code,Fixnum)
# Above: Grab three digits, store in area_code
# as a Fixnum
string ") "
match(:rest) do   # Store this stuff in 'rest' as
  digits(3)       # a String
  string "-"
  digits(4)
end
EOF

area_code = rest = nil
str = "(800) 555-1234"
phone.match(str)
puts area_code # 800
puts rest # 555-1234
area_code.is_a? Fixnum # true
puts phone.to_r # /\((\d{3})\) (\d{3}-\d{4})/

There are lots of problems here. I just tossed it
off the top of my head.
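For comparison, here is roughly what the mock-up asks for, written against a plain Ruby Regexp rather than the hypothetical RegexLang. This is a sketch that assumes Ruby 1.9 or later, where named groups exist: when a regexp literal with named groups sits on the left of `=~`, Ruby assigns each group to a local variable of the same name, which is close to the `area_code`/`rest` behaviour imagined above. Captures are always Strings, so the Integer conversion is explicit (Fixnum was folded into Integer in Ruby 2.4).

```ruby
str = "(800) 555-1234"

# A regexp LITERAL with named groups on the left of =~ makes Ruby
# assign each group to a local variable of the same name (Ruby 1.9+).
if /\((?<area_code>\d{3})\) (?<rest>\d{3}-\d{4})/ =~ str
  area_code = area_code.to_i # captures are Strings; convert explicitly
  puts area_code             # prints 800
  puts rest                  # prints 555-1234
end
```

The assignment only happens with a literal on the left; `str =~ regexp`, or a regexp stored in a variable, does not create the locals.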

Hmmm… I guess readability is in the eye of the beholder. Give me
/\((\d{3})\) (\d{3})-(\d{4})/ any day :) Or

/
  \((\d{3})\)\    # Area code: '(' + 3 digits + ') '
  (\d{3})-(\d{4}) # Number: 3 digits + '-' + 4 digits
/x

though I find the first one much clearer.
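The two spellings can be checked against each other. In this runnable sketch (the constant names are made up for illustration) the literal space after the area code is written as `\ `, because unescaped whitespace is ignored under `/x`, and the comments sit at the end of each line.

```ruby
# Compact spelling.
COMPACT = /\((\d{3})\) (\d{3})-(\d{4})/

# Extended (/x) spelling: whitespace and # comments are ignored,
# so the literal space after the area code is escaped.
EXTENDED = /
  \((\d{3})\)\    # area code: 3 digits in parens, then one space
  (\d{3})-(\d{4}) # number: 3 digits, a dash, 4 digits
/x

str = "(800) 555-1234"
p COMPACT.match(str).captures  # ["800", "555", "1234"]
p EXTENDED.match(str).captures # ["800", "555", "1234"]
```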

The thing with the local variables is kind of unexpected. Might it be
more idiomatic to have an object with named attributes?

res = phone.match(str)
puts res.area_code # etc.
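One way to get that shape, as a sketch assuming Ruby 1.9+ named groups: `MatchData` already answers `m[:area_code]`, and a small Struct turns that into attribute readers. `PhoneMatch`, `PHONE_RE` and `match_phone` are hypothetical names, not an existing API.

```ruby
# Hypothetical result type with named attribute readers.
PhoneMatch = Struct.new(:area_code, :rest)

PHONE_RE = /\((?<area_code>\d{3})\) (?<rest>\d{3}-\d{4})/

# Returns a PhoneMatch, or nil when str does not look like a phone number.
def match_phone(str)
  m = PHONE_RE.match(str)
  return nil unless m
  PhoneMatch.new(m[:area_code].to_i, m[:rest])
end

res = match_phone("(800) 555-1234")
puts res.area_code # 800
puts res.rest      # 555-1234
```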

David

On Thu, 17 Apr 2003, Hal E. Fulton wrote:


David Alan Black
home: dblack@superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

“Hal E. Fulton” wrote:

But there is a complexity threshold where I would
rather have a more verbose, readable, multi-line
notation.

FWIW, there’s always the /x option:

phoneregex = /
^ # Beginning of string
\d{3,} # starts with three digits
-* # …followed by an optional dash
\d{3,} # …followed by three more digits
-* # …followed by an optional dash
\d{4,} # …followed by four digits
$ # end of string
/x

phone1 = "303-555-1212"
phone2 = "3035551212"
phone3 = "abc123xyz"

if phoneregex.match(phone1)
  puts "Yes! #{phone1}"
else
  puts "No! #{phone1}"
end

if phoneregex.match(phone2)
  puts "Yes! #{phone2}"
else
  puts "No! #{phone2}"
end

if phoneregex.match(phone3)
  puts "Yes! #{phone3}"
else
  puts "No! #{phone3}"
end

I realize that it’s not really what you’re talking about, but sometimes
I use this approach to help myself out mentally when tackling a complex
regex. If nothing else, it’s here in case there are folks who don’t
know about extended mode.

Regards,

Dan


a = [74, 117, 115, 116, 32, 65, 110, 111, 116, 104, 101, 114, 32, 82]
a.push(117,98, 121, 32, 72, 97, 99, 107, 101, 114)
puts a.pack("C*")

Oops.

----- Original Message -----
From: “Chris Pine” nemo@hellotree.com

/a*b/

'a'*(0...) + 'b'

I didn’t realize you can’t do one-sided ranges in Ruby. (I never seem to
use ranges for some reason.) We could just define some constant for the
star:

RE_STAR
KLEENE
ANY
N # My favorite

Too bad about `0...', though…
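Ruby 2.6 later added endless ranges, so `(0..)` now parses. The sketch below is a hypothetical pattern builder (the `Pat` class and its methods are invented for illustration, not a library) that gives roughly the notation proposed above: `*` takes a Range and `+` concatenates. It wraps strings instead of extending String because `String#*` already means "repeat this string".

```ruby
# Hypothetical pattern builder (not an existing library): wraps regexp
# source text so that * takes a Range and + concatenates.
class Pat
  attr_reader :src

  def initialize(src)
    @src = src
  end

  # A literal string, with regexp metacharacters escaped.
  def self.lit(str)
    new(Regexp.escape(str))
  end

  # Concatenation: the other pattern must follow this one.
  def +(other)
    Pat.new(src + other.src)
  end

  # Repetition bounded by a Range. A missing end (endless range,
  # Ruby 2.6+) means "no upper bound", so (0..) acts as the Kleene star.
  def *(range)
    min = range.begin || 0
    max = range.end
    max -= 1 if max && range.exclude_end?
    quantifier = max ? "{#{min},#{max}}" : "{#{min},}"
    Pat.new("(?:#{src})#{quantifier}")
  end

  def to_r
    Regexp.new(src)
  end
end

a_star_b = Pat.lit("a") * (0..) + Pat.lit("b")
p a_star_b.to_r # => /(?:a){0,}b/
```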

One thing at a time,

:)

Chris

Daniel Berger wrote:

“Hal E. Fulton” wrote:

But there is a complexity threshold where I would
rather have a more verbose, readable, multi-line
notation.

FWIW, there’s always the /x option:

phoneregex = /
^ # Beginning of string
\d{3,} # starts with three digits
-* # …followed by an optional dash
\d{3,} # …followed by three more digits
-* # …followed by an optional dash
\d{4,} # …followed by four digits
$ # end of string
/x

Oops. Should be:

phoneregex = /
^ # Beginning of string
\d{3,3} # starts with three digits
-? # …followed by an optional dash
\d{3,3} # …followed by three more digits
-? # …followed by an optional dash
\d{4,4} # …followed by four digits
$ # end of string
/x

Well, anyway, you get the idea.
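A runnable sketch of the corrected pattern, with two tweaks that are assumptions added here rather than part of the post: `-?` allows at most one dash (`-*` would also accept `303---555-1212`), and `\A`/`\z` anchor the whole string, because in Ruby `^` and `$` match at line boundaries, so `^...$` accepts a multi-line string that merely contains a phone number on one line. A loop stands in for the three repeated if/else blocks.

```ruby
phone_re = /
  \A        # beginning of string, not just of a line
  \d{3}     # three digits
  -?        # ...followed by an optional dash
  \d{3}     # ...followed by three more digits
  -?        # ...followed by an optional dash
  \d{4}     # ...followed by four digits
  \z        # end of string, not just of a line
/x

phones = ["303-555-1212", "3035551212", "abc123xyz", "abc\n303-555-1212"]

phones.each do |phone|
  verdict = phone_re.match(phone) ? "Yes!" : "No!"
  puts "#{verdict} #{phone.inspect}"
end
```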

Dan

Not to get too ridiculous replying to myself here, but…

Regexps really aren’t just regular expressions anymore; they are
considerably fancier and more powerful, so it’s a bit of a misnomer. (I
think.)

So how about this:

Grammar.new {|g| 'a' + (g | '') + 'b'}

Matches n a’s followed by n b’s, for every n >= 1; regexps can’t do that. (It’s a
superset of regular expressions.)
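The grammar can be sketched as a hand-written recursive-descent recognizer; `parse_s`, `anbn?` and `ANBN_RE` are hypothetical names, not part of any `Grammar` class. For comparison, Ruby 1.9 and later regexps (Oniguruma, then Onigmo) can express the same recursion with a subexpression call `\g<name>`, one of the extensions beyond formal regular expressions discussed in this thread.

```ruby
# Recognizer for the grammar  S -> 'a' (S | '') 'b'
# Returns the index just past an S that starts at pos, or nil.
def parse_s(str, pos = 0)
  return nil unless str[pos] == "a"
  # S starts with 'a' and '' must be followed by 'b', so the two
  # alternatives never compete and no backtracking is needed.
  inner = parse_s(str, pos + 1) # try the S alternative first
  after = inner || pos + 1      # otherwise take the '' alternative
  str[after] == "b" ? after + 1 : nil
end

# True when the whole string is a^n b^n with n >= 1.
def anbn?(str)
  parse_s(str) == str.length
end

# Ruby 1.9+ regexp doing the same via a recursive call to group "s".
ANBN_RE = /\A(?<s>a\g<s>?b)\z/

%w[ab aabb aaabbb aab abab ba].each do |s|
  puts "#{s}: #{anbn?(s)} #{ANBN_RE.match?(s)}"
end
```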

Waddaya think, Hal?

:)

Chris

Chris Pine wrote:

Oops.

From: “Chris Pine” nemo@hellotree.com

/a*b/

'a'*(0...) + 'b'

I didn’t realize you can’t do one-sided ranges in Ruby. (I never seem to
use ranges for some reason.)
[snip]

Too bad about `0...', though…

One thing at a time,

:)

Chris

I’m not sure how regexes handle these things internally (and, yes, it
would be nice to have one-sided ranges natively), but for things like
case equality I’ve found it pretty easy to make quick-and-dirty classes
like GreaterThan [e.g. when GreaterThan.new(6)].

Of course, I abandoned this approach for something less ugly… ;)
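A minimal version of that kind of quick-and-dirty class, as a sketch: `case` calls `===` on each `when` object with the value being tested, so any object that defines `===` can act as a pattern. Only the name `GreaterThan` and the `when GreaterThan.new(6)` usage come from the post; the class body and `size_label` are assumptions.

```ruby
# Quick-and-dirty matcher object: case/when evaluates
# GreaterThan.new(6) === n, so defining === is all it takes.
class GreaterThan
  def initialize(limit)
    @limit = limit
  end

  def ===(value)
    value > @limit
  end
end

def size_label(n)
  case n
  when GreaterThan.new(6) then "big"
  else "small"
  end
end

puts size_label(10) # big
puts size_label(3)  # small
```

In Ruby 1.9 and later a lambda does the same job without a class, because `Proc#===` calls the proc: `when ->(x) { x > 6 }`.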

Julian

----- Original Message -----

Regexps really aren’t just regular expressions anymore; they are
considerably fancier and more powerful, so it’s a bit of a
misnomer. (I think.)

Right. Backreferencing (IIRC) makes RE’s not entirely “regular”.
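A small illustration of why backreferences go beyond regular languages: `\1` must repeat exactly the text group 1 captured, so the pattern below accepts a^n b a^n, the same number of a's on each side of the b. No regular expression in the formal sense describes that language, since it requires counting. The constant name is arbitrary.

```ruby
# \1 must match exactly what group 1 captured, so the number of a's
# after the b has to equal the number before it.
ANBAN_RE = /\A(a*)b\1\z/

%w[b aba aabaa aab abaa].each do |s|
  puts "#{s}: #{ANBAN_RE.match?(s)}"
end
```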

(It’s a
superset of regular expressions.)

What you described is a context-free (or was that context-sensitive?
been too many years…) grammar.

I think the /x option is about the best way to make a RE readable,
given the constraints we have today.

By the by, do I remember right that you can’t write a RE to match all
sets of matching parens? IIRC, the grammar was:

S → S(S) | e
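That is right for regular expressions in the strict sense: balanced parentheses need unbounded nesting, which a finite automaton cannot track. A recursive-descent sketch of the grammar follows; because `S → S(S) | e` is left-recursive, it is rewritten as the equivalent `S → ( "(" S ")" )*`, zero or more parenthesized groups. The helper names are hypothetical.

```ruby
# Recognizer for  S -> S ( S ) | e   (balanced parentheses).
# That grammar is left-recursive, so this uses the equivalent form
#   S -> ( "(" S ")" )*    i.e. zero or more parenthesized groups.
# Returns the index just past the S that starts at pos, or nil.
def parse_parens(str, pos = 0)
  while str[pos] == "("
    pos = parse_parens(str, pos + 1)         # the inner S
    return nil unless pos && str[pos] == ")"
    pos += 1                                 # consume the ")"
  end
  pos
end

def balanced?(str)
  parse_parens(str) == str.length
end

["", "()", "(())()", "(()", ")("].each do |s|
  puts "#{s.inspect}: #{balanced?(s)}"
end
```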
