Regex questions

In Python, regexes let you call a function to compute the replacement instead of just substituting a fixed value (see <http://docs.python.org/lib/node111.html> for more details). That seems quite useful; is there something similar in Ruby?

Also, let's say I want to match anything between "a" and "b" unless it contains the word "foo". I could write two regexes like so:

if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/

Is there a good way to make that kind of logic into one regex? Is there some kind of "intersect" operator or a "not" operator?

Regards,
    Jeff Davis

Jeff Davis wrote:

In python the regexes allow you to call a function instead of just
substitute the values (see <http://docs.python.org/lib/node111.html> for
more details). That seems quite useful, is there something similar in
ruby?

Also, let's say I want match anything between "a" and "b" unless it
contains the word "foo". I could write two regexes like so:

if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/

Is there a good way to make that kind of logic into one regex? Is there
some kind of "intersect" operator or a "not" operator?

You can't write executable code within the regex, but...

When constructing the regex you can include expressions that will be
evaluated, just like in a double-quoted string:

start_tag = 'a'
end_tag = 'b'
r = /#{start_tag}(.*?)#{end_tag}/ #=> /a(.*?)b/
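
A small sketch of the same interpolation idea, assuming the delimiters might contain regex metacharacters. The helper name `between` is made up for illustration; `Regexp.escape` is the standard call for quoting text before it is interpolated into a pattern.

```ruby
# Hypothetical helper: build an "anything between two delimiters" regex.
# Regexp.escape quotes metacharacters so "(" and ")" are matched literally.
def between(start_tag, end_tag)
  /#{Regexp.escape(start_tag)}(.*?)#{Regexp.escape(end_tag)}/
end

r1 = between('a', 'b')
r2 = between('(', ')')

p r1.source             # the pattern text after interpolation
p 'xxaYYbxx'[r1, 1]     # text captured between "a" and "b"
p 'f(x) = 1'[r2, 1]     # text captured between "(" and ")"
```

Any Ruby expression can appear inside `#{...}`, so the pattern is assembled when the literal is evaluated, not while matching.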

Also, if you call sub/gsub you can pass a block. E.g.

s1 = 'xxx a x foo x b xxx'
r = /a(.*?)b/
puts s1.sub(r) { |match| match =~ /foo/ ? '' : match } #=> prints "xxx  xxx" (two spaces remain)

HTH,
Assaph

In python the regexes allow you to call a function instead of just
substitute the values (see <http://docs.python.org/lib/node111.html> for
more details). That seems quite useful, is there something similar in ruby?

Yes, #sub and #gsub can be passed a block to be evaluated for
the replacement value.
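
As a rough illustration of the block form (the `double` method and the sample strings are invented for this sketch), the block plays the role that the replacement function plays in Python:

```ruby
# Hypothetical replacement function: doubles the number it is given.
def double(digits)
  (digits.to_i * 2).to_s
end

prices = 'apples 3, pears 12'

# The block receives each matched substring; its return value replaces the match.
doubled = prices.gsub(/\d+/) { |match| double(match) }

# Capture groups are available inside the block through $1, $2, ... (or $~).
swapped = 'key=value'.sub(/(\w+)=(\w+)/) { "#{$2}=#{$1}" }

puts doubled
puts swapped
```

`sub` replaces only the first match and `gsub` replaces every match; both accept the same kind of block.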

Also, let's say I want match anything between "a" and "b" unless it
contains the word "foo". I could write two regexes like so:

if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/

Careful, consider this string:

  str = 'afoolbalmnbaopbafoob'

which has two sets of a..b that don't contain 'foo', but your
pairing rejects it because a..foo..b can also be found.
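
A minimal sketch of that pitfall, using the string above. The variable names are only for illustration:

```ruby
str = 'afoolbalmnbaopbafoob'

has_a_to_b     = !(str =~ /a(.*)b/).nil?       # some a..b span exists
has_foo_inside = !(str =~ /a(.*foo.*)b/).nil?  # some a..b span contains "foo"

# The combined test from the question: true only when no a..foo..b exists anywhere.
accepted = has_a_to_b && !has_foo_inside

p has_a_to_b
p has_foo_inside
p accepted
```

Because both patterns are greedy and unanchored, the second one finds "foo" somewhere between the first "a" and the last "b", so the whole string is rejected even though "lmn" and "op" sit between clean delimiters.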

Is there a good way to make that kind of logic into one regex? Is there
some kind of "intersect" operator or a "not" operator?

A fairly standard way is to use negative look-ahead and inch
ahead one character at a time like:

  re = %r{a((?:(?!foo).)*?)b}
  str = 'afoolbalmnbaopbafoob'
  str.scan(re).each do |m|
    p m
  end
  __END__
  ["lmn"]
  ["op"]
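
The same pattern can be spelled out in extended mode, which may make the "inch ahead" step easier to see. This is only an annotated restatement of the regex above; the variable name `tempered` is an arbitrary choice for this sketch.

```ruby
tempered = %r{
  a              # opening delimiter
  (              # group 1: the text between the delimiters
    (?:
      (?!foo)    # fail here if "foo" starts at this position
      .          # otherwise consume one character
    )*?          # lazily, so the nearest "b" closes the match
  )
  b              # closing delimiter
}x

str = 'afoolbalmnbaopbafoob'
found = str.scan(tempered).flatten
p found
```

The look-ahead is checked before every single character, which is why a "foo" anywhere between the delimiters blocks the match.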

regards,
andrew

On Thu, 27 Jan 2005 12:21:35 +0900, Jeff Davis <jdavis-list@empires.org> wrote:

--
Andrew L. Johnson http://www.siaris.net/
      But puzzles in programming are what make it challenging and fun
      sometimes... you always end up learning one more way not to do
      something each time. -- Brad Fenwick

Hi --

On Thu, 27 Jan 2005, Assaph Mehr wrote:

Jeff Davis wrote:

In python the regexes allow you to call a function instead of just
substitute the values (see <http://docs.python.org/lib/node111.html> for
more details). That seems quite useful, is there something similar in
ruby?

Also, let's say I want match anything between "a" and "b" unless it
contains the word "foo". I could write two regexes like so:

if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/

Is there a good way to make that kind of logic into one regex? Is there
some kind of "intersect" operator or a "not" operator?

You can't write executable code within the regex, but...

I don't think it will help Jeff's case, but in general you certainly
can include code in a regex if you want to:

   irb(main):001:0> puts "match" if /abc#{gets.chomp}/.match("abcdef")
   def
   match

David

--
David A. Black
dblack@wobblini.net

Thank you, oh thank you! I don't know how many times I've been told
that something like this was possible, but neither I nor the "guru"
who told me could make it work. You have my eternal gratitude!

Jacob Fugal

On Thu, 27 Jan 2005 16:10:59 +0900, Andrew Johnson <ajohnson@cpan.org> wrote:

A fairly standard way is to use negative look-ahead and inch
ahead one character at a time like:

  re = %r{a((?:(?!foo).)*?)b}

Jacob Fugal wrote:

A fairly standard way is to use negative look-ahead and inch
ahead one character at a time like:

re = %r{a((?:(?!foo).)*?)b}
   
Thank you, oh thank you! I don't know how many times I've been told
that something like this was possible, but neither I nor the "guru"
who told me could make it work. You have my eternal gratitude!

Jacob Fugal

Another way is kind of complicated, but it works. Let's say that you want to match a string like:
if str =~ /a(.*)b/ and str !~ /a(.*xyz.*)b/

then you can instead do:
if str =~ /[^a]*a([^bx]|x[^by]|xy[^bz])*(b|xb|xyb)/

[ I changed to 'xyz' from 'foo' to show what's going on in the regex better ]

It's nice to have one regex like that, but you can see that it gets complicated and hard to read, especially as the string you're avoiding (in this case xyz) turns into a complicated regex.
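
A quick side-by-side check, as a sketch only: the sample strings are made up, and the look-ahead pattern is Andrew's with "foo" swapped for "xyz". The two agree on the simple cases, but they appear to disagree on the last sample, where `x[^by]` consumes the second "x" before the engine can see that it starts "xyz". A fully hand-built version would likely need extra alternatives for such repeated prefixes.

```ruby
hand_built = /[^a]*a([^bx]|x[^by]|xy[^bz])*(b|xb|xyb)/  # pattern from the post
lookahead  = /a((?:(?!xyz).)*?)b/                       # look-ahead form, using "xyz"

samples = ['a12b', 'axyb', 'axyzb', 'axxyzb']

# For each sample record: [string, hand_built matched?, lookahead matched?]
results = samples.map do |s|
  [s, !(s =~ hand_built).nil?, !(s =~ lookahead).nil?]
end

results.each do |s, hand, look|
  puts "#{s}: hand_built=#{hand} lookahead=#{look}"
end
```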

Technically, you can build any regular expression with only "()", "|" and "*" (and of course concatenation, which is just two expressions next to each other; no operator is needed). Andrew's is much more readable, however.
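
To make the "only (), | and *" point concrete, here is a small sketch that rewrites a few common shorthands by hand and checks that both spellings accept the same strings. The pairs and inputs are chosen for illustration, and `\A` and `\z` are there only to force whole-string comparison.

```ruby
# Each pair: a pattern using shorthand, then the same language with only (), | and *.
pairs = [
  [/\Ab+\z/,     /\Abb*\z/],       # b+     is b followed by b*
  [/\Ab?c\z/,    /\A(b|)c\z/],     # b?     is (b or nothing)
  [/\A[bcd]\z/,  /\A(b|c|d)\z/],   # [bcd]  is an alternation of its members
  [/\Ab{2,3}\z/, /\Abb(b|)\z/]     # b{2,3} is bb then an optional b
]

inputs = ['', 'b', 'bb', 'bbb', 'bbbb', 'c', 'bc', 'd']

# True when the regex matches the string.
matches = ->(re, s) { !re.match(s).nil? }

agree = pairs.all? do |shorthand, basic|
  inputs.all? { |s| matches.call(shorthand, s) == matches.call(basic, s) }
end

p agree
```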

Regards,
    Jeff Davis

Note: I know I answered my own question. I did a little research about regexes first. Thanks Andrew for the negative-lookahead thing, that's what I was looking for.

On Thu, 27 Jan 2005 16:10:59 +0900, Andrew Johnson <ajohnson@cpan.org> wrote: