Regex questions

Jeff_Davis · 27 January 2005 03:21

In Python the regexes allow you to call a function instead of just substituting the values (see <http://docs.python.org/lib/node111.html> for more details). That seems quite useful; is there something similar in Ruby?

Also, let's say I want to match anything between "a" and "b" unless it contains the word "foo". I could write two regexes like so:

if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/

Is there a good way to make that kind of logic into one regex? Is there some kind of "intersect" operator or a "not" operator?

Regards,
Jeff Davis

Assaph_Mehr1 · 27 January 2005 05:10

Jeff Davis wrote:

In Python the regexes allow you to call a function instead of just
substituting the values (see <http://docs.python.org/lib/node111.html> for
more details). That seems quite useful; is there something similar in Ruby?

Also, let's say I want to match anything between "a" and "b" unless it
contains the word "foo". I could write two regexes like so:

if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/

Is there a good way to make that kind of logic into one regex? Is there
some kind of "intersect" operator or a "not" operator?

You can't write executable code within the regex, but...

When constructing the regex you can include expressions that will be
evaluated, just like in a double-quoted string:

start_tag = 'a'
end_tag = 'b'
r = /#{start_tag}(.*?)#{end_tag}/ #=> /a(.*?)b/

Also, if you call sub/gsub you can pass a block. E.g.

s1 = 'xxx a x foo x b xxx'
r = /a(.*?)b/
puts s1.sub(r) { |match| match =~ /foo/ ? '' : match } #=> 'xxx  xxx'

HTH,
Assaph

Andrew_Johnson · 27 January 2005 07:10

In Python the regexes allow you to call a function instead of just
substituting the values (see <http://docs.python.org/lib/node111.html> for
more details). That seems quite useful; is there something similar in Ruby?

Yes, #sub and #gsub can be passed a block to be evaluated for
the replacement value.
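
As a minimal sketch of the block form (the sample strings and variable names are only illustrative), the block receives each matched substring and its return value becomes the replacement:

```ruby
# Replacement computed by a block: double every integer in the string.
text = 'widgets: 3, gadgets: 12'
doubled = text.gsub(/\d+/) { |digits| (digits.to_i * 2).to_s }
puts doubled  # prints "widgets: 6, gadgets: 24"

# Inside the block, $1, $2, ... refer to the capture groups of the current match.
swapped = 'key=value'.sub(/(\w+)=(\w+)/) { "#{$2}=#{$1}" }
puts swapped  # prints "value=key"
```

This is roughly the counterpart of passing a function as the replacement in Python.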

Also, let's say I want to match anything between "a" and "b" unless it
contains the word "foo". I could write two regexes like so:

if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/

Careful, consider this string:

str = 'afoolbalmnbaopbafoob'

which has two sets of a..b that don't contain 'foo', but your
pairing rejects it because a..foo..b can also be found.
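
A small sketch of that pitfall, using the same string (the variable names are only illustrative, and `match?` assumes Ruby 2.4 or later):

```ruby
str = 'afoolbalmnbaopbafoob'

has_pair    = str.match?(/a(.*)b/)       # some a..b exists
has_foo_run = str.match?(/a(.*foo.*)b/)  # some a..foo..b exists somewhere

# The two-regex test from the question, written out step by step.
two_regex_ok = has_pair && !has_foo_run

puts two_regex_ok  # prints "false", even though "almnb" and "aopb" are foo-free
```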

Is there a good way to make that kind of logic into one regex? Is there
some kind of "intersect" operator or a "not" operator?

A fairly standard way is to use negative look-ahead and inch
ahead one character at a time like:

  re = %r{a((?:(?!foo).)*?)b}
  str = 'afoolbalmnbaopbafoob'
  str.scan(re).each do |m|
    p m
  end
  __END__
  ["lmn"]
  ["op"]
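
The same pattern can be generalised to other delimiters and forbidden words. The following is only a sketch: `between_without` is a hypothetical helper name, not something from the standard library, and it assumes all three arguments are literal strings.

```ruby
# Hypothetical helper: builds /open((?:(?!forbidden).)*?)close/ from literal strings.
def between_without(open, close, forbidden)
  o, c, f = [open, close, forbidden].map { |s| Regexp.escape(s) }
  # (?!f).  consumes one character only if the forbidden word does not start here
  # (...)*? repeats that step lazily, so the match stops at the first close
  /#{o}((?:(?!#{f}).)*?)#{c}/
end

re = between_without('a', 'b', 'foo')
found = 'afoolbalmnbaopbafoob'.scan(re).flatten
p found  # prints ["lmn", "op"]
```

`Regexp.escape` is there so that delimiters such as `[` or `.` are treated literally rather than as regex syntax.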

regards,
andrew

--
Andrew L. Johnson http://www.siaris.net/
      But puzzles in programming are what make it challenging and fun
      sometimes... you always end up learning one more way not to do
      something each time. -- Brad Fenwick

David_A_Black3 · 27 January 2005 11:27

Hi --

On Thu, 27 Jan 2005, Assaph Mehr wrote:

Jeff Davis wrote:

In Python the regexes allow you to call a function instead of just
substituting the values (see <http://docs.python.org/lib/node111.html> for
more details). That seems quite useful; is there something similar in Ruby?

Also, let's say I want to match anything between "a" and "b" unless it
contains the word "foo". I could write two regexes like so:

if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/

Is there a good way to make that kind of logic into one regex? Is there
some kind of "intersect" operator or a "not" operator?

You can't write executable code within the regex, but...

I don't think it will help Jeff's case, but in general you certainly
can include code in a regex if you want to:

   irb(main):001:0> puts "match" if /abc#{gets.chomp}/.match("abcdef")
   def
   match
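
A non-interactive sketch of the same idea: the interpolated expression is ordinary Ruby that runs when the regex literal is evaluated. `suffix_for` is a hypothetical helper standing in for any runtime value, and `match?` assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper standing in for any value computed at run time.
def suffix_for(lang)
  lang == 'ruby' ? '.rb' : '.py'
end

# The #{...} expression is evaluated when the literal is evaluated.
# Regexp.escape keeps the "." from acting as a wildcard.
pattern = /\Aabc#{Regexp.escape(suffix_for('ruby'))}\z/

puts pattern.source            # prints \Aabc\.rb\z
puts pattern.match?('abc.rb')  # prints "true"
puts pattern.match?('abcXrb')  # prints "false"
```

Note that the code runs while the pattern is being built, not while it is being matched.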

David

--
David A. Black
dblack@wobblini.net

Jacob_Fugal · 27 January 2005 16:47

Thank you, oh thank you! I don't know how many times I've been told
that something like this was possible, but neither I nor the "guru"
who told me could make it work. You have my eternal gratitude!

Jacob Fugal

On Thu, 27 Jan 2005 16:10:59 +0900, Andrew Johnson <ajohnson@cpan.org> wrote:

A fairly standard way is to use negative look-ahead and inch
ahead one character at a time like:

re = %r{a((?:(?!foo).)*?)b}

Jeff_Davis · 27 January 2005 18:57

Jacob Fugal wrote:

A fairly standard way is to use negative look-ahead and inch
ahead one character at a time like:

re = %r{a((?:(?!foo).)*?)b}

Thank you, oh thank you! I don't know how many times I've been told
that something like this was possible, but neither I nor the "guru"
who told me could make it work. You have my eternal gratitude!

Jacob Fugal

Another way is kind of complicated, but it works. Let's say that you want the effect of:
if str =~ /a(.*)b/ and str !~ /a(.*xyz.*)b/

then you can instead do:
if str =~ /[^a]*a([^bx]|x(x|yx)*([^bxy]|y[^bxz]))*(b|x(x|yx)*b|x(x|yx)*yb)/

[ I changed to 'xyz' from 'foo' to show what's going on in the regex better ]

It's nice to have one regex like that, but you can see that it gets complicated and hard to read, especially as the string you're avoiding (in this case xyz) turns into a complicated regex.

Technically, you can build any regular expression with only "()", "|" and "*" (and of course concatenation, which is just two expressions next to each other; no operator is needed). Andrew's is much more readable, however.
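
One way to gain some confidence in a hand-built pattern is to compare it with the look-ahead version on inputs where partial matches of the forbidden word overlap. This is only a sketch: the `hand_built` pattern below is an illustrative construction derived from a three-state automaton (nothing seen, "x" seen, "xy" seen), and the sample strings are assumptions chosen to exercise repeated "x" and "xy" prefixes.

```ruby
# Look-ahead version from earlier in the thread, with 'xyz' as the forbidden word.
lookahead = /a(?:(?!xyz).)*?b/

# Hand-built sketch: each loop alternative returns the automaton to its
# "nothing seen" state; the optional tail allows stopping after "x" or "xy".
hand_built = /a(?:[^bx]|x(?:x|yx)*(?:[^bxy]|y[^bxz]))*(?:x(?:x|yx)*y?)?b/

samples = %w[ab axb axyb axyzb axxyzb axyxyzb axyyzb azzzb xyzab]
samples.each do |s|
  puts format('%-8s lookahead=%-5s hand_built=%s',
              s, lookahead.match?(s), hand_built.match?(s))
end

disagreements = samples.reject { |s| lookahead.match?(s) == hand_built.match?(s) }
p disagreements  # prints []
```

Inputs such as "axxyzb" are the ones worth checking, since a pattern that forgets the "x" it has just seen can let "xyz" slip through.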

Regards,
Jeff Davis

Note: I know I answered my own question. I did a little research about regexes first. Thanks Andrew for the negative-lookahead thing, that's what I was looking for.
