In Python, regex substitution lets you call a function instead of just substituting a fixed value (see <http://docs.python.org/lib/node111.html> for more details). That seems quite useful; is there something similar in Ruby?
Also, let's say I want to match anything between "a" and "b" unless it contains the word "foo". I could write two regexes like so:
if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/
Is there a good way to make that kind of logic into one regex? Is there some kind of "intersect" operator or a "not" operator?
Regards,
Jeff Davis
Jeff Davis wrote:
In Python, regex substitution lets you call a function instead of just
substituting a fixed value (see <http://docs.python.org/lib/node111.html>
for more details). That seems quite useful; is there something similar in
Ruby?
Also, let's say I want to match anything between "a" and "b" unless it
contains the word "foo". I could write two regexes like so:
if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/
Is there a good way to make that kind of logic into one regex? Is there
some kind of "intersect" operator or a "not" operator?
You can't write executable code within the regex, but...
When constructing the regex you can include expressions that will be
evaluated, just like in a double-quoted string:
start_tag = 'a'
end_tag = 'b'
r = /#{start_tag}(.*?)#{end_tag}/ #=> /a(.*?)b/
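One caveat with interpolation: the interpolated strings become part of the pattern source, so a tag containing metacharacters such as ( or . changes what the regex means. The minimal sketch below assumes the tags should be matched literally and wraps them in Regexp.escape; the tag values and sample text are made up for illustration.

```ruby
# Hypothetical delimiters that happen to contain regex metacharacters.
start_tag = '(a)'
end_tag   = 'b.'
text      = 'x (a)middleb. y'

# Unescaped: "(a)" turns into a capture group and "." matches any character.
raw  = /#{start_tag}(.*?)#{end_tag}/
# Escaped: both tags are matched literally.
safe = /#{Regexp.escape(start_tag)}(.*?)#{Regexp.escape(end_tag)}/

p text[raw, 2]    # ")middle"  (the match starts at the bare "a")
p text[safe, 1]   # "middle"
```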
Also, if you call sub/gsub you can pass a block. E.g.
s1 = 'xxx a x foo x b xxx'
r = /a(.*?)b/
puts s1.sub(r) { |match| match =~ /foo/ ? '' : match } #=> 'xxx  xxx' (two spaces)
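Note that sub only looks at the first a..b span, while gsub runs the same block for every span; inside the block Ruby also sets $~ and $1 for the current match. A small sketch with a made-up sample string:

```ruby
s = 'xxx a ok b yyy a x foo x b zzz'
r = /a(.*?)b/

# sub: only the first a..b span goes through the block (and it has no foo)
first_only = s.sub(r) { |match| match =~ /foo/ ? '' : match }
# gsub: every a..b span goes through the block
every = s.gsub(r) { |match| match =~ /foo/ ? '' : match }
# $1 holds the capture of the match currently being replaced
tagged = s.gsub(r) { $1.include?('foo') ? '' : "[#{$1.strip}]" }

p first_only   # "xxx a ok b yyy a x foo x b zzz"
p every        # "xxx a ok b yyy  zzz"
p tagged       # "xxx [ok] yyy  zzz"
```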
HTH,
Assaph
In Python, regex substitution lets you call a function instead of just
substituting a fixed value (see <http://docs.python.org/lib/node111.html> for
more details). That seems quite useful; is there something similar in Ruby?
Yes, #sub and #gsub can be passed a block to be evaluated for
the replacement value.
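For a closer analogue of Python's re.sub with a function argument (which receives a match object), a Method can be passed with & in place of a literal block. In this sketch the method name double is hypothetical, and Ruby hands it the matched string rather than a MatchData; inside a block the groups are available through $~ (Regexp.last_match).

```ruby
# A named method standing in for the function Python's re.sub would take.
# Ruby passes it the matched string, not a MatchData object.
def double(digits)
  (digits.to_i * 2).to_s
end

text = 'width=10 height=25'

doubled = text.gsub(/\d+/, &method(:double))
# With a block, $~ gives access to the capture groups of the current match.
swapped = text.gsub(/(\w+)=(\d+)/) { "#{$~[2]}=#{$~[1]}" }

p doubled   # "width=20 height=50"
p swapped   # "10=width 25=height"
```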
Also, let's say I want to match anything between "a" and "b" unless it
contains the word "foo". I could write two regexes like so:
if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/
Careful, consider this string:
str = 'afoolbalmnbaopbafoob'
which contains two a..b spans that don't contain 'foo', but your
pair of tests rejects it because a..foo..b can also be found.
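To see the problem concretely, the sketch below runs the two-regex test on that string and then lists each a..b span, taking the nearest b after every a. It assumes "contains foo" refers to the text between an a and that next b.

```ruby
str = 'afoolbalmnbaopbafoob'

# The two-regex test: "some a..b exists" and "no a..foo..b exists".
two_regex_ok = !!(str =~ /a(.*)b/) && str !~ /a(.*foo.*)b/
p two_regex_ok   # false, because a..foo..b also occurs somewhere

# Each a..b span, taking the nearest b after every a:
spans = str.scan(/a([^b]*)b/).flatten
clean = spans.reject { |s| s.include?('foo') }
p spans          # ["fool", "lmn", "op", "foo"]
p clean          # ["lmn", "op"]
```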
Is there a good way to make that kind of logic into one regex? Is there
some kind of "intersect" operator or a "not" operator?
A fairly standard way is to use negative look-ahead and inch
ahead one character at a time like:
re = %r{a((?:(?!foo).)*?)b}
str = 'afoolbalmnbaopbafoob'
str.scan(re).each do |m|
p m
end
__END__
["lmn"]
["op"]
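A possible generalization of this pattern, as a sketch: the helper name between_without is hypothetical, and it assumes the delimiters and the forbidden word should be matched literally, so each goes through Regexp.escape. The m option is added because Ruby's . does not match a newline by default, so without it a span crossing a line break would be missed.

```ruby
# Hypothetical helper: build the "inch ahead with negative look-ahead"
# pattern for arbitrary literal delimiters and a forbidden word.
def between_without(open, close, forbidden)
  o = Regexp.escape(open)
  c = Regexp.escape(close)
  f = Regexp.escape(forbidden)
  # /m lets . match "\n" as well
  /#{o}((?:(?!#{f}).)*?)#{c}/m
end

re = between_without('a', 'b', 'foo')
p 'afoolbalmnbaopbafoob'.scan(re).flatten   # ["lmn", "op"]
p "a1\n2b".scan(re).flatten                  # ["1\n2"]
```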
regards,
andrew
On Thu, 27 Jan 2005 12:21:35 +0900, Jeff Davis <jdavis-list@empires.org> wrote:
--
Andrew L. Johnson http://www.siaris.net/
But puzzles in programming are what make it challenging and fun
sometimes... you always end up learning one more way not to do
something each time. -- Brad Fenwick
Thank you, oh thank you! I don't know how many times I've been told
that something like this was possible, but neither I nor the "guru"
who told me could make it work. You have my eternal gratitude!
Jacob Fugal
On Thu, 27 Jan 2005 16:10:59 +0900, Andrew Johnson <ajohnson@cpan.org> wrote:
A fairly standard way is to use negative look-ahead and inch
ahead one character at a time like:
re = %r{a((?:(?!foo).)*?)b}
Jacob Fugal wrote:
A fairly standard way is to use negative look-ahead and inch
ahead one character at a time like:
re = %r{a((?:(?!foo).)*?)b}
Thank you, oh thank you! I don't know how many times I've been told
that something like this was possible, but neither I nor the "guru"
who told me could make it work. You have my eternal gratitude!
Jacob Fugal
Another way is kind of complicated, but it works. Let's say that you want to match a string like:
if str =~ /a(.*)b/ and str !~ /a(.*xyz.*)b/
then you can instead do:
if str =~ /a([^bx]|x(x|yx)*([^bxy]|y[^bxz]))*(b|x(x|yx)*(b|yb))/
[ I changed 'foo' to 'xyz' to make it clearer what's going on in the regex ]
It's nice to have one regex like that, but you can see that it gets complicated and hard to read, especially when the string you're avoiding (in this case xyz) is itself a complicated regex.
Technically, you can build any regular expression with only "()", "|" and "*" (plus the literal characters themselves, and concatenation, which is just two expressions next to each other, so no operator is needed). Andrew's version is much more readable, however.
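One way to sanity-check a hand-built alternation like this is to compare it against the look-ahead version on inputs where the forbidden word overlaps itself, such as 'axxyzb'. The sketch below, with made-up test strings, uses a pure version derived by tracking how much of 'xyz' has just been seen (nothing, 'x', or 'xy'). It also includes a shorter alternation, /a([^bx]|x[^by]|xy[^bz])*(b|xb|xyb)/, which forgets a pending x after consuming 'xx' or 'xyx' and so accepts strings that do contain 'xyz'. String#match? needs Ruby 2.4 or later.

```ruby
# Pure alternation/star version: "a", then text with no 'b' and no "xyz", then "b".
pure  = /a([^bx]|x(x|yx)*([^bxy]|y[^bxz]))*(b|x(x|yx)*(b|yb))/
# Andrew's look-ahead pattern with "xyz" as the forbidden word.
look  = /a((?:(?!xyz).)*?)b/
# A shorter alternation that looks plausible but loses track of a repeated "x".
naive = /a([^bx]|x[^by]|xy[^bz])*(b|xb|xyb)/

results = {}
%w[ab axb axyb axxb axyab axyzb axxyzb axyxyzb axyzbaxxb].each do |s|
  results[s] = [s.match?(pure), s.match?(look), s.match?(naive)]
  puts "#{s.ljust(10)} pure=#{results[s][0]} look=#{results[s][1]} naive=#{results[s][2]}"
end
# "axxyzb" and "axyxyzb" contain "xyz", yet only the naive version matches them.
```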
Regards,
Jeff Davis
Note: I know I answered my own question; I did a little research about regexes first. Thanks, Andrew, for the negative look-ahead trick; that's what I was looking for.