Grep and regular expressions in ruby

I’m quite taken with ruby, but recently I ran into trouble using grep. I
have two questions: first, is there a way to call grep with multiple
regexps? For example, if you had
ar = ["cat", "dog", "smallcat"]

and you only wanted strings matching both “small” and “cat”, in perl I
think you could do something like
ar2 = grep(( /cat/ and /small/ ), ar );

but in ruby it seems like you have to call grep twice
ar2 = ar.grep(/cat/).grep(/small/)

is there a more elegant solution?

The second question concerns regexps: how do you indicate that you don’t
want to match the specified pattern? For example, if I only wanted strings
containing “cat” but not containing “small”. Once again, I think the perl
would look like this:
ar2 = grep(( /cat/ and !/small/ ), ar );

but ‘!’ doesn’t work for me in ruby. So far I haven’t found anything about
this in Pickaxe or on the web.

thanks,
Krishna

ar.select {|e| e =~ /cat|small/ }

ar.select {|e| e =~ /cat/ or e != /small/ }

rubyway?

Rodrigo Bermejo | rodrigo.bermejo@ps.ge.com
IT-Specialist | 8*879-0644

Krishna Dole wrote:

I’m quite taken with ruby, but recently I ran into trouble using grep. I
have two questions: first, is there a way to call grep with multiple
regexps? For example, if you had
ar = ["cat", "dog", "smallcat"]

and you only wanted strings matching both “small” and “cat”, in perl I
think you could do something like
ar2 = grep(( /cat/ and /small/ ), ar );

but in ruby it seems like you have to call grep twice
ar2 = ar.grep(/cat/).grep(/small/)

is there a more elegant solution?

Have you had a look at lib/eregex? It doesn’t quite do what you need,
but a couple of quick additions make it work:

require 'eregex'
class RegOr
  def ===(other)
    self =~ other
  end
end
class RegAnd
  def ===(other)
    self =~ other
  end
end

ar = [ "cat", "dog", "catdog", "dogcat" ].grep(/cat/&/dog/)

Cheers

Dave
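For readers whose Ruby installation does not ship eregex, the same idea can be sketched without it. Array#grep only needs an object that responds to ===, so a small combinator class is enough. The class name AllOf below is hypothetical and not part of any library; this is a minimal sketch, not a replacement for eregex.

```ruby
# AllOf is a hypothetical helper, not a standard-library class.
# grep calls pattern === element, so any object with === can act as a pattern.
class AllOf
  def initialize(*patterns)
    @patterns = patterns
  end

  # True only when every wrapped pattern matches the candidate.
  def ===(candidate)
    @patterns.all? { |pattern| pattern === candidate }
  end
end

words = ["cat", "dog", "catdog", "dogcat"]
both  = words.grep(AllOf.new(/cat/, /dog/))
p both  # ["catdog", "dogcat"]
```

Because the class relies only on ===, it should also accept other pattern-like objects (ranges, classes, procs) alongside regexps.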

I’m quite taken with ruby, but recently I ran into trouble using grep. I
have two questions: first, is there a way to call grep with multiple
regexps? For example, if you had
ar = ["cat", "dog", "smallcat"]

and you only wanted strings matching both “small” and “cat”, in perl I
think you could do something like
ar2 = grep(( /cat/ and /small/ ), ar );

To make this run I had to add some $ signs in front of your variable names
:-)

The reason this works in Perl is because you are doing two separate regexp
comparisons. The syntax

   /foo/

is shorthand for

   $_ =~ /foo/

and ‘grep’ assigns $_ for each iteration, making it like a block parameter
in Ruby. So the direct Ruby translation is

   ar2 = ar.select { |e| e =~ /cat/ and e =~ /small/ }

In Perl you drop the ‘|e| e=~’ because it’s implicit on $_

There are many Ruby methods which try to simulate this behaviour on $_ by
being methods of Kernel, for example Kernel#chomp! is essentially $_.chomp!,
and because Kernel is included in Object, you can just write ‘chomp!’

But this doesn’t work for a standalone regular expression. In Ruby you
definitely don’t want /foo/ to mean $_ =~ /foo/, because

myregexp = /foo/           # build a regexp object

is useful and definitely not the same as

myregexp = $_ =~ /foo/     # build and match a regexp object

Personally I think Ruby borrows too much from Perl already in this way; I
would happily see $_ stricken from the language completely, and all the
polluted methods in Kernel removed.

In other words, I’d deprecate

while gets
  chomp!
  gsub!(/foo/,'bar')
  # etc
end

in favour of:

while line = gets
  line.chomp!
  line.gsub!(/foo/,'bar')
  # etc
end

Somebody once did propose, and I think even implement, a default object
receiver: something like

while line = gets
  with line do
    chomp!
    gsub!(/foo/,'bar')
    # etc
  end
end

For the duration of the block, self is set to ‘line’. That IMO is much
more Rubyish than having a bunch of methods all act on the shared global
variable $_

Anyway, I digress :-)

ar2 = grep(( /cat/ and !/small/ ), ar );

but ‘!’ doesn’t work for me in ruby.

There is
e !~ /foo/
which is shorthand for
!(e =~ /foo/)
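Putting the two answers together, here is a minimal sketch of both of the original questions using select with =~ and !~. The variable names are only illustrative, and the last line assumes a Ruby recent enough to have Enumerable#grep_v (2.3 or later).

```ruby
ar = ["cat", "dog", "smallcat"]

# Question 1: strings matching both /cat/ and /small/.
both = ar.select { |e| e =~ /cat/ && e =~ /small/ }

# Question 2: strings matching /cat/ but not /small/.
cat_not_small = ar.select { |e| e =~ /cat/ && e !~ /small/ }

# Same result by chaining; grep_v keeps the elements that do NOT match.
chained = ar.grep(/cat/).grep_v(/small/)

p both           # ["smallcat"]
p cat_not_small  # ["cat"]
p chained        # ["cat"]
```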

Regards,

Brian.

···

On Thu, Apr 24, 2003 at 08:00:55AM +0900, Krishna Dole wrote:

Hi,

···

In message “grep and regular expressions in ruby” on 03/04/24, “Krishna Dole” kpd@krishnadole.com writes:

I’m quite taken with ruby, but recently I ran into trouble using grep. I
have two questions: first, is there a way to call grep with multiple
regexps? For example, if you had
ar = ["cat", "dog", "smallcat"]

and you only wanted strings matching both “small” and “cat”, in perl I
think you could do something like
ar2 = grep(( /cat/ and /small/ ), ar );

but in ruby it seems like you have to call grep twice
ar2 = ar.grep(/cat/).grep(/small/)

is there a more elegant solution?

ar2 = ar.select{|x| /cat/ =~ x and /small/ =~ x}

						matz.

ar.select {|e| e =~ /cat|small/ }

ar.select {|e| e =~ /cat/ or e != /small/ }

rubyway?

The latter is not an RE match:

“cat” != /z/
?>
(i.e. Ruby will give you an error).

I guess you can write:

ar.select {|e| e =~ /cat/ or not e =~ /small/ }
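A quick check of what != actually does may help here. As far as I can tell, current Ruby versions do not raise on this comparison: != only asks whether the two objects are unequal, and a String is never equal to a Regexp, so the test is always true. The sketch below compares it with the negated match operator !~; the extra "small" element is added only to make the difference visible.

```ruby
ar = ["cat", "dog", "small", "smallcat"]

# != compares objects for inequality; it never performs a regexp match.
always_true = ar.map { |e| e != /small/ }

# Because the right-hand side is always true, every element is selected.
with_neq = ar.select { |e| e =~ /cat/ or e != /small/ }

# !~ is the negated match operator, so "small" is filtered out here.
with_not_match = ar.select { |e| e =~ /cat/ or e !~ /small/ }

p always_true     # [true, true, true, true]
p with_neq        # ["cat", "dog", "small", "smallcat"]
p with_not_match  # ["cat", "dog", "smallcat"]
```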

I don’t know if this is the Ruby-way or not, but I do think that Ruby’s
RE’s and grep are lacking. There should be a way to express negation in
a regular expression, and there should be a way to express several
conditions in grep. I guess that the complaint over grep really reduces
to a complaint over REs.

IMHO one should be able to type some sort of equivalent to

“/cat/ and not /small/”.

Just like we have “/cat|small/”.

I have found myself wishing for something like that more than once.

I can see that the exact syntax I wrote can’t be made to work, but we can
probably find an alternate RE-like syntax. Perhaps something like:

/cat|small/ → true if matches /cat/ or matches /small/
/cat&small/ → true if matches /cat/ and matches /small/
/cat|!small/ → true if matches /cat/ or does not match /small/
/cat&!small/ → true if matches /cat/ and does not match /small/

Just my $0.02
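The four combinations proposed above can be approximated today with lookahead assertions, without any new syntax. This is only a sketch: the variable names are made up for illustration, and the /m flag is there so that "." also crosses newlines.

```ruby
# Hypothetical names; each pattern mirrors one line of the proposed syntax.
cat_or_small      = /cat|small/
cat_and_small     = /\A(?=.*cat)(?=.*small)/m
cat_or_not_small  = /cat|\A(?!.*small)/m
cat_and_not_small = /\A(?=.*cat)(?!.*small)/m

ar = ["cat", "dog", "small", "smallcat"]

p ar.grep(cat_or_small)       # ["cat", "small", "smallcat"]
p ar.grep(cat_and_small)      # ["smallcat"]
p ar.grep(cat_or_not_small)   # ["cat", "dog", "smallcat"]
p ar.grep(cat_and_not_small)  # ["cat"]
```

Each (?=...) or (?!...) is tested from the start of the string and consumes nothing, which is what lets several conditions be stacked in one pattern.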

···

On Thu, Apr 24, 2003 at 08:17:37AM +0900, Bermejo, Rodrigo wrote:


Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137

Somebody once did propose, and I think even implement, a default object
receiver: something like

while line = gets
  with line do
    chomp!
    gsub!(/foo/,'bar')
    # etc
  end
end

For the duration of the block, self is set to ‘line’. That IMO is much
more Rubyish than having a bunch of methods all act on the shared global
variable $_

def with(object, &block)
  object.instance_eval(&block)
end
=> nil
with "a" do
?>   print size
end
1=> nil
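Applied to the line-processing loop discussed earlier, the same helper might be used like this. It is a sketch only: the sample string stands in for a line read with gets, and dup is used so the example also runs when string literals are frozen.

```ruby
# Same helper as above: run the block with self set to the given object.
def with(object, &block)
  object.instance_eval(&block)
end

line = "foo and more foo\n".dup  # stand-in for a line read with gets

with line do
  chomp!                 # implicit receiver is line
  gsub!(/foo/, 'bar')
end

p line  # "bar and more bar"
```

One thing to keep in mind with instance_eval is that instance variables and self inside the block refer to the string, not to the surrounding object.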

···

On Thu, Apr 24, 2003 at 04:44:50PM +0900, Brian Candler wrote:


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Yes I have a Machintosh, please don’t scream at me.
– Larry Blumette on linux-kernel

Thanks Bermejo and Daniel.
It would be nice to have negation in REs and a more flexible grep, but I
will get by fine using select.

I’ve been meaning to purchase The Ruby Way… maybe tonight.

Thanks again,
Krishna


From: Daniel Carrera [mailto:dcarrera@math.umd.edu]
Sent: Wednesday, April 23, 2003 7:28 PM
To: ruby-talk ML
Subject: Re: grep and regular expressions in ruby

I don’t know if this is the Ruby-way or not, but I do think that Ruby’s
RE’s and grep are lacking. There should be a way to express negation in
a regular expression, and there should be a way to express several
conditions in grep. I guess that the complaint over grep really reduces
to a complaint over REs.

IMHO one should be able to type some sort of equivalent to

“/cat/ and not /small/”.

Just like we have “/cat|small/”.

I have found myself wishing for something like that more than once.

I can see that the exact syntax I wrote can’t be made to work, but we can
probably find an alternate RE-like syntax. Perhaps something like:

/cat|small/ → true if matches /cat/ or matches /small/
/cat&small/ → true if matches /cat/ and matches /small/
/cat|!small/ → true if matches /cat/ or does not match /small/
/cat&!small/ → true if matches /cat/ and does not match /small/

Just my $0.02

Does the “!~” operator help you out here?

"cat" =~ /cat/   => 0
"cat" !~ /cat/   => false


> /cat|small/  -> true if matches /cat/ or matches /small/
> /cat&small/  -> true if matches /cat/ and matches /small/
> /cat|!small/ -> true if matches /cat/ or does not match /small/
> /cat&!small/ -> true if matches /cat/ and does not match /small/

may have a hole, but i think it can be done:

----CUT----
#!/usr/bin/env ruby

small    = 'small'
notsmall = '^[^s]*$|^[^s]*(?:s(?!mall)[^s]*)+$'
cat      = 'cat'

patterns = {
  :cat_or_small =>
    %r/(?:#{cat})|(?:#{small})/,
  :cat_and_small =>
    %r/(?:(?:#{cat}.*#{small})|(?:#{small}.*#{cat}))+/,
  :cat_or_notsmall =>
    %r/(?:#{cat})|(?:#{notsmall})/,
  :cat_and_notsmall =>
    %r/(?:(?:#{cat}.*#{notsmall})|(?:#{notsmall}.*#{cat}))+/,
}

words = [
  'cat',
  'small',
  'smallcat',
  'catsmall',
  'smalcat',
  'smalcat small',
  'catsmal',
  'catsmal small',
  'smal',
  'smalsmall',
  'smal small',
  'smallsmal',
  'smalcatsmall',
  'smalcat small',
  'smallcatsmal',
]

patterns.each do |name, pattern|
  puts "#{name}(#{pattern.source})\n========"
  words.each do |word|
    matched = word =~ pattern ? 'true' : 'false'
    printf "\s%-16.16s matched? %-8.8s\n", word, matched
  end
end

def notword word
  a, rest = word[0..0], word[1..-1]
  %r/^[^#{a}]*$|^[^#{a}]*(?:#{a}(?!#{rest})[^#{a}]*)+$/
end
----CUT----

the finite state machine for ‘not small’ is easy to draw. it really helped
make the RE.

   .------------------------------------
  / \          |       |       |       |
 /   ^s      ^[ms]   ^[as]   ^[ls]   ^[ls]
/     \        |       |       |       |
[start ]---s---*---m---*---a---*---l---*---l---[fail]
              /|       |       |       |
             s |       s       s       s
            /  |       |       |       |
            ---|-----------------------|

start -> accept
*     -> accept
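The state machine above can also be written out by hand, which may make the diagram easier to follow. This is a minimal sketch: the method name not_small? is hypothetical, and the simple "fall back to the state after s" rule only works because the letter s occurs nowhere in "small" except at the start.

```ruby
# Hypothetical helper: a hand-written version of the state machine above.
# state is the number of leading characters of "small" matched so far.
def not_small?(str)
  target = "small"
  state = 0
  str.each_char do |ch|
    if ch == target[state]
      state += 1
    elsif ch == "s"
      state = 1   # the 's' edges lead back to the state just after 's'
    else
      state = 0   # any other character returns to start
    end
    return false if state == target.length  # reached [fail]
  end
  true            # every state other than [fail] accepts
end

p not_small?("smal")       # true
p not_small?("smalsmall")  # false
p not_small?("ssmall")     # false
p not_small?("cat")        # true
```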

-a

···

On Thu, 24 Apr 2003, Daniel Carrera wrote:

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ara.t.howard@fsl.noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
====================================



I highly recommend (if you are a newbie) the 'Sams Teach Yourself
Ruby in 21 Days' book…
a good and enjoyable read…
I read it in just 18 days (…maybe I skipped some chapters?) …and it’s
still on my desk

Take a look at:

http://www.bookpool.com/.x/mprew9z9y1/ss/1?qs=ruby

They usually have good prices.

-r.
Novody is perfect.
------------------>

Krishna Dole wrote:

···

Thanks Bermejo and Daniel.
It would be nice to have negation in REs and a more flexible grep, but I
will get by fine using select.

I’ve been meaning to purchase The Ruby Way… maybe tonight.

Thanks again,
Krishna

Rodrigo Bermejo | rodrigo.bermejo@ps.ge.com
IT-Specialist | 8*879-0644

/cat|small/ → true if matches /cat/ or matches /small/
/cat&small/ → true if matches /cat/ and matches /small/
/cat|!small/ → true if matches /cat/ or does not match /small/
/cat&!small/ → true if matches /cat/ and does not match /small/

Just my $0.02

Does the “!~” operator help you out here?

"cat" =~ /cat/   => 0
"cat" !~ /cat/   => false

Well:

  1. That only solves a small subset of the problems I’m referring to.
    For instance, it still doesn’t let me grep for items that contain the word
    “cat” and not the word “small”

  2. It seems to act odd with my irb:

“cat” !~ /z/
?>
?>

i.e. Ruby doesn’t seem to realize that the statement is done.

···


Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137

Ara,

notsmall = '^[^s]*$|^[^s]*(?:s(?!mall)[^s]*)+$'

This caught my eye as being a little redundant.  To illustrate, if we
let X = '[^s]*' and Y = '(?:s(?!mall)[^s]*)', this can be rewritten as:

    notsmall = "^#{X}$|^#{X}#{Y}+$"

Which makes it a little easier to see that this is the same as:

    notsmall = "^#{X}#{Y}*$"

Undoing the substitutions:

    notsmall = '^[^s]*(?:s(?!mall)[^s]*)*$'

However, making a distinction between the 's' and the 'mall' still
bothered me. At first I tried simply:

    notsmall = '^(?!small)*$'

Unfortunately, this only matches a zero length string since the (?!re)
syntax does not consume any characters. So, I added a '.' to consume the
current character:

    notsmall = '^(?:(?!small).)*$'

I believe this behaves the same as your original regular expression, but
(IMHO) is much clearer. However, I have no idea what the performance
implications are.

I hope someone finds this interesting...

- Warren Brown
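One way to gain some confidence that the two forms behave alike is to run both against the word list from the earlier program and compare each result with a plain substring test. This is only a spot check on these single-line words, not a proof; the variable names are illustrative.

```ruby
ara    = /^[^s]*$|^[^s]*(?:s(?!mall)[^s]*)+$/   # original form
warren = /^(?:(?!small).)*$/                     # simplified form

words = [
  'cat', 'small', 'smallcat', 'catsmall', 'smalcat', 'smalcat small',
  'catsmal', 'catsmal small', 'smal', 'smalsmall', 'smal small',
  'smallsmal', 'smalcatsmall', 'smallcatsmal',
]

# Collect every word where either pattern disagrees with String#include?.
disagreements = words.reject do |w|
  expected = !w.include?('small')
  !!(w =~ ara) == expected && !!(w =~ warren) == expected
end

p disagreements  # []
```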


Ara,

may have a hole, but it think it can be done:

Yup, this has a hole in cat_and_notsmall:

cat_and_notsmall((?:(?:cat.*^[^s]*$|^[^s]*(?:s(?!mall)[^s]*)+$)|(?:^[^s]*$|^[^s]*(?:s(?!mall)[^s]*)+$.*cat))+)

========
cat matched? true
small matched? false
smallcat matched? false
catsmall matched? false
smalcat matched? true
smalcat small matched? false
catsmal matched? true
catsmal small matched? false
smal matched? true <----
smalsmall matched? false
smal small matched? false
smallsmal matched? false
smalcatsmall matched? false
smalcat small matched? false
smallcatsmal matched? false

If you take a close look at the regular expression, you will see it
breaks down into:

[01] (?:
[02]   (?:
[03]     cat.*^[^s]*$
[04]     |
[05]     ^[^s]*(?:s(?!mall)[^s]*)+$
[06]   )
[07]   |
[08]   (?:
[09]     ^[^s]*$
[10]     |
[11]     ^[^s]*(?:s(?!mall)[^s]*)+$.*cat
[12]   )
[13] )+

Lines [03] and [11] can never match anything (because of the 'cat.*^'
and '$.*cat'). The only reason this regular expression matches anything is
due to [05] and [09]. Since [05] and [09] together are just the two
alternatives of notsmall, this regular expression basically reduces to
notsmall.

However, this is not just a case of bad parentheses; the problem is with
the whole '(something).*(not_something_else)' approach. This will never
work because the '.*' portion is always able to consume a something_else
before the '(not_something_else)' portion ever sees it.

The only way I know of to do this is with a
'^(not_something_else)*something(not_something_else)*$' approach.

I hope this didn't end up sounding like a critique.  It's just that I
couldn't see how this was working until I pulled it apart, and when I did I
found what I thought was interesting stuff.

- Warren Brown
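The sandwich approach described above might be sketched as follows, using the one-character "not the start of small" piece from the earlier message. The names are illustrative, and the sketch assumes that "small" cannot straddle the edge of "cat"; for word pairs that can overlap at the boundary, this construction would need more care.

```ruby
# One character that is not the first character of an occurrence of "small".
not_small_char   = '(?:(?!small).)'
cat_and_notsmall = /\A#{not_small_char}*cat#{not_small_char}*\z/m

words = [
  'cat', 'small', 'smallcat', 'catsmall', 'smalcat', 'smalcat small',
  'catsmal', 'catsmal small', 'smal', 'smalsmall', 'smal small',
  'smallsmal', 'smalcatsmall', 'smallcatsmal',
]

selected = words.grep(cat_and_notsmall)
expected = words.select { |w| w.include?('cat') && !w.include?('small') }

p selected              # ["cat", "smalcat", "catsmal"]
p selected == expected  # true
```

Note that 'smal' is no longer matched, which was the hole in the original cat_and_notsmall pattern.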


it is interesting - i’ve filed it in my ‘interesting’ folder ;-)

however, it doesn’t work for the examples my program was attempting to solve?

if you notice the program has the pattern


:cat_and_notsmall =>
  %r/(?:(?:#{cat}.*#{notsmall})|(?:#{notsmall}.*#{cat}))+/,

for some reason, your pattern doesn’t seem to work if nested that way. in
fact, now that i look at it i’m not sure how mine does either. try plugging
yours in the program and see what i mean - ideas on this?

in any case - i think it IS clear that matching ‘not word’ is not only
possible, it might even be easier than i made it out to be ;-)

-a

···

On Fri, 25 Apr 2003, Warren Brown wrote:

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ara.t.howard@fsl.noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
====================================

