Thoughts on improving usage of Regexp#match

Please feel free to point out obvious things
that I’m not seeing here.

I was looking at a piece of code like this:

sub1 = ""
if str =~ regex
sub1 = $1
end

and I thought three things:

  1. I’ve never liked the $1 – I usually avoid
    Perlish variables, even the common ones.
  2. I don’t like the if.
  3. I don’t like having to assign sub1 twice.

So I tried writing:

sub1 = regex.match(str)[1] || ""

In theory this should be the same as the other
four lines of code.

But it doesn’t work because if the match
fails, #match returns nil and I can’t call
[] on nil.
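
To make the failure concrete, here is a small sketch. The pattern and sample strings are made up for illustration and are not from the code under discussion.

```ruby
regex = /(\d+)/   # hypothetical pattern with one capture group

# A successful match returns a MatchData, so [1] works.
hit = regex.match("abc 42")[1] || ""

# A failed match returns nil, and nil has no [] method,
# so the || "" fallback is never reached.
error = begin
  regex.match("no digits")[1] || ""
rescue NoMethodError => e
  e.class
end
```

Here `hit` holds the captured text, while `error` records the exception class raised on the failed match.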

So I thought: Why don’t I make a special
MatchData object that will just return nil
for any [] call?

<side_note>
Did you know that MatchData doesn’t have a
’new’ method?? I had to do a silly hack to
create a MatchData object before adding a
singleton method to it.
</side_note>

Here’s my code:

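class Regexp
  alias :oldmatch :match

  def match(str)
    result = self.oldmatch(str)
    if not result
      # MatchData has no 'new', so borrow an instance from a dummy match.
      null = /x/.match("x")
      # Make every group lookup on this stand-in return nil.
      def null.[](index); nil; end
      result = null
    end
    result
  end
end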

Thoughts?

Hal

Please feel free to point out obvious things
that I’m not seeing here.

I was looking at a piece of code like this:

sub1 = ""
if str =~ regex
sub1 = $1
end

Hee hee.

and I thought three things:

  1. I’ve never liked the $1 – I usually avoid
    Perlish variables, even the common ones.
  2. I don’t like the if.
  3. I don’t like having to assign sub1 twice.

So I tried writing:

sub1 = regex.match(str)[1] || ""

a = /x/.match("abc") && $~[1] || "" # a => ""

b = /(b)/.match("abc") && $~[1] || "" # b => "b"

c = /b/.match("abc") && $~[1] || "" # c => ""

I think that #2 in your list above is a sign of Perl envy where Perl has
no notion of nil vs empty string (which I brought up on this list a
while back) to which Timmy Hammerquist said “even Larry Wall says you
should test the success or failure of a regex!”

Having to pre-initialize the variable with an empty string does seem
kinda lame, but it’s The Ruby Way, isn’t it?

– Dossy


On 2002.09.20, Hal E. Fulton hal9000@hypermetrics.com wrote:


Dossy Shiobara mail: dossy@panoptic.com
Panoptic Computer Network web: http://www.panoptic.com/
“He realized the fastest way to change is to laugh at your own
folly – then you can let go and quickly move on.” (p. 70)

Hi,


In message “Thoughts on improving usage of Regexp#match” on 02/09/20, “Hal E. Fulton” hal9000@hypermetrics.com writes:

I was looking at a piece of code like this:

sub1 = ""
if str =~ regex
sub1 = $1
end

and I thought three things:

  1. I’ve never liked the $1 – I usually avoid
    Perlish variables, even the common ones.
  2. I don’t like the if.
  3. I don’t like having to assign sub1 twice.

So I tried writing:

sub1 = regex.match(str)[1] || ""

How about

sub1 = (regex.match(str)[1] rescue "")

if you’re using 1.7.x?
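
A quick sketch of how the rescue-modifier form behaves, using made-up patterns and strings. One thing to keep in mind is that the modifier swallows any StandardError raised inside the parentheses, not only the nil case.

```ruby
regex = /(\w+)@/   # hypothetical pattern

# Match succeeds: the captured group is returned.
found = (regex.match("hal@example.com")[1] rescue "")

# Match fails: nil[1] raises NoMethodError, which the modifier turns into "".
missing = (regex.match("no address here")[1] rescue "")

# Match succeeds but the optional group took no part in it:
# nothing is raised, so the result is nil rather than "".
optional = (/a(b)?/.match("a")[1] rescue "")
```

The last case is where this form differs from the `|| ""` version, which would also map a nil group to the empty string.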

						matz.

Hi –

Please feel free to point out obvious things
that I’m not seeing here.

I was looking at a piece of code like this:

sub1 = ""
if str =~ regex
sub1 = $1
end

and I thought three things:

  1. I’ve never liked the $1 – I usually avoid
    Perlish variables, even the common ones.
  2. I don’t like the if.
  3. I don’t like having to assign sub1 twice.

So I tried writing:

sub1 = regex.match(str)[1] || ""

In theory this should be the same as the other
four lines of code.

But it doesn’t work because if the match
fails, #match returns nil and I can’t call
[] on nil.

You could do:

sub1 = regex.match(str).to_a[1].to_s
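
As a sketch of why this works (pattern and strings are illustrative only): nil.to_a is an empty array, indexing an empty array gives nil, and nil.to_s is the empty string.

```ruby
regex = /(\d+)/   # hypothetical pattern with one capture group

# Successful match: to_a gives [whole_match, group1], so [1] is the capture.
hit = regex.match("abc 42").to_a[1].to_s

# Failed match: nil.to_a is [], [][1] is nil, and nil.to_s is "".
miss = regex.match("no digits").to_a[1].to_s
```

The result is always a String, so no separate default is needed.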

So I thought: Why don’t I make a special
MatchData object that will just return nil
for any [] call?
[…]
Thoughts?

Would you have to test the [0] element to see whether the match had
succeeded?

David


On Fri, 20 Sep 2002, Hal E. Fulton wrote:


David Alan Black | Register for RubyConf 2002!
home: dblack@candle.superlink.net | November 1-3
work: blackdav@shu.edu | Seattle, WA, USA
Web: http://pirate.shu.edu/~blackdav | http://www.rubyconf.com

This will break the following code:
foo = "this is a test"
r = /^this will not match$/
if not r.match(foo) then
  puts "did not match"
end
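
The breakage can be reproduced without reopening Regexp. The sketch below uses a hypothetical helper, null_object_match, that mimics the proposed patch: because the stand-in MatchData is a real object, it is truthy, so a "did it match?" test can no longer tell failure from success.

```ruby
# Hypothetical stand-in for the patched Regexp#match (not a real Ruby method).
def null_object_match(regex, str)
  result = regex.match(str)
  return result if result
  null = /x/.match("x")          # borrow a MatchData instance
  def null.[](index); nil; end   # every group lookup now yields nil
  null
end

foo = "this is a test"
r = /^this will not match$/

plain   = r.match(foo)                # nil on failure
patched = null_object_match(r, foo)   # a MatchData object on failure

plain_says_no_match   = !plain     # true: the failure is detected
patched_says_no_match = !patched   # false: the failure is hidden
```

With the stand-in, the only way to detect failure is to look at a group such as `patched[0]`, which is nil here.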

Paul


On Fri, Sep 20, 2002 at 09:47:34AM +0900, Hal E. Fulton wrote:

class Regexp
alias :oldmatch :match

def match(str)
  result = self.oldmatch(str)
  if not result
    null = /x/.match("x") # No new
    def null.[](index); nil; end
    result = null
  end
  result
end

end

Please feel free to point out obvious things
that I’m not seeing here.

I was looking at a piece of code like this:

sub1 = ""
if str =~ regex
sub1 = $1
end

and I thought three things:

  1. I’ve never liked the $1 – I usually avoid
    Perlish variables, even the common ones.
  2. I don’t like the if.
  3. I don’t like having to assign sub1 twice.

So I tried writing:

sub1 = regex.match(str)[1] || ""

In theory this should be the same as the other
four lines of code.

But it doesn’t work because if the match
fails, #match returns nil and I can’t call
[] on nil.

(code with special MatchData object skipped)

Hal,

have you thought of introducing a new method, or enhancing the current
Regexp#match method, with two additional optional parameters (the
index and a default value)? Usage like

sub1 = regex.new_match(str, 1, "")

(didn’t find a good name)
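
One way such a method might look, written as a free-standing helper rather than a change to Regexp. The name match_group and its signature are invented for this sketch and do not exist in Ruby.

```ruby
# Hypothetical helper: return group `index` of the match,
# or `default` if the match fails or the group is nil.
def match_group(regex, str, index = 0, default = nil)
  m = regex.match(str)
  m ? (m[index] || default) : default
end

a = match_group(/x/, "abc", 1, "")     # no match at all
b = match_group(/(b)/, "abc", 1, "")   # match with a group 1
c = match_group(/b/, "abc", 1, "")     # match, but no group 1
```

Keeping it outside Regexp leaves the return value of the real #match untouched, so existing boolean tests keep working.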

Regards,
Pit


On 20 Sep 2002, at 9:47, Hal E. Fulton wrote:

Hi,


In message “Re: Thoughts on improving usage of Regexp#match” on 02/09/20, Dossy dossy@panoptic.com writes:

a = /x/.match("abc") && $~[1] || "" # a => ""

b = /(b)/.match("abc") && $~[1] || "" # b => "b"

c = /b/.match("abc") && $~[1] || "" # c => ""

I guess Hal dislikes these for ugly “$”. Another solution:

regex.match(str)
sub1 = Regexp.last_match(1) || ""
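
A small sketch of the same idea wrapped in a method, run against the three cases from the earlier example. The method name first_group is made up for illustration.

```ruby
# Hypothetical wrapper around the two-line idiom.
def first_group(regex, str)
  regex.match(str)
  # Regexp.last_match(1) is nil when the match failed or group 1 is absent.
  Regexp.last_match(1) || ""
end

a = first_group(/x/, "abc")     # no match
b = first_group(/(b)/, "abc")   # match with a group 1
c = first_group(/b/, "abc")     # match, but no group 1
```

No `$` variable appears in the source, although Regexp.last_match reads the same last-match data that `$~` exposes.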

						matz.

I like that very much. I don’t usually use
1.7.x, though, except with FreeRIDE.

Hal


----- Original Message -----
From: “Yukihiro Matsumoto” matz@ruby-lang.org
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Thursday, September 19, 2002 8:02 PM
Subject: Re: Thoughts on improving usage of Regexp#match

So I tried writing:

sub1 = regex.match(str)[1] || ""

How about

sub1 = (regex.match(str)[1] rescue "")

if you’re using 1.7.x?

Yes. Overall a bad idea on my part.

Hal


----- Original Message -----
From: “Paul Brannan” pbrannan@atdesk.com
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Thursday, September 19, 2002 8:18 PM
Subject: Re: Thoughts on improving usage of Regexp#match

This will break the following code:
foo = "this is a test"
r = /^this will not match$/
if not r.match(foo) then
  puts "did not match"
end

What, Dossy, you think it looks familiar? :-)
I’ll mail you the current version soon.

Hal


----- Original Message -----
From: “Dossy” dossy@panoptic.com
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Thursday, September 19, 2002 7:56 PM
Subject: Re: Thoughts on improving usage of Regexp#match

I was looking at a piece of code like this:

sub1 = ""
if str =~ regex
sub1 = $1
end

Hee hee.

Na, it’s lisp-envy. Nil and false should be indistinguishable from an empty
list.

– Nikodemus


On Fri, 20 Sep 2002, Dossy wrote:

I think that #2 in your list above is a sign of Perl envy where Perl has
no notion of nil vs empty string (which I brought up on this list a

Hi,


At Fri, 20 Sep 2002 17:01:09 +0900, Pit Capitain wrote:

have you thought of introducing a new method, or enhancing the current
Regexp#match method, with two additional optional parameters (the
index and a default value)? Usage like

sub1 = regex.new_match(str, 1, "")

sub1 = str[regex, 1] || "" # 1.7 feature
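
A short sketch of the String#[] form with a regex and a group index, using an illustrative string:

```ruby
str = "abc"

# No match: str[regex, 1] is nil, so the default is used.
a = str[/x/, 1] || ""

# Match with a capture group: group 1 is returned directly.
b = str[/(b)/, 1] || ""
```

Here the subject string comes first, and the result is nil on failure, so `|| ""` supplies the default without any call on nil.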


Nobu Nakada

Hi,

a = /x/.match("abc") && $~[1] || "" # a => ""

b = /(b)/.match("abc") && $~[1] || "" # b => "b"

c = /b/.match("abc") && $~[1] || "" # c => ""

I guess Hal dislikes these for ugly “$”.

True. And also because they have three operands
rather than just two, which crosses my complexity
threshold for this kind of situation. :-)

Another solution:

regex.match(str)
sub1 = Regexp.last_match(1) || ""

Also good! This will work with 1.6, and
it meets all three of my silly criteria.

Thanks,
Hal


----- Original Message -----
From: “Yukihiro Matsumoto” matz@ruby-lang.org
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Thursday, September 19, 2002 8:07 PM
Subject: Re: Thoughts on improving usage of Regexp#match

In message “Re: Thoughts on improving usage of Regexp#match” on 02/09/20, Dossy dossy@panoptic.com writes:

I think that #2 in your list above is a sign of Perl envy where Perl has
no notion of nil vs empty string (which I brought up on this list a

Na, it’s lisp-envy. Nil and false should be indistinguishable from an empty
list.

To extend on this & the related auto-vivification issues: seems to me that
functionality of this kind is one of the few things you lose in OO, when
all types – including user defined ones – stand on equal footing.

It would be nice if nil was indistinguishable from an empty container, but
the general case would lead to madness, I suspect.

Auto-vivification of nil to a non-empty container via a signature-method
(to_ary, etc) would be quite doable, but symmetry would also require that
empty containers would automagically become nils as well.
(Auto-nullification?)

I don’t see it happening, since it is a huge change in the overall look and
feel of the language, and would probably lead to chaos. ;-) I would love to
be proved wrong, though!

– Nikodemus

Nikodemus Siivola wrote:

I think that #2 in your list above is a sign of Perl envy where Perl has
no notion of nil vs empty string (which I brought up on this list a

Na, it’s lisp-envy. Nil and false should be indistinguishable from an empty
list.

To extend on this & the related auto-vivification issues: seems to me that
functionality of this kind is one of the few things you lose in OO, when
all types – including user defined ones – stand on equal footing.

That’s a wonderful thing about Ruby. Arguably, Perl is slightly better
than Ruby for string processing, and Lisp edges out Ruby for list
processing. But Ruby is much better than Perl for working with lists or
any complex data structures, and Ruby is easier to use than Lisp for
string processing.

Which isn’t to say I don’t miss some things from lisp, like macros.