The problem with using $1 in regexps

Philip_Mak2 · 10 August 2002 01:07

sub num_quotes {
$_[0] =~ /^(>+)/;
return length($1);
}

See the problem in the above subroutine? (It’s Perl, but the $1
problem applies to Ruby too.) The subroutine is meant to return the
number of ‘>’ marks at the beginning of a line. However, I
(mistakenly) wrote a regular expression that will fail to match if
there are no ‘>’ marks. If it fails to match, I end up returning the
length of an uninitialized variable. This slipped past my unit tests
since in the tests, $1 would always start out undefined. The simple
bug of using /^(>+)/ instead of /^(>*)/ took me a while to track down.

How would a good programmer avoid this situation? In Ruby, I prefer
the =~ and $1 regexp syntax (rather than the OO one) because it’s shorter
to write…

Tom_Gilbert · 10 August 2002 01:10

Well I’d do:

return length($1) if $_[0] =~ /^(>+)/;
return 0; # or whatever a failed result should be.
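
Since the question was also about Ruby, the same guard might look like the sketch below. This is only an illustration, not code from the thread: it assumes the method keeps the name num_quotes and that 0 is the right result for an unquoted line.

```ruby
# Count the '>' marks at the start of a line, returning 0 when there are none.
# Illustrative Ruby version of the Perl guard; the 0 fallback is an assumption.
def num_quotes(line)
  # =~ returns the match offset (truthy, even when it is 0) on success and
  # nil on failure, so $1 is only read after a successful match.
  return $1.length if line =~ /^(>+)/
  0
end

puts num_quotes(">>> quoted text")  # three leading marks
puts num_quotes("plain text")       # no leading marks, falls through to 0
```

The point of the guard is that the capture is never consulted unless the match that sets it has just succeeded.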

I don’t know what a “good programmer” would do

Tom.


–
.^. .-------------------------------------------------------.
/V\ | Tom Gilbert, London, England | http://linuxbrit.co.uk |
/( )\ | Open Source/UNIX consultant | tom@linuxbrit.co.uk |
^^-^^ `-------------------------------------------------------’

michael_libby1 · 10 August 2002 02:05

sub num_quotes {
$_[0] =~ /^(>+)/;
return length($1);
}
[snip]
length of an uninitialized variable. This slipped past my unit tests
since in the tests, $1 would always start out undefined. The simple
bug of using /^(>+)/ instead of /^(>*)/ took me a while to track down.

The advantage to the OO syntax is that a non-match would raise an error
when you tried to call MatchData#size, because you would be trying to send
‘size’ to nil. Which ain’t happenin’.

One comment on the Perl, it will return 0 if no argument is passed at all.
Probably a situation where an error should be raised-- if you have -w on,
at least it will complain about the uninitialized variable in the regex.

How would a good programmer avoid this situation? In Ruby, I prefer
the =~ and $1 regexp syntax (rather than the OO one) because it’s
shorter to write…

The following Ruby code using standard OO is certainly shorter (but only
slightly more readable) than your Perl code for the same function (even if
you remove the “unnecessary” return and just end with length($1); ) and
would have helped you catch the bug in your regex right away. As an added
bonus, this sort of code is thread-safe.

def _num_quotes(x)
/^(>*)/.match(x)[1].size
end
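
To make the "catch the bug right away" point concrete, here is a small sketch that runs the same expression with both patterns. The name strict_num_quotes is hypothetical and exists only to hold the buggy /^(>+)/ variant.

```ruby
# Hypothetical helper: the OO form with the buggy /^(>+)/ pattern.
# On a non-match Regexp#match returns nil, so indexing it raises
# NoMethodError instead of quietly producing a wrong number.
def strict_num_quotes(line)
  /^(>+)/.match(line)[1].size
end

# With /^(>*)/ the pattern always matches (possibly the empty string),
# so the same expression is safe for every input line.
def num_quotes(line)
  /^(>*)/.match(line)[1].size
end

begin
  strict_num_quotes("no quotes here")
rescue NoMethodError => e
  puts "non-match surfaced as #{e.class}"
end

puts num_quotes(">> twice quoted")
puts num_quotes("no quotes here")
```

Either way, the failure mode is loud: the bad pattern blows up on the first unquoted line rather than returning a stale or undefined value.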

Perl’s (and Ruby’s) $digits are globals. Good programmers avoid globals (or
at least that’s what they tell me).

Of course, you can rewrite the Perl to be shorter and get rid of the $1 at
the same time (though, I have no idea if it is using the globals behind
the scenes):

sub num_quotes {
length(( $_[0] =~ /^(>+)/ )[0]);
}

(note: the ruby code is still shorter – i love ruby!)

-michael

···

On Friday 09 August 2002 20:07, Philip Mak wrote:

++++++++++++++++++++++++++++++++++++++++++
Michael C. Libby x@ichimunki.com
public key: http://www.ichimunki.com/public_key.txt
web site: http://www.ichimunki.com
++++++++++++++++++++++++++++++++++++++++++

Nobuyoshi_Nakada · 10 August 2002 13:23

Hi,

sub num_quotes {
$_[0] =~ /^(>+)/;
return length($1);
}

If you had really written it as a subroutine, you should not have run
into the problem.

See the problem in the above subroutine? (It’s Perl, but the $1
problem applies to Ruby too.) The subroutine is meant to return the
number of ‘>’ marks at the beginning of a line. However, I
(mistakenly) wrote a regular expression that will fail to match if
there are no ‘>’ marks. If it fails to match, I end up returning the
length of an uninitialized variable. This slipped past my unit tests
since in the tests, $1 would always start out undefined. The simple
bug of using /^(>+)/ instead of /^(>*)/ took me a while to track down.

Recent versions clear $1 and the other match variables when a match fails.

$ cat a.rb
NUM_QUOTES = /^(>+)/
NUM_QUOTES =~ ">>>a"
p $1
NUM_QUOTES =~ "a>>>"
p $1
$ ruby-1.6 -v a.rb
ruby 1.6.7 (2002-08-01) [i686-linux]
">>>"
nil

···

At Sat, 10 Aug 2002 10:07:07 +0900, Philip Mak wrote:

–
Nobu Nakada

michael_libby1 · 10 August 2002 02:12

For the sake of correctness you would not be calling MatchData#size, but
rather MatchData#[] to get at the numbered match. The resulting string is
then the receiver for size.
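
A quick sketch of the difference, using an input string made up for illustration:

```ruby
m = /^(>*)/.match(">>> text")

# MatchData#size counts the whole match plus each capture group.
puts m.size        # 2: the full match and one group
# Indexing the MatchData fetches a numbered group as a String.
puts m[1].inspect  # ">>>"
# String#size on that group is the number of '>' marks.
puts m[1].size     # 3
```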

-michael

···

On Friday 09 August 2002 21:05, michael libby wrote:

The advantage to the OO syntax is that a non-match would raise an error
when you tried to call MatchData#size, because you would be trying to
send ‘size’ to nil. Which ain’t happenin’.


Yukihiro_Matsumoto2 · 10 August 2002 15:48

Hi,

···

In message “Re: The problem with using $1 in regexps” on 02/08/10, michael libby x@ichimunki.com writes:

Perl’s (and Ruby’s) $digits are globals. Good programmers avoid globals (or
at least that’s what they tell me).

Ruby’s $digits are not globals, despite their appearance. They
are (and $~ too) local to the current scope. Good programmers avoid
using $digits probably because they are ugly.

						matz.
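
The scoping described above can be checked with a short script. This is a sketch only; the method names inner_match and fresh_scope are made up for the demonstration.

```ruby
# Hypothetical method: runs its own match and returns its own $1.
def inner_match
  "abc" =~ /(b)/   # sets $~ and $1 for this method call only
  $1
end

# Hypothetical method: reads $1 without having matched anything itself.
def fresh_scope
  $1
end

"xyz" =~ /(y)/       # sets $1 at the top level
puts inner_match     # b: the method sees the match it just made
puts $1              # y: the call did not overwrite the caller's $1
p fresh_scope        # nil: a new method call starts with no match data
```

So a Ruby method that reads $1 after a failed match gets nil rather than a leftover capture from its caller.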

Gavin_Sinclair · 12 August 2002 00:51

Good programmers use $1 etc when porting Perl to Ruby. They’re not that
ugly. But they are ugly enough to switch to more pure-Ruby once the program
works.

Gavin

