The problem with using $1 in regexps

sub num_quotes {
  $_[0] =~ /^(>+)/;
  return length($1);
}

See the problem in the above subroutine? (It’s Perl, but the $1
problem applies to Ruby too.) The subroutine is meant to return the
number of ‘>’ marks at the beginning of a line. However, I
(mistakenly) wrote a regular expression that will fail to match if
there are no ‘>’ marks. If it fails to match, I end up returning the
length of an uninitialized variable. This slipped past my unit tests
since in the tests, $1 would always start out undefined. The simple
bug of using /^(>+)/ instead of /^(>*)/ took me a while to track down.

How would a good programmer avoid this situation? In Ruby, I prefer
the =~ and $1 regexp syntax (rather than the OO one) because it’s shorter
to write…
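For comparison, here is a minimal Ruby sketch of the same subroutine. It assumes a current Ruby, where a failed =~ resets $~ (and therefore $1) to nil. It also assumes single-line input, since Ruby's ^ matches at the start of every line (\A anchors to the start of the string only). The method names num_quotes_buggy and num_quotes are illustrative, not from the original post.

```ruby
# Buggy port: /^(>+)/ needs at least one '>', so a plain line fails to match,
# $1 is nil, and $1.length raises NoMethodError.
def num_quotes_buggy(line)
  line =~ /^(>+)/
  $1.length
end

# Fixed port: /^(>*)/ always matches at the start, capturing "" when there
# are no '>' marks, so $1 is never nil here.
def num_quotes(line)
  line =~ /^(>*)/
  $1.length
end

p num_quotes(">>> quoted")  # prints 3
p num_quotes("plain line")  # prints 0

begin
  num_quotes_buggy("plain line")
rescue NoMethodError => e
  puts "buggy version raised #{e.class}"
end
```

Because each Ruby method call starts with its own $~, the buggy version fails loudly instead of returning a stale $1 left over from an earlier match.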

Well I’d do:

return length($1) if $_[0] =~ /^(>+)/;
return 0; # or whatever a failed result should be.
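Tom's guard-clause approach carries over to Ruby directly. In this minimal sketch (the method name is illustrative), the modifier `if` evaluates the match before $1 is read, so $1 is only used when the match succeeded:

```ruby
# Read $1 only after a successful match; otherwise fall through to the default.
def num_quotes(line)
  return $1.length if line =~ /^(>+)/
  0 # result for a line with no leading '>' marks
end

p num_quotes(">> reply")  # prints 2
p num_quotes("no quote")  # prints 0
```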

I don’t know what a “good programmer” would do :-)

Tom.

.^. .-------------------------------------------------------.
/V\ | Tom Gilbert, London, England | http://linuxbrit.co.uk |
/( )\ | Open Source/UNIX consultant | tom@linuxbrit.co.uk |
^^-^^ `-------------------------------------------------------'

sub num_quotes {
  $_[0] =~ /^(>+)/;
  return length($1);
}
[snip]
length of an uninitialized variable. This slipped past my unit tests
since in the tests, $1 would always start out undefined. The simple
bug of using /^(>+)/ instead of /^(>*)/ took me a while to track down.

The advantage to the OO syntax is that a non-match would raise an error
when you tried to call MatchData#size, because you would be trying to send
‘size’ to nil. Which ain’t happenin’.

One comment on the Perl: it will return 0 if no argument is passed at all.
That is probably a situation where an error should be raised; if you have -w
on, at least Perl will complain about the uninitialized variable in the regex.

How would a good programmer avoid this situation? In Ruby, I prefer
the =~ and $1 regexp syntax (rather than the OO one) because it’s
shorter to write…

The following Ruby code using standard OO is certainly shorter (but only
slightly more readable) than your Perl code for the same function (even if
you remove the “unnecessary” return and just end with length($1); ) and
would have helped you catch the bug in your regex right away. As an added
bonus, this sort of code is thread-safe.

def _num_quotes(x)
  /^(>*)/.match(x)[1].size
end
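To see how the OO form exposes the original bug, here is a minimal sketch contrasting the two regexps (method names are illustrative). Regexp#match returns nil when nothing matches, so indexing the result fails immediately:

```ruby
# Correct pattern: /^(>*)/ always matches, so match(...) never returns nil here.
def num_quotes_oo(line)
  /^(>*)/.match(line)[1].size
end

# Buggy pattern: on a line without '>', match returns nil and nil[1]
# raises NoMethodError, pointing straight at the regexp.
def num_quotes_oo_buggy(line)
  /^(>+)/.match(line)[1].size
end

p num_quotes_oo(">>> text")  # prints 3
p num_quotes_oo("text")      # prints 0

begin
  num_quotes_oo_buggy("text")
rescue NoMethodError => e
  puts "caught: #{e.class}"
end
```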

Perl’s (and Ruby’s) $digits are globals. Good programmers avoid globals (or
at least that’s what they tell me).

Of course, you can rewrite the Perl to be shorter and get rid of the $1 at
the same time (though, I have no idea if it is using the globals behind
the scenes):

sub num_quotes {
  length(( $_[0] =~ /^(>+)/ )[0]);
}
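Ruby has a comparably short idiom that avoids naming $1 at all: String#[] accepts a Regexp (and optionally a capture-group index) and returns the matched text, or nil when there is no match. This is an alternative not proposed in the thread, and the method names are illustrative:

```ruby
# String#[](regexp) returns the matched text; /^>*/ always matches, possibly "".
def num_quotes_idx(line)
  line[/^>*/].size
end

# String#[](regexp, n) returns capture group n, or nil if the match fails.
def num_quotes_group(line)
  line[/^(>*)/, 1].size
end

p num_quotes_idx(">>a")    # prints 2
p num_quotes_group("a")    # prints 0
p "a"[/^(>+)/, 1]          # prints nil: with the buggy pattern there is no match
```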

(note: the ruby code is still shorter – i love ruby!)

-michael

On Friday 09 August 2002 20:07, Philip Mak wrote:

++++++++++++++++++++++++++++++++++++++++++
Michael C. Libby x@ichimunki.com
public key: http://www.ichimunki.com/public_key.txt
web site: http://www.ichimunki.com
++++++++++++++++++++++++++++++++++++++++++

Hi,

sub num_quotes {
  $_[0] =~ /^(>+)/;
  return length($1);
}

If you had really written it as a subroutine, you should not have run into
this problem.

See the problem in the above subroutine? (It’s Perl, but the $1
problem applies to Ruby too.) The subroutine is meant to return the
number of ‘>’ marks at the beginning of a line. However, I
(mistakenly) wrote a regular expression that will fail to match if
there are no ‘>’ marks. If it fails to match, I end up returning the
length of an uninitialized variable. This slipped past my unit tests
since in the tests, $1 would always start out undefined. The simple
bug of using /^(>+)/ instead of /^(>*)/ took me a while to track down.

Recent versions clear $1 and the like when a match fails.

$ cat a.rb
NUM_QUOTES = /^(>+)/
NUM_QUOTES =~ ">>>a"
p $1
NUM_QUOTES =~ "a>>>"
p $1
$ ruby-1.6 -v a.rb
ruby 1.6.7 (2002-08-01) [i686-linux]
">>>"
nil

At Sat, 10 Aug 2002 10:07:07 +0900, Philip Mak wrote:


Nobu Nakada

For the sake of correctness: you would not be calling MatchData#size, but
rather MatchData#[] to get at the numbered match. The resulting string is
then the receiver for size.
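The distinction is easy to check in irb. In this small sketch, MatchData#size counts the whole match plus each capture group, while MatchData#[] returns the captured String, whose own size is the number of '>' marks:

```ruby
m = /^(>+)/.match(">>>a")
p m.size     # prints 2: the whole match plus one capture group
p m[1]       # prints ">>>"
p m[1].size  # prints 3: String#size of the captured text

# With no match, Regexp#match returns nil, so m[1] would be sent to nil.
p(/^(>+)/.match("a"))  # prints nil
```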

-michael

On Friday 09 August 2002 21:05, michael libby wrote:

The advantage to the OO syntax is that a non-match would raise an error
when you tried to call MatchData#size, because you would be trying to
send ‘size’ to nil. Which ain’t happenin’.

Hi,

In message “Re: The problem with using $1 in regexps” on 02/08/10, michael libby x@ichimunki.com writes:

Perl’s (and Ruby’s) $digits are globals. Good programmers avoid globals (or
at least that’s what they tell me).

Ruby’s $digits are not globals, despite their appearance. They
(and $~ too) are local to the current scope. Good programmers probably
avoid using $digits because they are ugly.

						matz.
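Matz's point about scope can be checked with a short sketch (the method name is illustrative): a match performed inside a method sets that method's own $~ and $1, and leaves the caller's untouched.

```ruby
# The match inside this method sets only this call's $~ and $1.
def inner_match
  "xyz" =~ /(y)/
  $1
end

"abc" =~ /(b)/
inner = inner_match
p inner  # prints "y"
p $1     # prints "b": the caller's $1 survived the method call
```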

Good programmers use $1 etc. when porting Perl to Ruby. They’re not that
ugly. But they are ugly enough to switch to more pure-Ruby once the program
works.

Gavin

----- Original Message -----
From: “Yukihiro Matsumoto” matz@ruby-lang.org
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Sunday, August 11, 2002 1:48 AM
Subject: Re: The problem with using $1 in regexps