Bug in gsub(?)

Tiziano_Merzi · 25 September 2010 18:29

I have found this bug(?) in gsub

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\ \1')
=> \ \\ :\ {\ }\ =\ #\ ~ OK

but

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\\1')
=> \1\1\1\1\1\1\1

Any idea?

--
Posted via http://www.ruby-forum.com/.

Brian_Candler · 25 September 2010 22:08

Tiziano Merzi wrote:
> I have found this bug(?) in gsub

http://www.catb.org/~esr/faqs/smart-questions.html#id382249

> puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\ \1')
> => \ \\ :\ {\ }\ =\ #\ ~ OK
>
> but
>
> puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\\1')
> => \1\1\1\1\1\1\1
>
> Any idea?

puts "a".gsub(/a/, '\\\\') # i.e. two backslashes
=> \

That is, in a replacement string, if you backslash-escape a backslash
you get a single backslash. That allows you to have literally \1 if
that's what you need.

So a literal backslash is \\, and the first capture is \1

So what you want is \\\1, to get a backslash followed by the first
capture. However, that is represented in a string literal as '\\\\\\1'
(which generates a 4 character string) because a string literal also has
backslash escaping.

'\\\\\\1'.size

=> 4

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\\\\\1')

\\\:\{\}\=\#\~
=> nil
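
The two layers can be checked separately. The sketch below assumes a current Ruby; the names `short` and `long` are only illustrative and do not come from the thread. It first measures what each single-quoted literal really contains and then hands it to gsub:

```ruby
input   = "\\:{}=#~"              # 7 characters: \:{}=#~
pattern = /([\\\:\~\=\#\{\}])/    # same pattern as above

short = '\\\1'       # literal layer leaves 3 characters: \\1
long  = '\\\\\\1'    # literal layer leaves 4 characters: \\\1

# gsub then reads \\ as one literal backslash and \1 as the first capture.
short_result = input.gsub(pattern, short)
long_result  = input.gsub(pattern, long)

puts short.size      # 3
puts long.size       # 4
puts short_result    # \1\1\1\1\1\1\1
puts long_result     # \\\:\{\}\=\#\~
```

In the 3-character version gsub consumes both backslashes as one escaped backslash, so the 1 that follows is plain text rather than a capture reference.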

Take a suggestion from me: save your sanity and use the block form
instead

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/) { "\\#{$1}" }

\\\:\{\}\=\#\~
=> nil
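
The block form sidesteps the second layer because the block's return value is inserted as-is, with no \1 or \\ processing. A small sketch of the difference, assuming a current Ruby (the variable names are only illustrative):

```ruby
input   = "\\:{}=#~"
pattern = /([\\\:\~\=\#\{\}])/

# The block receives the matched text, so $1 is not even needed here.
escaped = input.gsub(pattern) { |match| "\\" + match }

# The same two-character value (two backslashes) behaves differently:
as_string = "a".gsub(/a/, '\\\\')      # replacement string: \\ collapses to one backslash
as_block  = "a".gsub(/a/) { '\\\\' }   # block value: both backslashes are kept

puts escaped          # \\\:\{\}\=\#\~
puts as_string.size   # 1
puts as_block.size    # 2
```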

Chad_Perrin · 25 September 2010 23:34

I've wondered for quite a while what was the rationale for having \1 in
the first place.

On Sun, Sep 26, 2010 at 07:08:13AM +0900, Brian Candler wrote:

Take a suggestion from me: save your sanity and use the block form
instead

>> puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/) { "\\#{$1}" }
\\\:\{\}\=\#\~
=> nil

--
Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]

Tiziano_Merzi · 26 September 2010 06:57

Thanks Brian!
I know the block form.
So the problem is the backslash escaping in the string literal:
'\\\1' == '\\\\1' => true
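
That equality follows from the single-quote rules: only \\ and \' are escape sequences there, so a backslash in front of any other character is kept. A quick check, as a sketch (the variable names are illustrative):

```ruby
a = '\\\1'    # \\ becomes one backslash, then \1 stays as two characters
b = '\\\\1'   # two \\ pairs become two backslashes, then a plain 1

puts a == b   # true
p a.chars     # ["\\", "\\", "1"]
puts a.size   # 3
```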

Brian_Candler · 26 September 2010 06:54

Chad Perrin wrote:
> I've wondered for quite a while what was the rationale for having \1 in
> the first place.

Ruby inherits a lot from Perl, and Perl from sed.

Some of the Perlisms are IMO superfluous - in particular the Kernel
methods which operate on $_, and the flip-flop conditional operators.

Objects would be much tidier if they didn't inherit Kernel#gets,
Kernel#gsub etc; and you'd avoid some confusing error messages like

irb(main):001:0> 3.gsub(/a/,'b')
NoMethodError: private method `gsub' called for 3:Fixnum

Chad_Perrin · 27 September 2010 01:40

Brian Candler wrote:
> Chad Perrin wrote:
> > I've wondered for quite a while what was the rationale for having \1 in
> > the first place.
>
> Ruby inherits a lot from Perl, and Perl from sed.

Okay . . . I guess that sorta makes sense. Of course, I've never used \1
in Perl, nor seen anyone else do so, so until you mentioned it I had
entirely forgotten that was an option there.

Both languages would be better off without that syntax, just sticking
with $1 instead, I think.

> Some of the Perlisms are IMO superfluous - in particular the Kernel
> methods which operate on $_, and the flip-flop conditional operators.

I wouldn't really call \1 a "Perlism", given that the way I've always
seen it done is with $1 instead. If it's a Perlism despite its lack of
general usage, I'd say it's every bit as much a Rubyism.

On Sun, Sep 26, 2010 at 03:54:35PM +0900, Brian Candler wrote:

--
Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]

Mike_Stok1 · 27 September 2010 02:03

There are times in Perl when you need to use \1 in the matching part of a regular expression because you don't want $1 to interpolate into the match.

Consider trying to match a simple quoted string (i.e. no \ escaping):

my $s1 = "Hello there";
my $s2 = q{The cat said "Hello there, how's it going?"};

if ($s1 =~ m/(ell)/) {
    print "print s1 matched - \$1 is '$1'\n";
}

if ($s2 =~ m/(["'])(.*?)\1/) {
    print "print s2 matched - \$2 is '$2'\n";
}

This outputs:

print s1 matched - $1 is 'ell'
print s2 matched - $2 is 'Hello there, how's it going?'

If you use $1 in place of \1 in the second regex, it interpolates as 'ell' (left over from the first match), and the output becomes:

print s1 matched - $1 is 'ell'
print s2 matched - $2 is 'H'
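
Ruby patterns accept the same \1 backreference. A rough Ruby counterpart of the Perl snippet above, as a sketch (the `loose` match is only there for contrast and is not from the thread):

```ruby
s2 = %q{The cat said "Hello there, how's it going?"}

# \1 must repeat whatever quote character group 1 captured.
strict = s2.match(/(["'])(.*?)\1/)

# Without the backreference, any quote character ends the match.
loose = s2.match(/(["'])(.*?)["']/)

puts strict[2]   # Hello there, how's it going?
puts loose[2]    # Hello there, how
```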

Mike

--

Mike Stok <mike@stok.ca>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.

Brian_Candler · 27 September 2010 08:30

Chad Perrin wrote:
> I wouldn't really call \1 a "Perlism", given that the way I've always
> seen it done is with $1 instead.

I called \1 a perlism mainly because it's a sedism that perl inherited.
You're right that in Perl you could instead write:

$str =~ s/(.)/$1$1/;

Of course, that doesn't work in Ruby without using the block form:

    str.sub(/(.)/, "$1$1")         # no! inserts the literal text $1$1
    str.sub(/(.)/, "#{$1}#{$1}")   # no!! $1 is interpolated before sub runs
    str.sub(/(.)/) {"#{$1}#{$1}"}  # ok

in which case you could either argue that ruby needs sed's \1 more than
perl does, or you could argue that ruby doesn't need it at all.
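
To see why the first two forms fail, it helps to run them after an unrelated match has already set $1. A sketch, assuming a current Ruby; `str` and the "zzz" match are made-up inputs:

```ruby
str = "abc"

"zzz" =~ /(z)/                            # unrelated earlier match: $1 is now "z"
stale   = str.sub(/(.)/, "#{$1}#{$1}")    # argument is built before sub runs
literal = str.sub(/(.)/, "$1$1")          # no interpolation: plain text $1$1
backref = str.sub(/(.)/, '\1\1')          # sed-style backreference
block   = str.sub(/(.)/) { "#{$1}#{$1}" } # block runs after the match

puts stale     # zzbc
puts literal   # $1$1bc
puts backref   # aabc
puts block     # aabc
```

The stale case is the dangerous one, since it produces plausible-looking output taken from whatever matched last.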

It's odd that ruby strives to be so perl-compatible in areas like this,
but is different in far more important areas (e.g. ^ matching after any
newline within a string, not just at the start of the string)
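
The difference is easy to demonstrate: in Ruby, ^ anchors at the start of every line, and \A is the start-of-string anchor. A minimal sketch with a made-up two-line string:

```ruby
text = "first\nsecond"

line_anchor   = text =~ /^second/    # ^ matches right after the newline
string_anchor = text =~ /\Asecond/   # \A matches only at the very start

p line_anchor     # 6
p string_anchor   # nil
```

This matters most in validation code, where \A and \z are the safer anchors for checking a whole string.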

Regards,

Brian.

Xavier_Noria · 27 September 2010 09:08

Absolutely, there are a few gotchas:

http://www.advogato.org/person/fxn/diary/498.html

I don't know why it is that way, but I find them surprising.

On Mon, Sep 27, 2010 at 10:30 AM, Brian Candler <b.candler@pobox.com> wrote:

> It's odd that ruby strives to be so perl-compatible in areas like this,
> but is different in far more important areas (e.g. ^ matching newlines
> within a string, not just the start of string)
