Weird behaviour escaping special characters in a string

This instance method added to the String class returns a copy of the
receiver with occurrences of \ replaced with \\, and occurrences of '
replaced with \':

class String
  def to_source_string
    gsub(/(\\|')/, '\\\\\1')
  end
end
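
A quick check of the behaviour described above, as a minimal sketch: it assumes the five-backslash version of the method, and the sample string and the use of eval are illustrative only.

```ruby
class String
  # Escape backslashes and single quotes so the result can sit
  # inside a single-quoted Ruby literal.
  def to_source_string
    gsub(/(\\|')/, '\\\\\1')
  end
end

original = %q{it's a \ test}   # contains one quote and one backslash
source   = "'" + original.to_source_string + "'"

puts source                    # the generated single-quoted literal
puts eval(source) == original  # true if the escaping round-trips
```

If the escaping is right, evaluating the generated literal gives back the original string unchanged.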

The idea is that it will give you a string that you can write out to a
Ruby file that will later print the string. For example, let's take
the string foo (3 characters):

"puts '" + "foo".to_source_string + "'" # puts 'foo'

Or a string with special characters in it like 'foo' (5 characters,
including enclosing single quotes):

"puts '" + "'foo'".to_source_string + "'" # puts '\'foo\''

My RSpec specs and experimentation in irb confirm that the method
works but I am at a loss to explain one thing:

Why do I need so many backslashes in my replacement expression?

There are five slashes in the replacement expression:

    gsub(/(\\|')/, '\\\\\1')

But I would have thought that three would work:

    gsub(/(\\|')/, '\\\1')

I basically want to replace "whatever is found in the pattern" with a
backslash (\\) followed by "whatever was found" (\1); so that's three
slashes. But with only three slashes Ruby gives me \1foo\1 instead of
\'foo\'. Four slashes produces the same result. Five slashes and
suddenly everything works (funnily enough, six slashes also works).
Two slashes and one slash have no effect (no escaping is performed).

I've got working code so it's not a huge problem, but my curiosity is
piqued. What's going on here that I don't understand?

Cheers,
Greg

class String
  def to_source_string
    gsub(/(\\|')/) { "\\#$1" }
  end
end
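
The block form sidesteps one of the two escaping layers: the value returned by the block is inserted as-is, with no backreference processing by gsub. A small comparison, using a made-up sample string:

```ruby
pattern = /(\\|')/
sample  = %q{'foo' \ bar}

# Replacement string: gsub itself interprets \\ and \1.
with_string = sample.gsub(pattern, '\\\\\1')
# Block: the returned string is used literally, so one "\\" is enough.
with_block  = sample.gsub(pattern) { "\\#$1" }

puts with_block
puts with_string == with_block
```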

-austin

···

On 2/21/07, Greg Hurrell <greg.hurrell@gmail.com> wrote:

This instance method added to the String class returns a copy of the
receiver with occurrences of \ replaced with \\, and occurrences of '
replaced with \':

class String
  def to_source_string
    gsub(/(\\|')/, '\\\\\1')
  end
end

--
Austin Ziegler * halostatue@gmail.com * http://www.halostatue.ca/
               * austin@halostatue.ca * You are in a maze of twisty little passages, all alike. // halo • statue
               * austin@zieglers.ca

Why do I need so many backslashes in my replacement expression?

There are five slashes in the replacement expression:

    gsub(/(\\|')/, '\\\\\1')

But I would have thought that three would work:

    gsub(/(\\|')/, '\\\1')

Because even in single quotes, backslashes must be doubled; this in turn is
because \' is how you insert a single quote within a single-quoted
string.

irb(main):001:0> a='\\'
=> "\\"
irb(main):002:0> a.size
=> 1
irb(main):003:0> b='\''
=> "'"
irb(main):004:0> b.size
=> 1
irb(main):005:0> c='\x'
=> "\\x"
irb(main):006:0> c.size
=> 2

I basically want to replace "whatever is found in the pattern" with a
backslash (\\) followed by "whatever was found" (\1); so that's three
slashes. But with only three slashes Ruby gives me \1foo\1 instead of
\'foo\'. Four slashes produces the same result. Five slashes and
suddenly everything works (funnily enough, six slashes also works).
Two slashes and one slash have no effect (no escaping is performed).

I've got working code so it's not a huge problem, but my curiosity is
piqued. What's going on here that I don't understand?

irb(main):009:0> a='\\\\1'
=> "\\\\1"
irb(main):010:0> a.size
=> 3
irb(main):011:0> a='\\\\\1'
=> "\\\\\\1"
irb(main):012:0> a.size
=> 4
irb(main):013:0> a='\\\\\\1'
=> "\\\\\\1"
irb(main):014:0> a.size
=> 4

In a single-quoted string:
   \' => '
   \\ => \
   \x => \x for all other x

So '...\1' and '...\\1' are identical.
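
To see where the 3/4 versus 5/6 split comes from, it may help to look at both layers in turn: the single-quoted literal is reduced by the rules above, and gsub then interprets what is left (\\ as a literal backslash, \1 as the first group). A minimal sketch:

```ruby
three = '\\\1'      # literal layer leaves 3 chars: \ \ 1
four  = '\\\\1'     # literal layer leaves 3 chars: \ \ 1
five  = '\\\\\1'    # literal layer leaves 4 chars: \ \ \ 1
six   = '\\\\\\1'   # literal layer leaves 4 chars: \ \ \ 1

puts three == four  # true
puts five == six    # true

pattern = /(\\|')/
# gsub layer sees \\ then 1: a backslash followed by the digit 1
puts "'foo'".gsub(pattern, three)   # \1foo\1
# gsub layer sees \\ then \1: a backslash followed by group 1
puts "'foo'".gsub(pattern, five)    # \'foo\'
```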

HTH,

Brian.

···

On Thu, Feb 22, 2007 at 02:55:09AM +0900, Greg Hurrell wrote:

Why don't you just use #inspect?
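
A sketch of what #inspect would give here, assuming a double-quoted literal is acceptable in the generated file (the sample string is made up):

```ruby
original = %q{'foo' with a \ backslash}
source   = "puts " + original.inspect   # inspect emits a double-quoted literal

puts source
puts eval(original.inspect) == original # true if the literal round-trips
```

Note that the output is double-quoted, so it will not look the same as the single-quoted form that to_source_string is aiming for.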

Kind regards

robert

···

2007/2/21, Greg Hurrell <greg.hurrell@gmail.com>:

This instance method added to the String class returns a copy of the
receiver with occurrences of \ replaced with \\, and occurrences of '
replaced with \':

class String
  def to_source_string
    gsub(/(\\|')/, '\\\\\1')
  end
end

The idea is that it will give you a string that you can write out a
Ruby file that will later print the string. For, example, let's take
the string, foo (3 characters):

"puts '" + "foo".to_source_string + "'" # puts 'foo'

Or a string with special characters in it like 'foo' (5 characters,
including enclosing single quotes):

"puts '" + "'foo'".to_source_string + "'" # puts '\'foo\''

It's probably better to use a character class [\\'] instead of alternation (\\|').
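
With a character class there is no capture group, so the replacement has to refer to the whole match instead (\0 or \&). A minimal sketch with an illustrative sample string:

```ruby
sample = %q{'foo' \ bar}

with_alternation = sample.gsub(/(\\|')/, '\\\\\1')  # \1 = first group
with_char_class  = sample.gsub(/[\\']/,  '\\\\\0')  # \0 = whole match
with_ampersand   = sample.gsub(/[\\']/,  '\\\\\&')  # \& = whole match

puts with_char_class
puts with_alternation == with_char_class
puts with_char_class == with_ampersand
```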

James Edward Gray II

···

On Feb 21, 2007, at 12:36 PM, Austin Ziegler wrote:

On 2/21/07, Greg Hurrell <greg.hurrell@gmail.com> wrote:

This instance method added to the String class returns a copy of the
receiver with occurrences of \ replaced with \\, and occurrences of '
replaced with \':

class String
  def to_source_string
    gsub(/(\\|')/, '\\\\\1')
  end
end

class String
  def to_source_string
    gsub(/(\\|')/) { "\\#$1" }
  end
end

Excellent, that explains why I was getting the same results for 3 and
4 slashes, and the same for 5 and 6 slashes.

Cheers,
Greg

···

On 21 feb, 20:50, Brian Candler <B.Cand...@pobox.com> wrote:

In a single-quoted string:
   \' => '
   \\ => \
   \x => \x for all other x

So '...\1' and '...\\1' are identical.

I did some quick and dirty benchmarks and using a character class is a
little bit quicker. Interpolation ("\\#$1") is slower but more
readable. I guess I'll stick with the character class and no
interpolation though.

require 'benchmark'
include Benchmark

bm(6) do |x|
  x.report('alternation') { 100_000.times { "'foo'".gsub(/(\\|')/, '\\\\\1') } }
  x.report('char class') { 100_000.times { "'foo'".gsub(/[\\']/, '\\\\\&') } }
  # Note: "\\#$1" and "\\#$&" are interpolated before gsub is called,
  # so these two are not equivalent to the block form { "\\#$1" }.
  x.report('interpolation') { 100_000.times { "'foo'".gsub(/(\\|')/, "\\#$1") } }
  x.report('interpolation with char class') { 100_000.times { "'foo'".gsub(/[\\']/, "\\#$&") } }
end
                                    user     system      total        real
alternation                     0.450000   0.000000   0.450000 (  0.452661)
char class                      0.390000   0.000000   0.390000 (  0.396193)
interpolation                   0.540000   0.010000   0.550000 (  0.532106)
interpolation with char class   0.480000   0.000000   0.480000 (  0.485922)

···

On 21 feb, 19:45, James Edward Gray II <j...@grayproductions.net> wrote:

On Feb 21, 2007, at 12:36 PM, Austin Ziegler wrote:

It's probably better to use a character class [\\'] instead of
alternation (\\|').

James Edward Gray II

%q{...} is your friend.
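
For reference, a quick check of what %q changes. As far as the literal rules go, it only removes the need to escape the quote character; a doubled backslash is still reduced to one, so the replacement string keeps the same number of backslashes. A minimal sketch:

```ruby
puts %q{it's} == 'it\'s'        # true: no need to escape ' inside %q{}
puts %q{\\\\\1} == '\\\\\1'     # true: backslash handling is unchanged

# Same replacement written with %q, using \0 for the whole match.
puts "'foo'".gsub(/[\\']/, %q{\\\\\0})
```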

David Vallner

···

On Thu, 22 Feb 2007 13:55:06 +0100, Greg Hurrell <greg.hurrell@gmail.com> wrote:

On 21 feb, 20:50, Brian Candler <B.Cand...@pobox.com> wrote:

In a single-quoted string:
   \' => '
   \\ => \
   \x => \x for all other x

So '...\1' and '...\\1' are identical.

Excellent, that explains why I was getting the same results for 3 and
4 slashes, and the same for 5 and 6 slashes.