Finding "\" in a string

Eric_Armstrong · 28 July 2006 20:11

I'm going crazy, right? Surely it is possible
to find a backslash (\) in a string, and
replace it with two backslashes. That's simple,
right? Just gsub("\\", '\\'). That's all there
is to it.

Except that...

#!/usr/bin/env ruby

testpath = "\\foo\\bar"
p testpath # => "\\foo\\bar" --representation
puts testpath # => \foo\bar --what you see

result = testpath.gsub("\\", '\\')
p result # => "\\foo\\bar"
(representation unchanged; should be "\\\\foo\\\\bar")

puts result # => \foo\bar
(unchanged; should be \\foo\\bar)

I've tried lots of gsub variations, like
/\\/, '\\', "\\\\", and anything else I could
think of that came even close to being sensible,
but I've yet to find anything that works.

My workaround will be to add a sed script to
filter the input. But surely that isn't necessary.
Is it?
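For reference, here is a minimal Ruby sketch of why the original call changes nothing. It assumes a current Ruby; the thread used 1.8, but these particular rules appear to be the same. `p` prints the inspect form, which doubles every backslash, while `puts` prints the raw characters. The pattern and the replacement each hold exactly one backslash.

```ruby
testpath = "\\foo\\bar"        # the characters \foo\bar
puts testpath.length           # 8: each "\\" in the literal is one backslash
p testpath                     # inspect form doubles backslashes: "\\foo\\bar"
puts testpath                  # raw characters: \foo\bar

pattern     = "\\"             # one backslash
replacement = '\\'             # also one backslash: single quotes still read \\ as \
result = testpath.gsub(pattern, replacement)
puts result == testpath        # true: each backslash was replaced by a single backslash
```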

Arnaud_Bergeron · 28 July 2006 20:21

I'm going crazy, right? Surely it is possible
to find a backslash (\) in a string, and
replace it with two backslashes. That's simple,
right? Just gsub("\\", '\\'). That's all there
is to it.

puts '\\'

\
=> nil

You are replacing single slashes with single slashes.

···

On 7/28/06, Eric Armstrong <Eric.Armstrong@sun.com> wrote:

--
"i think we should rewrite the kernel in java since it has good
support for threads." - Ted Unangst

Paul_Battley · 28 July 2006 21:52

You're not going crazy. You just need eight of 'em. Yes, really!

testpath = "\\foo\\bar"
result = testpath.gsub("\\", "\\\\\\\\")
p result # => "\\\\foo\\\\bar"

Paul.

···

On 28/07/06, Eric Armstrong <Eric.Armstrong@sun.com> wrote:

W_James · 29 July 2006 00:00

Eric Armstrong wrote:

I'm going crazy, right? Surely it is possible
to find a backslash (\) in a string, and
replace it with two backslashes. That's simple,
right? Just gsub("\\", '\\'). That's all there
is to it.

puts "\\foo\\bar".gsub(/\\/, '\&\&' )

\\foo\\bar

Eric_Armstrong · 28 July 2006 21:50

Arnaud Bergeron wrote:

I'm going crazy, right? Surely it is possible
to find a backslash (\) in a string, and
replace it with two backslashes. That's simple,
right? Just gsub("\\", '\\'). That's all there
is to it.

> puts '\\'
\
=> nil

You are replacing single slashes with single slashes.

You would think so, wouldn't you? But note that
I've used single quotes for the replacement.
So it /should/ equate to two backslashes.

So I tried four of them in the replacement string.
I did so yesterday, in fact, and repeated the
experiment today:

   testpath = "\\foo\\bar"
   result = testpath.gsub("\\", '\\\\')
   p result # => "\\foo\\bar" (s/b "\\\\foo\\\\bar")
   puts result # => \foo\bar (s/b \\foo\\bar)

Notice that the results are unchanged. So I tried a
simple character: testpath.gsub("\\", 'r')

That worked as expected:

   "\\foo\\bar"
   \foo\bar
   "rfoorbar"
   rfoorbar

So it would appear that something strange is happening
in the evaluation of the gsub replacement string.
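The surprise here starts one layer earlier than gsub. The following sketch assumes a current Ruby. Inside single quotes, \\ (and \') are still escape sequences, so '\\' holds one character, not two. The four-backslash attempt therefore hands gsub the two characters \\, and gsub's replacement parser reads those as one escaped backslash.

```ruby
p '\\'.length     # 1: \\ collapses to one backslash even in single quotes
p '\\\\'.length   # 2
p '\&'.length     # 2: backslash followed by &; not an escape in single quotes
p '\n'.length     # 2: backslash followed by n; not a newline

# The four-backslash replacement: two characters in the string,
# one backslash after gsub's own escape processing, so nothing changes.
testpath = "\\foo\\bar"
p testpath.gsub("\\", '\\\\') == testpath   # true
p testpath.gsub("\\", 'r')                  # "rfoorbar", matching the experiment above
```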

···

On 7/28/06, Eric Armstrong <Eric.Armstrong@sun.com> wrote:

Paul_Battley · 29 July 2006 00:09

Nice solution.

···

On 29/07/06, William James <w_a_x_man@yahoo.com> wrote:

>> puts "\\foo\\bar".gsub(/\\/, '\&\&' )
\\foo\\bar

Paul_Battley · 28 July 2006 21:57

That's right. The replacement string understands back-references of
the form \1, \2 etc, so there's another level of escaping needed.

"\\" <- one backslash
"\\\\" <- two backslashes
"\\\\\\\\" <- four backslashes, or two backslashes in replacement string

Paul.
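The two layers Paul describes can be checked directly. This sketch passes each double-quoted literal to gsub as the replacement for a one-character match. It then compares how many backslashes the literal holds with how many gsub emits.

```ruby
# Layer 1: Ruby's double-quoted literal. Layer 2: gsub's replacement syntax.
["\\", "\\\\", "\\\\\\\\"].each do |replacement|
  emitted = "x".gsub("x", replacement)
  puts "literal holds #{replacement.length}, gsub emits #{emitted.length}"
end
# literal holds 1, gsub emits 1   (a lone trailing backslash is kept as-is)
# literal holds 2, gsub emits 1   (\\ is gsub's escape for one backslash)
# literal holds 4, gsub emits 2
```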

···

On 28/07/06, Eric Armstrong <Eric.Armstrong@sun.com> wrote:

So it would appear that something strange is happening
in the evaluation of the gsub replacement string.

Eric_Armstrong · 29 July 2006 00:28

Paul Battley wrote:

···

On 29/07/06, William James <w_a_x_man@yahoo.com> wrote:

>> puts "\\foo\\bar".gsub(/\\/, '\&\&' )
\\foo\\bar

Nice solution.

I'll say! Wild! What makes /that/ work???

Eric_Armstrong · 28 July 2006 22:49

Paul Battley wrote:

So it would appear that something strange is happening
in the evaluation of the gsub replacement string.

That's right. The replacement string understands back-references of
the form \1, \2 etc, so there's another level of escaping needed.

"\\" <- one backslash
"\\\\" <- two backslashes
"\\\\\\\\" <- four backslashes, or two backslashes in replacement string

Very helpful. At least now I have a good idea that it
can be made to work. Now let's see if I can dial in my understanding with respect to replacement strings...

The fact that \1, \2 are recognized /shouldn't/ matter,
because the tokenizer really ought to ignore any \x
sequence where the x isn't a known special character.
But it's rapidly becoming apparent that the tokenizer
takes any \x and translates it to /something/, so we
need \\ to get one \ in the string.

Now then, we need two backslashes, so four of them
/should/ have worked inside single quotes. (And if
it takes 4 to make 1, why wasn't I effectively
removing the existing backslashes when I only supplied two?)

I've seen a similar situation in Java, where it took
2 slashes to get one past the language parser, and
it also took two to get one past the regular expression
parser, so in that case it took four slashes to get
one into the regular expression. That's the most
complex case I've seen before this, but note that it
was the _regular expression_ that needed four to get one.
It wasn't the replacement string. The replacement string
shouldn't be going through the reg exp parser, so it
shouldn't require that many.

It seems that it does, though. Huh?
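The replacement string does not go through the regexp parser, but it does go through a small template parser of its own. This is similar to the replacement argument of Java's Matcher.replaceAll, and Java provides Matcher.quoteReplacement for exactly this problem. As far as I know, Ruby has no built-in equivalent. The quote_replacement helper below is therefore hypothetical: it doubles every backslash so that the template parser hands the text back unchanged.

```ruby
# Hypothetical helper, in the spirit of Java's Matcher.quoteReplacement:
# double every backslash so gsub's replacement parser emits the text literally.
def quote_replacement(text)
  text.gsub("\\") { "\\\\" }   # block results are not reparsed, so this is safe
end

wanted = "\\\\"                                  # the two characters \\
p "\\foo".gsub("\\", quote_replacement(wanted))  # "\\\\foo"
p "a".gsub("a", quote_replacement('\&'))         # "\\&": a literal \&, not the match
```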

···

On 28/07/06, Eric Armstrong <Eric.Armstrong@sun.com> wrote:

W_James · 29 July 2006 00:50

Eric Armstrong wrote:

Paul Battley wrote:
>> >> puts "\\foo\\bar".gsub(/\\/, '\&\&' )
>> \\foo\\bar
>
> Nice solution.
>
I'll say! Wild! What makes /that/ work???

\& is simply what was matched by the regex.

It's similar in awk:

BEGIN {
  s = " foo "
  sub(/[^ ]+/, "&&&", s)
  print s
}

===> foofoofoo
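The Ruby counterpart of awk's & is \& in the replacement string. \0 means the same thing, and \1 through \9 refer to capture groups. Here is a small sketch; single quotes keep each \& as two characters, so gsub sees it.

```ruby
s = "\\foo\\bar"
p s.gsub(/\\/, '\&\&')              # "\\\\foo\\\\bar": each match written twice
p s.gsub(/\\/, '\0\0')              # same result: \0 is a synonym for \&
p "foo bar".gsub(/(o+)/, '<\1>')    # "f<oo> bar": \1 is the first capture group
p " foo ".sub(/[^ ]+/, '\&\&\&')    # " foofoofoo ", like the awk example
```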

···

> On 29/07/06, William James <w_a_x_man@yahoo.com> wrote:

Paul_Battley · 28 July 2006 23:10

In fact, the tokenizer *does* ignore non-special sequences:

"\\foo".gsub("\\", "\\x") # => "\\xfoo"

And if there's nothing else left in the string, that works, too:

"\\foo".gsub("\\", "\\") # => "\\foo"

However, in order to be able to produce a literal \1, it also needs to
understand \\. Therefore, this gives the same effect:

"\\foo".gsub("\\", "\\\\") # => "\\foo"

But I missed a trick earlier. We only actually need six backslashes to
emit two: the end of the string will terminate the sequence as in the
second example:

"\\foo".gsub("\\", "\\\\\\") # => "\\\\foo"

Is it getting clearer?

Paul.
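Paul's rules above can be collected into a small model of the replacement parser. This is a simplified sketch, not Ruby's actual implementation. expand_replacement is a hypothetical helper with these rules:

- it handles \\, \&, and \0 through \9;
- it keeps unknown escapes such as \x unchanged;
- it keeps a lone trailing backslash.

Ruby also understands \k<name>, \` and \', which the sketch leaves out.

```ruby
# Hypothetical, simplified model of how gsub reads a replacement string.
# Not Ruby's implementation; \k<name>, \` and \' are deliberately omitted.
def expand_replacement(replacement, match)
  out = +""
  i = 0
  while i < replacement.length
    ch  = replacement[i]
    nxt = replacement[i + 1]
    if ch != "\\" || nxt.nil?
      out << ch                      # ordinary character, or a trailing backslash
      i += 1
    elsif nxt == "\\"
      out << "\\"                    # \\ -> one literal backslash
      i += 2
    elsif nxt == "&"
      out << match[0]                # \& -> the whole match
      i += 2
    elsif nxt.match?(/\d/)
      out << match[nxt.to_i].to_s    # \0..\9 -> group (\0 is the whole match)
      i += 2
    else
      out << ch << nxt               # unknown escape such as \x is kept as-is
      i += 2
    end
  end
  out
end

# Compare the model with the real sub on Paul's examples.
m = /\\/.match("\\foo")
["\\x", "\\", "\\\\", "\\\\\\"].each do |r|
  real  = "\\foo".sub("\\", r)
  model = expand_replacement(r, m) + m.post_match
  puts "#{r.inspect}: real #{real.inspect}, model #{model.inspect}"
end
```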

···

On 28/07/06, Eric Armstrong <Eric.Armstrong@sun.com> wrote:

Eric_Armstrong · 29 July 2006 19:55

William James wrote:

Eric Armstrong wrote:

Paul Battley wrote:

puts "\\foo\\bar".gsub(/\\/, '\&\&' )

\\foo\\bar

Nice solution.

I'll say! Wild! What makes /that/ work???

\& is simply what was matched by the regex.

Too cool. I love it.
Thanks.

Another instructive thread.
What a great list.
:_)

···

On 29/07/06, William James <w_a_x_man@yahoo.com> wrote:

Logan_Capaldo · 28 July 2006 23:18

This is why when I need to gsub backslashes I use a block:

gsub(%r{\}) { '\\' }
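The snippet above is a sketch of the idea rather than working code. %r{\} does not parse, because \} escapes the closing brace. A block returning '\\' also produces one backslash, not two. Here is a working sketch of the block idea, assuming the goal is still to double every backslash:

```ruby
testpath = "\\foo\\bar"

# The block's return value is inserted literally; gsub does not scan it for \\ or \&.
doubled = testpath.gsub("\\") { "\\\\" }
puts doubled                       # \\foo\\bar

# With a regexp literal the backslash must be escaped inside the pattern too.
doubled_re = testpath.gsub(%r{\\}) { '\\\\' }
puts doubled_re == doubled         # true
```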

···

On Jul 28, 2006, at 7:10 PM, Paul Battley wrote:

Paul_Battley · 28 July 2006 23:25

That's much easier, but significantly slower:

s = "\\foo" * 1000000

t0 = Time.now
s.gsub("\\", "\\\\\\")
p(Time.now - t0) # 1.394492

t0 = Time.now
s.gsub("\\"){ "\\\\" }
p(Time.now - t0) # 2.863728

If you are dealing with very long strings, it's probably worth the
effort of ciphering the backslashes.
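Paul's timings come from a 2006 machine running Ruby 1.8, so the gap on a current Ruby may be quite different. This sketch re-measures both forms with the standard Benchmark module. It uses a smaller string than Paul's to keep the run short. No timings are shown because they depend on the machine.

```ruby
require "benchmark"

s = "\\foo" * 100_000            # Paul used 1_000_000 repetitions

string_result = nil
block_result  = nil
t_string = Benchmark.realtime { string_result = s.gsub("\\", "\\\\\\") }
t_block  = Benchmark.realtime { block_result  = s.gsub("\\") { "\\\\" } }

printf("replacement string: %.3fs\n", t_string)
printf("block:              %.3fs\n", t_block)
puts string_result == block_result   # both forms double every backslash
```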

Paul.

···

On 29/07/06, Logan Capaldo <logancapaldo@gmail.com> wrote:

This is why when I need to gsub backslashes I use a block:

gsub(%r{\}) { '\\' }

Eric_Armstrong · 28 July 2006 23:30

Omigawd. I'm more confused than ever! But
I have some great patterns to follow until
the day of my enlightenment arrives!

Thanks guys. I look forward to the day this
all makes sense...
:_)

Logan Capaldo wrote:

This is why when I need to gsub backslashes I use a block:

gsub(%r{\}) { '\\' }

Eric_Armstrong · 28 July 2006 23:51

Logan Capaldo wrote:

when I need to gsub backslashes I use a block:

gsub(%r{\}) { '\\' }

That's odd. I would have expected that to work,
since the value of the block is the value of
the string.

But %r{\} produces:
"unterminated string meets end of file"

And { '\\' } inserts a single backslash.

Paul's version worked:
gsub("\\"){ "\\\\" }

As did this:
gsub(%r{\\}) { '\\\\' }

Thanks for all the choices! I'm picking the
version with the simplest syntax I'm likely
to recall:
gsub("\\") { '\\\\' }

It may be slower, but it's readable.
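This sketch checks that the working variants from this thread agree. It also adds one more variant: the hash form gsub(pattern, hash). I believe that form arrived in Ruby 1.9, after this thread. It inserts the hash value without reading \1 or \& in it.

```ruby
testpath = "\\foo\\bar"
expected = "\\\\foo\\\\bar"                 # the characters \\foo\\bar

variants = {
  "eight backslashes" => testpath.gsub("\\", "\\\\\\\\"),
  "six backslashes"   => testpath.gsub("\\", "\\\\\\"),
  "\\&\\& template"   => testpath.gsub(/\\/, '\&\&'),
  "block"             => testpath.gsub("\\") { '\\\\' },
  "hash (1.9+)"       => testpath.gsub("\\", "\\" => "\\\\"),
}

variants.each do |name, result|
  puts "#{name}: #{result == expected}"
end
```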

Thanks again, guys. There is no way on earth
I would have found the solution. Heck, even now
that I see it, I don't really understand it.
(I'm hoping that Paul's in-depth explanation
will make more sense after I've read it 5 or
6 times.)
:_)

Logan_Capaldo · 28 July 2006 23:50

I'm sure it is. I'll learn how to decipher backslashes in gsub when I come across those strings <g>

···

On Jul 28, 2006, at 7:25 PM, Paul Battley wrote:

Logan_Capaldo · 28 July 2006 23:56

Logan Capaldo wrote:

when I need to gsub backslashes I use a block:
gsub(%r{\}) { '\\' }

That's odd. I would have expected that to work,
since the value of the block is the value of
the string.

But %r{\} produces:
  "unterminated string meets end of file"

And { '\\' } inserts a single backslash.

Sorry, that wasn't meant to be a _real_ example. It was just the idea of throwing the block out there so you didn't have to deal with the very domain specific logic of gsub's second argument.

···

On Jul 28, 2006, at 7:51 PM, Eric Armstrong wrote:

Eric_Armstrong · 29 July 2006 00:17

Logan Capaldo wrote:

Sorry, that wasn't meant to be a _real_ example. It was just the idea of throwing the block out there so you didn't have to deal with the very domain specific logic of gsub's second argument.

No problem. If I had sufficient understanding, I would
simply have filled in the blanks. Going through the
exercise gave me a few more data points, so I have a
better handle on what works--although I'm still mystified
as to why.

At any rate, I appreciate the block trick. It's something
I'm beginning to make more use of.
