Escaped backslashes in input strings - newbie question

I am trying to find a way of removing escaped characters in input strings from a file made by another program. That is to say, two-character sequences in which the first character is a backslash.

I would have thought in my naivete that gsub(/\\./,"") would do this, but no. I am using Ruby 1.9 hence Oniguruma.

Firstly, are the strings input from a text file treated as single- or double-quoted?

Secondly, are there alternatives to gsub?

Thirdly, are there any *clear* and *exhaustive* treatments of the question of escaped backslashes in Ruby - in input strings in programs, in IRB, with single-quoted strings, double-quoted strings and other situations? I guess a multidimensional table might be useful to determine how to escape a backslash, if it can be done at all.

Regards

John Sampson

Check out Regexp.escape ->

Regexp.escape( '\foo\bar' )

=> "\\\\foo\\\\bar"

Regexp.escape('\foo\bar').gsub(/\\\\/, '')

=> "foobar"

hope this helps.

regards
attila

···

On Wed, Jan 16, 2013 at 1:58 PM, John Sampson <jrs.idx@ntlworld.com> wrote:

I am trying to find a way of removing escaped characters in input strings
from a file made by another program. That is to say, two-character sequences
in which the first character is a backslash.

I would have thought in my naivete that gsub(/\\./,"") would do this, but
no. I am using Ruby 1.9 hence Oniguruma.

Why not? This works as expected for me:

irb(main):001:0> s = 'a\\bc'
=> "a\\bc"
irb(main):002:0> puts s, s.length
a\bc
4
=> nil
irb(main):003:0> x = s.gsub(/\\./, '')
=> "ac"
irb(main):004:0> puts x, x.length
ac
2
=> nil
irb(main):005:0>

Firstly, are the strings input from a text file treated as single- or
double-quoted?

Neither. Single and double quotes only have meaning inside program code.
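A minimal sketch of that point, assuming a throwaway temporary file stands in for the file written by the other program (the Tempfile and the sample line are illustrative, not the real data). The text read back is exactly what is on disk, so the backslash is an ordinary character and /\\./ matches it:

```ruby
require 'tempfile'

cleaned = nil

# Hypothetical stand-in for the file produced by the other program.
# Single quotes keep the backslashes literal while the sample is written.
Tempfile.open('cindex_sample') do |f|
  f.write('\u99m\UTc' + "\n")
  f.flush
  line = File.read(f.path).chomp   # raw text, no escape processing
  puts line.length                 # 9 characters: \ u 9 9 m \ U T c
  cleaned = line.gsub(/\\./, '')   # drop each backslash and the character after it
end

puts cleaned                       # prints 99mTc
```

No extra backslashes are needed in the data itself; doubling is only required when the same text is typed as a double-quoted literal in source code.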

Secondly, are there alternatives to gsub?

Well, you can code it up yourself. If you only need to drop the backslashes themselves, you can use

irb(main):006:0> s.delete '\\'
=> "abc"
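Note that delete and gsub are not interchangeable here. A short sketch of the difference, using the same sample string (the letter-only pattern is an added assumption about what counts as an escape, not something taken from the question):

```ruby
s = 'a\\bc'                                 # four characters: a, backslash, b, c

only_backslash = s.delete('\\')             # removes just the backslash
pair_removed   = s.gsub(/\\./, '')          # removes the backslash and the next character
letters_only   = s.gsub(/\\[A-Za-z]/, '')   # same, but only when a letter follows

p only_backslash
p pair_removed
p letters_only
```

Only the gsub variants remove the character that follows the backslash, which is what the original question asks for.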

Thirdly, are there any *clear* and *exhaustive* treatments of the question
of escaped backslashes in Ruby - in input strings in programs, in IRB, with
single-quoted strings, double-quoted strings and other situations? I guess a
multidimensional table might be useful to determine how to escape a
backslash, if it can be done at all.

The topic comes up frequently here. Other than that:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_stdtypes.html#S2

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Firstly, gsub does not modify the string; it returns a new one. To change the string in place, try gsub!
Secondly, if you control the other program, why not have it write the strings with delimiters of your own design instead of single or double quotes? That would do away with the need for escape backslashes within the string.

···

Date: Wed, 16 Jan 2013 21:58:13 +0900
From: jrs.idx@ntlworld.com
Subject: Escaped backslashes in input strings - newbie question
To: ruby-talk@ruby-lang.org

I should have mentioned that I was using gsub!

It does look as if I will need a non-Ruby program to massage the file input.

···

On 16/01/2013 13:42, Alexander McMillan wrote:

Firstly, gsub does not modify the string; it returns a new one. To change the string in place, try gsub!
Secondly, if you control the other program, why not have it write the strings with delimiters of your own design instead of single or double quotes? That would do away with the need for escape backslashes within the string.

In Icon a solution looks like this:

########################################################
#
# Procedure to remove escaped characters from
# CINDEX DAT file
#
########################################################
#
# Scan string to find each "\" character.
# Remove it and the following character.
#
########################################################

procedure remesc(s)
     local outputstring
     outputstring := "" # initialise
     s ? {
         while outputstring ||:= tab(upto('\\')) do move(2)
         outputstring ||:= tab(0)
     }
     return outputstring
end
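
For comparison, a rough Ruby rendering of the same scan, offered as a sketch rather than a translation checked against Icon. It uses StringScanner from the standard library; the method name remesc is borrowed from the procedure above, and keeping a lone trailing backslash is an assumption made here.

```ruby
require 'strscan'

# Copy everything except two-character sequences that start with a backslash.
def remesc(s)
  scanner = StringScanner.new(s)
  output  = ''.dup
  until scanner.eos?
    if scanner.scan(/\\./m)
      next                                  # skip the backslash and the character after it
    elsif (chunk = scanner.scan(/[^\\]+/))
      output << chunk                       # plain text up to the next backslash
    else
      output << scanner.getch               # lone backslash at end of input: keep it (assumption)
    end
  end
  output
end

puts remesc('\u99m\UTc')                    # prints 99mTc
```

Apart from also removing a newline that follows a backslash, this gives the same result as gsub(/\\./, ''); it is only worth the extra lines if the scan later needs more logic than a single pattern can express.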

Earlier today I wrote:
> I am trying to find a way of removing escaped characters in input
> strings from a file made by another program. That is to say,
> two-character sequences in which the first character is a backslash.

I don't think so. Instead of providing a sample in a language I don't
know, can you provide input and desired output? I still don't see
what's not working with gsub or gsub! here at all.

Kind regards

robert

···

On Wed, Jan 16, 2013 at 3:52 PM, John Sampson <jrs.idx@ntlworld.com> wrote:

I should have mentioned that I was using gsub!

It does look as if I will need a non-Ruby program to massage the file input.

Input would be, for example, as (part of) a line in a text file, "\u99m\UTc". The desired output in this case would be "99mTc".

The objective is to delete the "\u" and the "\U" and any other single letters where they are preceded by a backslash.

Regards

John

···

On 16/01/2013 17:28, Robert Klemme wrote:

OK, understood so far. This is what you wrote right from the start.
But what's actually not working?

robert

···

On Wed, Jan 16, 2013 at 6:55 PM, John Sampson <jrs.idx@ntlworld.com> wrote:

If, at IRB, I write: x = "\b99m\BTc" (not \u which gives an 'invalid Unicode escape')
and then: x.gsub(/\\./,"")
the answer is "\b99mBTc".

I am trying to get "99mTc", which I do get provided that I escape "\b" and "\B" with
extra backslashes, but in file input these extra backslashes will be absent. On
investigation, I think the extra backslashes will not be needed for file input, though
they may be needed in other contexts.

I am doing this by experiment, but would rather be able to read a clear exposition on
the subject of escapes in the different contexts of file input, IRB, double quoted strings,
single quoted strings, Oniguruma and any other contexts affecting the number of backslashes needed.
The fact that the ruby-talk archive is full of posts on backslash escapes must be a measure
of the widespread confusion on this subject.

As an aside, I have trouble with the search engine at the ruby-talk archive - it does not
seem to recognise "and" as a Boolean operator.

Regards

John S

···

On 16/01/2013 21:53, Robert Klemme wrote:

Quoting John Sampson (jrs.idx@ntlworld.com):

If, at IRB, I write: x = "\b99m\BTc" (not \u which gives an 'invalid
Unicode escape')
and then: x.gsub(/\\./,"")
the answer is "\b99mBTc".

That's because, in order to obtain the string you want to obtain, you
have to type

x = "\\b99m\\BTc"

(note the double backslashes). If you then type

p x.length

you obtain 9. And if you type

p x.chars.to_a

you obtain

=> ["\\", "b", "9", "9", "m", "\\", "B", "T", "c"]

The backslash is ONE character, but it is REPRESENTED as two
backslashes.

The string you typed instead gives:

=> ["\b", "9", "9", "m", "B", "T", "c"]

\b is the way to represent a backspace. The other backslash disappears
because "\B" is not a special backslash sequence, so the backslash is
dropped and the plain character B is kept.

One of the places where you can find a list of the valid backslash
sequences is this:

http://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals

(search for 'Backslash Notation') If you ponder on all this jumble
with calm, you will see there is a sound logic behind it. If you think
this is messy, wait until you meet internationalization!
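
A small sketch that may make the rules concrete: it takes a few literals exactly as they would be typed in source code and reports how many characters each one produces. The list of samples and the use of eval are illustrative choices, not part of the thread.

```ruby
# Literals as typed in source; the quoted heredoc keeps every backslash as is.
sources = <<'SRC'.lines.map(&:strip)
  "\b"
  "\\b"
  '\b'
  '\\b'
  "\B"
  %q(\b)
SRC

# eval turns each snippet into the string Ruby would build from it.
lengths = sources.map { |src| eval(src).length }

sources.zip(lengths).each do |src, len|
  puts format('%-6s -> %d character(s)', src, len)
end
```

Only the two double-quoted forms with a single backslash come out as one character: "\b" is a backspace and "\B" is a plain B. Every other form yields a backslash followed by b, which is also what a file containing \b gives you when read.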

Carlo

···

Subject: Escaped backslashes in input strings - newbie question
  Date: Thu 17 Jan 13 06:05:06PM +0900

--
Carlo E. Prelz - fluido@fluido.as
If the Way and its Virtue had not been set aside, what need would there be
to talk so much of love and righteousness? (Chuang-Tzu)

If, at IRB, I write: x = "\b99m\BTc" (not \u which gives an 'invalid Unicode
escape')
and then: x.gsub(/\\./,"")
the answer is "\b99mBTc".

The problem is that what you wrote is not what you meant. In Ruby
code: "\b" is a single character:

1.9.2p290 :002 > "\b".length
=> 1

To get the same string you would get when reading from a file, you
should escape the \ in string literals:

1.9.2p290 :003 > "\\b".length
=> 2

So,

1.9.2p290 :005 > "\\b1234\\c5678".gsub(/\\./, "")
=> "12345678"

I am trying to get "99mTc", which I do get provided that I escape "\b" and
"\B" with extra backslashes, but in file input these extra backslashes will
be absent. I think on investigation that the extra backslashes will not be
needed in that, but not necessarily any other, context.

Exactly, source code is different from a file that you read.

Jesus.

···

On Thu, Jan 17, 2013 at 10:05 AM, John Sampson <jrs.idx@ntlworld.com> wrote:

That happens because the string is wrapped in double quotes, so its
escape sequences are interpreted; on the other hand,

1.9.3 (main):0 > '\u99m\UTc'.gsub(/\\./,"")
=> "99mTc"
1.9.3 (main):0 >

seems to work just fine :-)

···

On Thu, Jan 17, 2013 at 1:23 AM, Carlo E. Prelz <fluido@fluido.as> wrote:

If, at IRB, I write: x = "\b99m\BTc" (not \u which gives an 'invalid
Unicode escape')

--
Hassan Schroeder ------------------------ hassan.schroeder@gmail.com

twitter: @hassan