Gsub and backslashes

Consider the string
  \1\2\3
that is
  "\\1\\2\\3"

I feel really stupid ... but this simple substitution pattern does not do what I expect.

  "\\1\\2\\3".gsub(/\\/,"\\\\")

What I want is to change single backslashes to double backslashes. The result of the above substitution is "no change"

On the other hand
  "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
does do what I want ... but I am clueless as to why.

Backslashes are tricky. What's happening here is that each escaped
backslash "\\" yields one backslash, which in turn escapes what
comes after it, in this case another escaped backslash that also
yields one backslash. In other words, four backslashes yield two
backslashes, which is an escaped backslash (i.e. one backslash).
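
One way to see the two layers described above is to count characters at each stage. This is only a sketch; the variable names are made up for illustration and are not from the original post.

```ruby
input = "\\1\\2\\3"          # six characters: \1\2\3

# Layer 1: the string literal. Each pair of typed backslashes
# becomes one backslash character in the String object.
four_typed  = "\\\\"         # 2 backslash characters
eight_typed = "\\\\\\\\"     # 4 backslash characters

# Layer 2: gsub scans the replacement string, and a pair of
# backslash characters there collapses to one once more.
unchanged = input.gsub(/\\/, four_typed)   # one backslash replaced by one
doubled   = input.gsub(/\\/, eight_typed)  # one backslash replaced by two

puts four_typed.length    # 2
puts eight_typed.length   # 4
puts unchanged == input   # true
puts doubled              # \\1\\2\\3
```

Printing with puts rather than relying on irb's echo keeps the on-screen count equal to the real character count.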

HTH,
Ammar

···

On Sun, Nov 21, 2010 at 12:13 AM, Ralph Shnelvar <ralphs@dos32.com> wrote:


What I want is to change single backslashes to double backslashes. The
result of the above substitution is "no change"

On the other hand
"\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
does do what I want ... but I am clueless as to why.

there are many ways,

#1
"\\1\\2\\3".gsub(/(\\)/,"\\1\\1").scan /./
#=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]

#2
"\\1\\2\\3".gsub(/(\\)/,'\1\1').scan /./
#=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]

#3
"\\1\\2\\3".gsub(/\\/){"\\\\"}.scan /./
#=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]

#4
"\\1\\2\\3".gsub(/(\\)/){$1+$1}.scan /./
#=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]

Samples #1 and #2 use group backreferences; Ruby may need a second parsing
pass for this feature to work...

Samples #3 and #4 use code blocks and may not need a second pass.
Backreferences can be had using the $n notation.
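
As a quick check that the four variants listed above really agree, they can be collected and compared. A minimal sketch (the `results` array is just for illustration):

```ruby
input = "\\1\\2\\3"   # the six characters \1\2\3

results = [
  input.gsub(/(\\)/, "\\1\\1"),     # backreference, double-quoted string
  input.gsub(/(\\)/, '\1\1'),       # backreference, single-quoted string
  input.gsub(/\\/) { "\\\\" },      # block returning two backslashes
  input.gsub(/(\\)/) { $1 + $1 },   # block using the $1 global
]

puts results.uniq.length   # 1, so all four agree
puts results.first         # \\1\\2\\3
```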

best regards -botp

···

On Sun, Nov 21, 2010 at 6:13 AM, Ralph Shnelvar <ralphs@dos32.com> wrote:

Ralph Shnelvar wrote in post #962847:

Consider the string
  \1\2\3
that is
  "\\1\\2\\3"

I feel really stupid ... but this simple substitution pattern does not
do what I expect.

  "\\1\\2\\3".gsub(/\\/,"\\\\")

Here you are replacing one backslash with one backslash.

The trouble is, in the *replacement* string, '\1' has a special meaning
(insert the value of the first capture). Because of this, a literal
backslash is backslash-backslash.

So to replace with *two* backslashes you need
backslash-backslash-backslash-backslash. And inside a double or single
quoted string, a single backslash is represented as "\\" or '\\'

irb(main):001:0> "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
=> "\\\\1\\\\2\\\\3"

The second level of backslashing isn't used with the block form, since
if you want to use captured subexpressions you can use #{$1} instead of
\1. Hence as an alternative:

irb(main):002:0> "\\1\\2\\3".gsub(/\\/) { "\\\\" }
=> "\\\\1\\\\2\\\\3"
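
When the replacement text comes from somewhere else (user input, a file) and should be inserted literally, the two options above can be sketched like this. `literal_replacement` is a hypothetical helper written for this example, not something Ruby provides:

```ruby
# Hypothetical helper: double every backslash so the text survives
# gsub's replacement-string processing unchanged.
def literal_replacement(text)
  text.gsub("\\") { "\\\\" }
end

raw = 'a\1b'   # four characters: a, backslash, 1, b

naive = "X".sub(/X/, raw)                       # \1 is read as a capture (empty here)
safe  = "X".sub(/X/, literal_replacement(raw))  # backslash is kept literally
block = "X".sub(/X/) { raw }                    # block result is used as-is

puts naive   # ab
puts safe    # a\1b
puts block   # a\1b
```

The block form is probably the simpler choice when no backreferences are needed, since it skips the second level of escaping altogether.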

···

--
Posted via http://www.ruby-forum.com/.


I should have added that you can get the same result with 3
backslashes. So 6 of them will give you two.

"\\1\\2\\3".gsub(/\\/,"\\\\\\").scan /./

=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]
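
To make the "6 instead of 8" observation concrete, here is a small comparison. My reading (an assumption, not something confirmed in this thread) is that the trailing lone backslash has nothing left to escape and so passes through as itself:

```ruby
input = "\\1\\2\\3"

two_pairs = "\\\\\\\\"   # 8 typed, 4 backslash characters
pair_plus = "\\\\\\"     # 6 typed, 3 backslash characters

with_four  = input.gsub(/\\/, two_pairs)
with_three = input.gsub(/\\/, pair_plus)

puts pair_plus.length          # 3
puts with_three == with_four   # true
puts with_three                # \\1\\2\\3
```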

Regards,
Ammar

···

On Sun, Nov 21, 2010 at 12:34 AM, Ammar Ali <ammarabuali@gmail.com> wrote:

On Sun, Nov 21, 2010 at 12:13 AM, Ralph Shnelvar <ralphs@dos32.com> wrote:


botp's excellent suggestions reminded me of another one:

"\\1\\2\\3".gsub(/\\/, '\&\&')

=> "\\\\1\\\\2\\\\3"
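
For completeness, a sketch of the whole-match reference in its different spellings. As far as I know `\0` means the same as `\&` in a replacement string; the double-quoted form needs its backslashes doubled:

```ruby
input = "\\1\\2\\3"

via_amp    = input.gsub(/\\/, '\&\&')     # \& is the whole match
via_zero   = input.gsub(/\\/, '\0\0')     # \0 is the whole match too
via_double = input.gsub(/\\/, "\\&\\&")   # same as the first, double-quoted

puts via_amp                  # \\1\\2\\3
puts via_amp == via_zero      # true
puts via_amp == via_double    # true
```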

Regards,
Ammar

···

On Sun, Nov 21, 2010 at 11:57 AM, botp <botpena@gmail.com> wrote:

On Sun, Nov 21, 2010 at 6:13 AM, Ralph Shnelvar <ralphs@dos32.com> wrote:

Ralph Shnelvar wrote in post #962847:

"\\1\\2\\3".gsub(/\\/,"\\\\")

Here you are replacing one backslash with one backslash.

The trouble is, in the *replacement* string, '\1' has a special meaning
(insert the value of the first capture). Because of this, a literal
backslash is backslash-backslash.

That's a keen observation, but the fact that they happen to be
back-references doesn't seem to play a part in this situation.

"\\a\\b\\c".gsub(/\\/,"\\\\")

=> "\\a\\b\\c"

"\\a\\b\\c".gsub(/\\/,"\\\\\\")

=> "\\\\a\\\\b\\\\c"

Regards,
Ammar

···

On Sun, Nov 21, 2010 at 11:02 PM, Brian Candler <b.candler@pobox.com> wrote:

The key point to understand IMHO is that a backslash is special in
replacement strings. So, whenever one wants to have a literal
backslash in a replacement string one needs to escape it and hence
have two backslashes. Coincidentally a backslash is also special in a
string (even in a single quoted string). So you need two levels of
escaping, which makes 2 * 2 = 4 backslashes on the screen for one literal
replacement backslash.

Additionally people are often confused by the fact that IRB by default
uses #inspect for showing expression values, which will display twice
as many backslashes as are present in the string. :-)
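
The #inspect effect is easy to check outside irb. A small sketch that counts backslashes in the string itself and in its inspect form:

```ruby
doubled = "\\1\\2\\3".gsub(/\\/) { "\\\\" }

puts doubled.length                      # 9 characters in the string
puts doubled.scan(/\\/).length           # 6 of them are backslashes
puts doubled                             # the raw characters: \\1\\2\\3
p doubled                                # inspect form, as irb prints it
puts doubled.inspect.scan(/\\/).length   # 12 backslashes on screen
```

So when in doubt, puts shows what the string actually contains, while p and irb show a literal that would recreate it.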

<grumpy>Can we please make a big red sticker and put it on every Ruby
installer and source tar to inform people of this and the local
variable method ambiguity. These two seem to be the issues that pop
up most of the time.</grumpy>

Kind regards

robert

···

On Mon, Nov 22, 2010 at 12:27 AM, Ammar Ali <ammarabuali@gmail.com> wrote:


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Actually, 3 backslashes will yield one backslash. The first two result
in one (escaped), and the third one, escaped by the previous escaped
backslash, ends up being one. My second example showed this, using 6
backslashes instead of 8. Using 4 backslashes works because the second
pair yields an escaped backslash, but it is not necessary.

Regards,
Ammar

···

On Mon, Nov 22, 2010 at 10:38 AM, Robert Klemme <shortcutter@googlemail.com> wrote:


That does not work reliably under all circumstances though:

irb(main):006:0> "abc".gsub /./, "\\\n"
=> "\\\n\\\n\\\n"
irb(main):007:0> puts("abc".gsub /./, "\\\n")
\
\
\
=> nil
irb(main):008:0> "abc".gsub /./, "\\\\n"
=> "\\n\\n\\n"
irb(main):009:0> puts("abc".gsub /./, "\\\\n")
\n\n\n
=> nil

It is safer to use 4 backslashes. This is the only robust way to do
this even though sometimes you can simply use a single backslash (e.g.
\1 instead of \\1) because string parsing is a bit tolerant under some
circumstances:

irb(main):014:0> '\1'
=> "\\1"
irb(main):015:0> '\\1'
=> "\\1"

but

irb(main):019:0> "\n"
=> "\n"
irb(main):020:0> "\\n"
=> "\\n"
irb(main):021:0> "\1"
=> "\x01"
irb(main):022:0> "\\1"
=> "\\1"
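
The same comparison as a script rather than an irb session, in case it helps. This is a sketch; the variable names are only for illustration:

```ruby
single_one = '\1'    # backslash + "1": single quotes leave \1 alone
single_two = '\\1'   # the same two characters
double_one = "\1"    # octal escape: one character with code 1
double_two = "\\1"   # backslash + "1"

puts single_one == single_two   # true
puts single_one == double_two   # true
puts single_one.length          # 2
puts double_one.length          # 1
puts double_one.ord             # 1
```

Only the fully escaped spelling means the same thing under both kinds of quotes, which is the argument for always writing two backslashes.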

Kind regards

robert

···

On Mon, Nov 22, 2010 at 1:28 PM, Ammar Ali <ammarabuali@gmail.com> wrote:


That does not work reliably under all circumstances though:

irb(main):006:0> "abc".gsub /./, "\\\n"
=> "\\\n\\\n\\\n"
irb(main):007:0> puts("abc".gsub /./, "\\\n")
\
\
\
=> nil
irb(main):008:0> "abc".gsub /./, "\\\\n"
=> "\\n\\n\\n"
irb(main):009:0> puts("abc".gsub /./, "\\\\n")
\n\n\n
=> nil

I think these examples are somewhat misleading, because the escaped
newline (\n) normally includes a backslash. Taking that into account,
i.e. not counting the one that is part of the newline character, the first
example is only using 2 backslashes, and the second example is using
3. The same goes for its friends, \a, \r, \f, etc.

It is safer to use 4 backslashes. This is the only robust way to do
this even though sometimes you can simply use a single backslash (e.g.
\1 instead of \\1) because string parsing is a bit tolerant under some
circumstances:

I don't think this is tolerance from the string parser, it is
recognition of the \1 as a valid octal value.

irb(main):014:0> '\1'
=> "\\1"
irb(main):015:0> '\\1'
=> "\\1"

Here the single quotes are coming into play. Octal escapes are not
recognized within them. But it outputs the string in double quotes,
"forcing" the backslash to be escaped in the output. Backslashes need
to be escaped in single quoted string, just like they do in double
quoted ones, so in the second example ('\\1'), it's just one
backslash, again.

but

irb(main):019:0> "\n"
=> "\n"
irb(main):020:0> "\\n"
=> "\\n"
irb(main):021:0> "\1"
=> "\x01"
irb(main):022:0> "\\1"
=> "\\1"

Here the double quotes are taking effect. The first correctly prints a
newline, the second an escaped one, the third gets recognized as an
octal escape, and the last escapes the meaning of the backslash that
would otherwise cause the 1 to be interpreted as an octal value.

Maybe using 4 backslashes is safer, overall, but I wouldn't make it a
rule. At least not without explaining these special cases that include
a leading backslash in their normal representation.

Regards,
Ammar

···

On Mon, Nov 22, 2010 at 3:53 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

On Mon, Nov 22, 2010 at 1:28 PM, Ammar Ali <ammarabuali@gmail.com> wrote:

Actually, 3 backslashes will yield one backslash. The first two result
in one (escaped), and the third one, escaped by the previous escaped
backslash ends up being one. My second example showed this, using 6
backslashes instead of 8. Using 4 backslashes works because the second
pair yields an escaped backslash, but it is not necessary.

That does not work reliably under all circumstances though:

irb(main):006:0> "abc".gsub /./, "\\\n"
=> "\\\n\\\n\\\n"
irb(main):007:0> puts("abc".gsub /./, "\\\n")
\
\
\
=> nil
irb(main):008:0> "abc".gsub /./, "\\\\n"
=> "\\n\\n\\n"
irb(main):009:0> puts("abc".gsub /./, "\\\\n")
\n\n\n
=> nil

I think these examples are somewhat misleading, because the escaped
newline (\n) normally includes a backslash. Taking that into account,
i.e. not counting the one that is part of newline character, the first
example is only using 2 backslashes, and the second example is using
3. The same goes for its friends, \a, \r, \f, etc.

That is the very point of my posting: you cannot always use three backslashes reliably because, oops, all of a sudden the last one may be part of something else. In other cases it happens to work:

irb(main):002:0> "abc".gsub /./, "\\\y"
=> "\\y\\y\\y"
irb(main):003:0> "abc".gsub /./, "\\\\y"
=> "\\y\\y\\y"

Now if someone changes "y" to "n" in the first case, the (probably unintended) effect is dramatic. Or consider a replacement string 'foo \1 bar' which at some point is unsuspectingly changed to "foo \1 bar \n" and which suddenly yields not only a newline but also some weird octal character. This would have been avoided if the original string had contained two backslashes already.
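
The fragility can be put side by side in a short script. A sketch, using the same replacement strings as the irb sessions above:

```ruby
three_y = "abc".gsub(/./, "\\\y")    # replacement: backslash, "y"
four_y  = "abc".gsub(/./, "\\\\y")   # replacement: backslash, backslash, "y"
three_n = "abc".gsub(/./, "\\\n")    # replacement: backslash, newline
four_n  = "abc".gsub(/./, "\\\\n")   # replacement: backslash, backslash, "n"

puts three_y == four_y   # true: with "y" the two spellings happen to agree
puts three_n == four_n   # false: with "n" the third backslash joins the \n escape
puts four_n              # \n\n\n
```

With four backslashes the result does not depend on which letter follows, which is the robustness being argued for.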

It is safer to use 4 backslashes. This is the only robust way to do
this even though sometimes you can simply use a single backslash (e.g.
\1 instead of \\1) because string parsing is a bit tolerant under some
circumstances:

I don't think this is tolerance from the string parser, it is
recognition of the \1 as a valid octal value.

irb(main):014:0> '\1'
=> "\\1"
irb(main):015:0> '\\1'
=> "\\1"

Here the single quotes are coming into play. Octal escapes are not
recognized within them. But it outputs the string in double quotes,
"forcing" the backslash to be escaped in the output. Backslashes need
to be escaped in single quoted string, just like they do in double
quoted ones, so in the second example ('\\1'), it's just one
backslash, again.

Apparently I was not clear enough. The point is that there is some tolerance. Both sequences (irb lines 14 and 15) produce the *same* output although they differ in backslash usage. This does not work if you try to write '\' to get a single backslash. For that you need '\\'. If you use two backslashes in both cases it's clear what happens and there is no room for errors.

but

irb(main):019:0> "\n"
=> "\n"
irb(main):020:0> "\\n"
=> "\\n"
irb(main):021:0> "\1"
=> "\x01"
irb(main):022:0> "\\1"
=> "\\1"

Here the double quotes are taking effect. The first correctly prints a
newline, the second an escaped one,

This is not an "escaped newline" but merely a backslash followed by character "n". Whether that is considered "escaped" in some way depends on the code that processes this string. If at all this is an escaped "n". :-)

the third gets recognized as an
octal escape, and the last escapes the meaning of the backslash that
would otherwise cause the 1 to be interpreted as an octal value.

Correct.

Maybe using 4 backslashes is safer, overall, but I wouldn't make it a
rule. At least not without explaining these special cases that include
a leading backslash in their normal representation.

My precise reason to make it a rule is that it is simple and beginners do not have to remember all these special cases that you find so worth mentioning.

Actually I do not like those special cases and would rather suggest removing them since they make things unnecessarily complicated. The repeated occurrence of newbie confusion and the very discussion we are having here prove that the logic creates more confusion than clarity. The only reason I do not suggest changing this is the fact that this might break a lot of code.

Kind regards

  robert

···

On 22.11.2010 18:21, Ammar Ali wrote:


I don't think this is tolerance from the string parser, it is
recognition of the \1 as a valid octal value.

irb(main):014:0> '\1'
=> "\\1"
irb(main):015:0> '\\1'
=> "\\1"

----8<----

Apparently I was not clear enough. The point is, that there is some
tolerance. Both sequences (line 14 and 15) produce the *same* output
although they differ in backslash usage. This does not work if you try to
write '\' to get a single backslash. For that you need '\\'. If you use
two backslashes in both cases it's clear what happens and there is no room
for errors.

I guess I took issue with the word tolerance. I don't think of lexers
and parsers as tolerant. They are quite ruthless and dictatorial. It's
either their way, or their way in a way one did not expect. :-)

This is not an "escaped newline" but merely a backslash followed by
character "n". Whether that is considered "escaped" in some way depends on
the code that processes this string. If at all this is an escaped "n". :-)

You are correct sir. For someone who was nitpicking, I misspoke. :-)

My precise reason to make it a rule is that it is simple and beginners do
not have to remember all these special cases that you find so worthy
mentioning.

This might be a six of one, half a dozen of the other kind of situation.
People would start to ask whether the backslash in the \n case counts
in the "just add 4" rule or not. 4 backslashes in total or 5? It
seems to only shift the issue slightly, and temporarily, until one has
to actually understand what is really going on.

Actually I do not like those special cases and would rather suggest to
remove them since they make things unnecessary complicated. The repeated
occurrence of newbie confusion and the very discussion we are having here
proves that the logic creates more confusion than clarity. The only reason I
do not suggest to change this is the fact that this might break a lot of
code.

I agree, but this long "heritage" that goes back to the 60s is
probably very hard to shake. Maybe a new language can break away from
it.

Out of curiosity, what could these beasts be replaced with? Constants?

Cheers,
Ammar

···

On Mon, Nov 22, 2010 at 9:25 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

On 22.11.2010 18:21, Ammar Ali wrote:

I don't think this is tolerance from the string parser, it is
recognition of the \1 as a valid octal value.

irb(main):014:0> '\1'
=> "\\1"
irb(main):015:0> '\\1'
=> "\\1"

----8<----

Apparently I was not clear enough. The point is, that there is some
tolerance. Both sequences (line 14 and 15) produce the *same* output
although they differ in backslash usage. This does not work if you try to
write '\' to get a single backslash. For that you need '\\'. If you use
two backslashes in both cases it's clear what happens and there is no room
for errors.

I guess I took issue with the word tolerance. I don't think of lexers
and parsers as tolerant. They are quite ruthless and dictatorial. It's
either their way, or their way in a way one did not expect. :-)

:-) But rules can be made to allow for some flexibility (just think
of method calls with or without brackets in Ruby).

This is not an "escaped newline" but merely a backslash followed by
character "n". Whether that is considered "escaped" in some way depends on
the code that processes this string. If at all this is an escaped "n". :-)

You are correct sir. For someone who was nitpicking, I misspoke. :-)

No problem. Apparently we both enjoy nitpicking. :-))

My precise reason to make it a rule is that it is simple and beginners do
not have to remember all these special cases that you find so worthy
mentioning.

This might be six of one, half a dozen of the other kind of situation.
People would start to ask if the backslash in the \n case would count
in the "just add 4" rule, or not? 4 backslashes in total or 5? It
seems to only shift the issue slightly, and temporarily, until one has
to actually understand what is really going on.

Hmm... Maybe.

Actually I do not like those special cases and would rather suggest to
remove them since they make things unnecessary complicated. The repeated
occurrence of newbie confusion and the very discussion we are having here
proves that the logic creates more confusion than clarity. The only reason I
do not suggest to change this is the fact that this might break a lot of
code.

I agree, but this long "heritage" that goes back to the 60s is
probably very hard to shake. Maybe a new language can break away from
it.

In Ruby's case the heritage does not go back to the sixties but rather
to the nineties (1997) if I am not mistaken.

Out of curiosity, what could these beasts be replaced with? Constants?

I'd leave everything as is except drop special cases like '\1' (this
would either be an octal escape as in a double quoted string or rather
just "1"). In single quoted strings only ' would be special if
preceded by a backslash. In double quoted strings I would have those
characters which are special currently (", n, r, a, t and probably
others I'm not thinking of right now). I am undecided whether I would
make all others errors or tolerant (e.g. "\z" would either be a syntax
error or just "z"). I have a slight tendency to the more strict
variant though because otherwise people might be left wondering what
\z means when it is just "z"; also, this would help detect typing
errors (maybe someone wanted to type "\t" which is just a key away on
my German keyboard).

Kind regards

robert

···

On Mon, Nov 22, 2010 at 10:06 PM, Ammar Ali <ammarabuali@gmail.com> wrote:


> I guess I took issue with the word tolerance. I don't think of lexers
> and parsers as tolerant. They are quite ruthless and dictatorial. It's
> either their way, or their way in a way one did not expect. :-)

:-) But rules can be made to allow for some flexibility (just think
of method calls with or without brackets in Ruby).

That's a good example, and I now understand what you meant by tolerance.

>> This is not an "escaped newline" but merely a backslash followed by
>> character "n". Whether that is considered "escaped" in some way depends on
>> the code that processes this string. If at all this is an escaped "n". :-)
>
> You are correct sir. For someone who was nitpicking, I misspoke. :-)

No problem. Apparently we both enjoy nitpicking. :-))

:-)

> I agree, but this long "heritage" that goes back to the 60s is
> probably very hard to shake. Maybe a new language can break away from
> it.

In Ruby's case the heritage does not go back to the sixties but rather
to the nineties (1997) if I am not mistaken.

I was thinking of C, which I believe introduced these escapes, but I'm not
sure.

> Out of curiosity, what could these beasts be replaced with? Constants?

I'd leave everything as is except drop special cases like '\1' (this
would either be an octal escape as in a double quoted string or rather
just "1"). In single quoted strings only ' would be special if
preceded by a backslash. In double quoted strings I would have those
characters which are special currently (", n, r, a, t and probably
others I'm not thinking of right now). I am undecided whether I would
make all others errors or tolerant (e.g. "\z" would either be a syntax
error or just "z"). I have a slight tendency to the more strict
variant though because otherwise people might be left wondering what
\z means when it is just "z"; also, this would help detect typing
errors (maybe someone wanted to type "\t" which is just a key away in
my German keyboard).

I like the idea of treating unnecessary escapes as syntax errors, or at
least warnings. I see this a lot in regular expressions, especially in
character sets. Characters that don't need to be escaped (like ? and *) are
preceded with a backslash, just to be safe I guess, making for harder to
read code, as you noted.
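
A small illustration of the character-set case. The sample text is made up; the point is only that the escaped and unescaped sets match the same characters:

```ruby
text = "what? *really*"

escaped   = text.scan(/[\?\*]/)   # backslashes that are not needed here
unescaped = text.scan(/[?*]/)     # same set, easier to read

puts escaped == unescaped   # true
p unescaped                 # ["?", "*", "*"]
```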

Regards,
Ammar

···

On Tue, Nov 23, 2010 at 11:17 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

On Mon, Nov 22, 2010 at 10:06 PM, Ammar Ali <ammarabuali@gmail.com> wrote:

> I agree, but this long "heritage" that goes back to the 60s is
> probably very hard to shake. Maybe a new language can break away from
> it.

In Ruby's case the heritage does not go back to the sixties but rather
to the nineties (1997) if I am not mistaken.

I was thinking of C, which I believe introduced these escapes, but I'm not
sure.

Yeah, but I don't want to change \n, \t etc. in double quoted strings.
I mostly want to get rid of '\1' which is something completely
specific to Ruby.

> Out of curiosity, what could these beasts be replaced with? Constants?

I'd leave everything as is except drop special cases like '\1' (this
would either be an octal escape as in a double quoted string or rather
just "1"). In single quoted strings only ' would be special if
preceded by a backslash. In double quoted strings I would have those
characters which are special currently (", n, r, a, t and probably
others I'm not thinking of right now). I am undecided whether I would
make all others errors or tolerant (e.g. "\z" would either be a syntax
error or just "z"). I have a slight tendency to the more strict
variant though because otherwise people might be left wondering what
\z means when it is just "z"; also, this would help detect typing
errors (maybe someone wanted to type "\t" which is just a key away in
my German keyboard).

I like the idea of treating unnecessary escapes as syntax errors, or at
least warnings. I see this a lot in regular expressions, especially in
character sets. Characters that don't need to be escaped (like ? and *) are
preceded with a backslash, just to be safe I guess, making for harder to
read code, as you noted.

Exactly. I would not want to get rid of optional brackets for example
because lack of brackets can make code much more readable (apart from
foo.bar=(123) looking weird). It's always a question of balance. I
have to say that Matz did a remarkable job at this in Ruby in general.
This is just one of very few things that could be better (class
variables is another one I can think of right now).

Kind regards

robert

···

On Tue, Nov 23, 2010 at 12:39 PM, Ammar Ali <ammarabuali@gmail.com> wrote:
