Escaping single quotes in a string with gsub

Hi,

I'm trying to take a string and escape a single quote if it is not
already escaped. My first thought was to look at the string and if I
see a quote without a backslash before it put the backslash there.

This has a problem when there is an escaped slash before the quote:
\\'. I believe the fix should be to look two characters back. If
anyone has a canned solution I'm all ears. Would look-behind be an
option here out of the box?

While I was experimenting I saw some behavior I don't understand and
am hoping someone can explain it to me:

prubel@cornet /tmp> cat /tmp/t.rb ; ruby /tmp/t.rb
2.times do
  # replace not a slash followed by a quote with not a slash
  # and an escaped quote.
  puts("\\'Summer's Day".gsub(/([^\\\\])'/,"*#{$1}\\\\'*"))
  puts $1
end
#end
\'Summe*\'*s Day
r
\'Summe*r\'*s Day
r
prubel /tmp> ruby --version
ruby 1.8.1 (2004-02-06) [i686-linux-gnu]

I'm confused as to why the output is different for the two
iterations. Why doesn't the r get placed in the first output?

         thank you for your help,
             Paul

> Hi,
>
> I'm trying to take a string and escape a single quote if it is not
> already escaped. My first thought was to look at the string and if I
> see a quote without a backslash before it put the backslash there.

What about:

gsub(/(\\*)'/) { |m| $1.length % 2 == 0 ? $1 + "\\'" : m }
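The one-liner above works by counting the run of backslashes in front of each quote: an even count (including zero) means the quote is unescaped, and an odd count means the last backslash already escapes it. Here is a minimal sketch that wraps it in a method. The name `escape_single_quotes` is hypothetical, and `Integer#even?` assumes a current Ruby rather than the 1.8.1 used in this thread.

```ruby
# Hypothetical wrapper around the gsub one-liner above.
def escape_single_quotes(str)
  str.gsub(/(\\*)'/) do |m|
    # $1 is the run of backslashes immediately in front of this quote.
    if $1.length.even?
      $1 + "\\'"  # even run (or none): quote is unescaped, add one backslash
    else
      m           # odd run: the last backslash already escapes the quote
    end
  end
end

puts escape_single_quotes("Summer's Day")     # prints Summer\'s Day
puts escape_single_quotes("\\'Summer's Day")  # prints \'Summer\'s Day
puts escape_single_quotes("a\\\\'b")          # prints a\\\'b
```

The third call shows the case from the original question: two backslashes only escape each other, so the quote still gets one more.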

> Would look-behind be an option here out of the box?

Surprisingly, I don't believe Ruby yet supports lookbehind.

> While I was experimenting I saw some behavior I don't understand and
> am hoping someone can explain it to me:
>
> prubel@cornet /tmp> cat /tmp/t.rb ; ruby /tmp/t.rb
> 2.times do
>   # replace not a slash followed by a quote with not a slash
>   # and an escaped quote.
>   puts("\\'Summer's Day".gsub(/([^\\\\])'/,"*#{$1}\\\\'*"))

The above line is problematic for two reasons. First, when using the replacement string version of gsub(), your string is interpolated before the method is even called, let alone before any matches are made, so $1 and friends are not set yet. Instead, try using \1 in a single-quoted string, or \\1 in a double-quoted one, to get the value you're after.

Two, I don't understand your pattern. [^\\\\] means ONE character that is not a slash and also not a slash. It's identical to [^\\]. I think you meant to say, not two slashes, but that's a little harder to express in a regex. And what if there are three slashes? See my solution above for a different approach.
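A sketch of Paul's call with the backreference fix, next to the block form of gsub, where `$1` is read after each match instead of before the call. It also uses the simpler `[^\\]` class, which is equivalent to `[^\\\\]`. The `*` markers are kept from Paul's test only so the replaced span is visible.

```ruby
s = "\\'Summer's Day"   # the text \'Summer's Day

# Backreference: "\\1" reaches gsub as \1, which it expands for every match.
a = s.gsub(/([^\\])'/, "*\\1\\\\'*")

# Block form: the block runs after each match, so $1 is current.
b = s.gsub(/([^\\])'/) { "*#{$1}\\'*" }

puts a   # prints \'Summe*r\'*s Day
puts b   # prints \'Summe*r\'*s Day
```

Note that the block's return value is used literally, so it needs only one escaped backslash, while the string form needs `\\\\` because gsub itself also processes backslashes in the replacement.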

>   puts $1
> end
> #end
> \'Summe*\'*s Day
> r
> \'Summe*r\'*s Day
> r
> prubel /tmp> ruby --version
> ruby 1.8.1 (2004-02-06) [i686-linux-gnu]
>
> I'm confused as to why the output is different for the two
> iterations. Why doesn't the r get placed in the first output?

Because $1 isn't set in time for the first replacement, but it is set when the second string is built (set by the first match).
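A small sketch that makes the timing visible by recording `$1` at the moment each replacement string is built. The `seen` and `results` arrays are illustration only; the gsub call is Paul's original.

```ruby
seen    = []
results = []
2.times do
  seen << $1   # $1 as it stands when "*#{$1}..." is interpolated
  results << "\\'Summer's Day".gsub(/([^\\\\])'/, "*#{$1}\\\\'*")
end

p seen        # prints [nil, "r"]
puts results  # same two lines as Paul's output
```

In the first pass `$1` is still `nil`, so the replacement is just `*\'*` and the matched `r` is swallowed; the second pass picks up the `r` left behind by the first pass's match.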

Hope that helps.

James Edward Gray II


On Nov 3, 2004, at 12:12 PM, Paul Rubel wrote:

Hi --

> Hi,
>
> I'm trying to take a string and escape a single quote if it is not
> already escaped. My first thought was to look at the string and if I
> see a quote without a backslash before it put the backslash there.
>
> This has a problem when there is an escaped slash before the quote:
> \\'. I believe the fix should be to look two characters back. If
> anyone has a canned solution I'm all ears. Would look-behind be an
> option here out of the box?

I admit I get confused by escaping and stuff... but I can't quite
picture the case you're describing. If a string contains a single
quote:

  "abc'def"

that's the same as:

  "abc\'def"

So I don't think you'll actually see that backslash before the single
quote when you scan the string.

If you do see a slash -- i.e., if the string is:

  abc\'def

then that would probably be generated with "abc\\'def", which would be
equivalent to:

  "abc\\\'def"

I'm afraid I didn't quite follow the Summer's Day example. Can you
give another?
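A quick check of the literals David compares: inside a double-quoted literal `\'` is just a quote, so the first two strings are identical, while `\\` produces one real backslash character.

```ruby
a = "abc'def"
b = "abc\'def"
c = "abc\\'def"
d = "abc\\\'def"

p a == b      # prints true: \' is just ' in a double-quoted literal
p a.length    # prints 7
p c == d      # prints true
p c.length    # prints 8: the backslash is a real character
puts c        # prints abc\'def
```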

David


On Thu, 4 Nov 2004, Paul Rubel wrote:

--
David A. Black
dblack@wobblini.net

David,

David A. Black writes:
> Hi --
>

> I admit I get confused by escaping and stuff... but I can't quite
> picture the case you're describing. If a string contains a single
> quote:
>
> "abc'def"
>
> that's the same as:
>
> "abc\'def"
>
> So I don't think you'll actually see that backslash before the single
> quote when you scan the string.

Insightful. I suspect you're right and that I made things needlessly
complicated (at least at this point). When my code sees the string the
escaping should already have occurred.

>
> If you do see a slash -- i.e., if the string is:
>
> abc\'def
>
> then that would probably be generated with "abc\\'def", which would be
> equivalent to:
>
> "abc\\\'def"
>
> I'm afraid I didn't quite follow the Summer's Day example. Can you
> give another?

The context in which I saw the problem was the following:

The code takes in a name and a value and then evals them. If
var_value contains an unescaped single quote, the eval fails with an
error saying the string is malformed.

   to_eval = "#{var_name} = '#{var_value}'"
   eval(to_eval, binding)

Looking at it now, I expect that a backslash in var_value will
cause problems most of the time, as the string's contents get
parsed a second time during the eval. Is there a better way to
set a value in a binding? The implementation has the option to set
values in a hash rather than in the binding, but I'd like to keep both
if possible.
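One way to keep the eval approach safe, sketched under the assumption that the value is always meant to be a plain string and that var_name is a valid local variable name: inside a single-quoted Ruby literal only `\\` and `\'` are escape sequences, so putting a backslash in front of every backslash and single quote makes the literal round-trip exactly. The helper name `single_quote_literal` is hypothetical.

```ruby
# Hypothetical helper: build a single-quoted Ruby literal for any string.
# In '...' literals only \\ and \' are escapes, so those are the only
# two characters that need a backslash in front of them.
def single_quote_literal(value)
  "'" + value.gsub(/[\\']/) { |c| "\\" + c } + "'"
end

var_name  = "greeting"
var_value = "Summer's Day \\ done"   # contains a quote and a backslash

b = binding
to_eval = "#{var_name} = #{single_quote_literal(var_value)}"
eval(to_eval, b)

puts to_eval                          # prints greeting = 'Summer\'s Day \\ done'
puts eval(var_name, b) == var_value   # prints true
```

A variable created this way lives in that `Binding` object, so later code has to read it back through the same binding, as the last line does. On Ruby 2.1 and later, `Binding#local_variable_set` can store the value without generating source text at all.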

        thank you,
          Paul


On Thu, 4 Nov 2004, Paul Rubel wrote:

James Edward Gray II writes:
>
> > Hi,
> >
> > I'm trying to take a string and escape a single quote if it is not
> > already escaped. My first thought was to look at the string and if I
> > see a quote without a backslash before it put the backslash there.
>
> What about:
>
> gsub(/(\\*)'/) { |m| $1.length % 2 == 0 ? $1 + "\\'" : m }

That does look much better.

> > While I was experimenting I saw some behavior I don't understand and
> > am hoping someone can explain it to me:
> >
> > prubel@cornet /tmp> cat /tmp/t.rb ; ruby /tmp/t.rb
> > 2.times do
> > # replace not a slash followed by a quote with not a slash
> > # and an escaped quote.
> > puts("\\'Summer's Day".gsub(/([^\\\\])'/,"*#{$1}\\\\'*"))
>
> The above line is problematic for two reasons. First, when using the
> replacement string version of gsub(), your string is interpolated
> before the method is even called let alone before any matches are made
> so $1 and friends are not set. Instead, try using a \1 in a single
> quoted string or \\1 in a double to get the value you're after.

I should have known. Thanks for the explanation.

> Two, I don't understand your pattern. [^\\\\] means ONE character that
> is not a slash and also not a slash. It's identical to [^\\]. I think
> you meant to say, not two slashes, but that's a little harder to
> express in a regex. And what if there are three slashes? See my
> solution above for a different approach.

I meant to say "not a slash", but the interpolation in the replacement
confused me. After reading your response and thinking a bit, I believe
I've wrapped my head around the issue.

> Hope that helps.

Very much.
      thank you,
        Paul


On Nov 3, 2004, at 12:12 PM, Paul Rubel wrote:

James Edward Gray II wrote:

> > I'm trying to take a string and escape a single quote if it is not
> > already escaped. My first thought was to look at the string and if I
> > see a quote without a backslash before it put the backslash there.
> > Would look-behind be an option here out of the box?
>
> Surprisingly, I don't believe Ruby yet supports lookbehind.

However, it does support lookahead, which is enough in this case:

"foo bar don't \\'".gsub(/((?!\\).(?:\\{2})*)'/, "\\1\\\\'")
# result: foo bar don\'t \'

And since the escape string is only a single character:

"foo bar don't \\'".gsub(/([^\\](?:\\{2})*)'/, "\\1\\\\'")
# result: foo bar don\'t \'

(Note that this is basically your Regexp, but with some of the filtering logic moved from the block to the Regexp itself. The replacement string looks disgusting. I think your solution is way clearer.)
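A sketch comparing the lookahead pattern with James's block version on the thread's example and two edge cases. As posted, the lookahead pattern needs a non-backslash character in front of the quote and consumes it as part of the match, so a quote at the very start of the string, or the second of two adjacent quotes, is left unescaped; the block version counts the backslash run directly and handles both. The method names are hypothetical.

```ruby
# Lookahead pattern, as posted above.
def escape_lookahead(str)
  str.gsub(/((?!\\).(?:\\{2})*)'/, "\\1\\\\'")
end

# James's block version from earlier in the thread.
def escape_block(str)
  str.gsub(/(\\*)'/) { |m| $1.length % 2 == 0 ? $1 + "\\'" : m }
end

# Compare both on the thread's example and on two edge cases.
["foo bar don't \\'", "'quoted at start", "a''"].each do |s|
  puts "#{s.inspect} -> #{escape_lookahead(s).inspect} / #{escape_block(s).inspect}"
end
```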

Here is a sample with a multiple-width escape string:

"foo bar don't ESC'".gsub(/((?!ESC).{3}(?:(?:ESC){2})*)'/, "\\1ESC'")
# result: foo bar donESC't ESC'
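For comparison, the backslash-counting idea also generalizes to a multi-character escape without lookahead: match the run of escape tokens before each quote and add one more when the count is even. This avoids a limitation of the `.{3}` version, which needs at least three characters in front of a quote and so leaves a quote near the start of the string untouched. This is a sketch; `escape_with` and its parameters are hypothetical, and `Regexp.escape` keeps an arbitrary escape token from being read as regex syntax.

```ruby
# Hypothetical generalization of the block approach to an escape token
# of any length (e.g. "ESC"); counts the tokens that precede each quote.
def escape_with(str, esc, quote = "'")
  token = Regexp.escape(esc)
  str.gsub(/((?:#{token})*)#{Regexp.escape(quote)}/) do |m|
    count = $1.length / esc.length       # number of escape tokens before the quote
    count.even? ? $1 + esc + quote : m   # even: unescaped, so add one more token
  end
end

puts escape_with("foo bar don't ESC'", "ESC")  # prints foo bar donESC't ESC'
puts escape_with("'start and ''", "ESC")       # prints ESC'start and ESC'ESC'
puts escape_with("it's \\'", "\\")             # prints it\'s \'
```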