Stopping String Escaping

Phil_Cooper-king · 7 January 2010 20:34

Hi,

I'm trying to parse code snippets on a website that are submitted by the
user. The problem is that when a user tries to shop escaping in their
code the escaping actually happens.

For instance, if you submit \\, Ruby treats it as a single \. Is there
any way to stop this? I still require all the other \'s such as \n etc.

Thanks
Phil.

--
Posted via http://www.ruby-forum.com/.

Brian_Candler · 7 January 2010 21:23

Phil Cooper-king wrote:

Hi,

I'm trying to parse code snippets on a website that are submitted by the
user. The problem is that when a user tries to shop escaping in their
code the escaping actually happens.

For instance, if you submit \\, Ruby treats it as a single \. Is there
any way to stop this? I still require all the other \'s such as \n etc.

How are you parsing them?

If you are using File.read() then no unescaping is done.

If you are parsing them using eval(), then you are inviting your machine
to be 0wned. See
http://www.ruby-doc.org/docs/ProgrammingRuby/html/taint.html

If you are parsing them some other way, then please explain it. Please
also explain what "shop escaping" is.

Regards,

Brian.

Phil_Cooper-king · 7 January 2010 22:03

How are you parsing them?
If you are using File.read() then no unescaping is done.

I am using rails and redcloth, I have the plain-text in the database,
and the text gets parsed when the view gets called atm.

I am using Uv for the syntax highlighting, which I pull out before sending to
redcloth
[code]
def snatch_code(text)
  snippets = text.scan(/#>code\((\S+)\)(.+?)#>code/m)

  snippets.each do |snip|
    code = Uv.parse(snip[1], 'xhtml', snip[0], false, 'twilight')
    code.insert(0, "<notextile>")
    code.insert(code.length, "</notextile>")
    text.sub!(/#>code\((\S+)\)(.+?)#>code/m, code)
  end

  text
end
[/code]
then redcloth parses it.

If you are parsing them using eval(), then you are inviting your machine
to be 0wned. See
http://www.ruby-doc.org/docs/ProgrammingRuby/html/taint.html

ouch. and thanks

If you are parsing them some other way, then please explain it. Please
also explain what "shop escaping" is.

dyslexia rules! KO!

I want to stop the escaping that’s not dealing with whitespace, tab, new
line etc.

Phil.

Brian_Candler · 7 January 2010 22:26

Phil Cooper-king wrote:

I am using Uv for the syntax highlighting, which I pull out before sending to
redcloth
[code]
def snatch_code(text)
  snippets = text.scan(/#>code\((\S+)\)(.+?)#>code/m)

  snippets.each do |snip|
    code = Uv.parse(snip[1], 'xhtml', snip[0], false, 'twilight')
    code.insert(0, "<notextile>")
    code.insert(code.length, "</notextile>")
    text.sub!(/#>code\((\S+)\)(.+?)#>code/m, code)
  end

  text
end
[/code]

OK, then what I suggest is you make a standalone test case, outside of
Rails.

source = <<'EOS'
Put your sample source code here
EOS
# Print it to be sure it hasn't already been escaped by Ruby
# Now process it with Uv
# Show the intermediate state
# Now process it with Redcloth
# Show the final state

Then you can see whether the problem is with Uv, or with Redcloth.

Then the question becomes much more focussed - for example, it might be
"how do I stop Redcloth turning \\ into \ inside a <notextile> section?"
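
For anyone who wants to try this staged approach without installing the gems, here is a minimal sketch. The two lambdas are hypothetical placeholders standing in for the real Uv and RedCloth calls, and `show` is a helper name made up for this example; swap in the real calls to locate the stage that drops a backslash.

```ruby
# Hypothetical stand-ins: replace these with the real Uv and RedCloth
# calls when testing the actual pipeline.
highlight = ->(src) { "<pre>#{src}</pre>" }
textile   = ->(src) { src }

# Print the raw contents with puts (not inspect) and count the backslashes,
# so a lost backslash shows up at the stage where it happens.
def show(label, str)
  puts "#{label} (#{str.scan('\\').size} backslashes):"
  puts str
  str
end

# A single-quoted heredoc keeps the sample exactly as typed.
source = <<'EOS'
puts "Hello\\one backslash"
EOS

raw         = show("source", source)
highlighted = show("after highlight", highlight.call(raw))
final       = show("after textile", textile.call(highlighted))
```

If the count changes between two stages, that stage is the one to investigate.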

Phil_Cooper-king · 8 January 2010 09:01

OK, then what I suggest is you make a standalone test case, outside of
Rails.

source = <<'EOS'
Put your sample source code here
EOS

yeah I did this as well.

[code]
require 'rubygems'
require 'uv'

un_parsed =<<ENDOF
\\
ENDOF

parsed = Uv.parse(un_parsed, "xhtml", "c++", false, "twilight")
=> \

puts un_parsed
=> \
[/code]

in both cases the slash gets lost. I expect the \ to be lost in puts
tho. Using the dump I see the double slash is still there.

Brian_Candler · 8 January 2010 09:57

Phil Cooper-king wrote:

un_parsed =<<ENDOF
\\
ENDOF

Unfortunately, here the \\ is being turned into a single backslash by
ruby, the same as inside a quoted string. In other words, the same as
this:

irb(main):001:0> "\\".size
=> 1
irb(main):002:0> '\\'.size
=> 1

The simplest way of preventing this is to read unparsed from a file, or
you can have an inline dataset at the end of your source code, like
this:

unparsed = DATA.read
... rest of your code goes here

__END__
\\
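
Another option that stays inside the source file is the single-quoted heredoc used in the earlier `<<'EOS'` suggestion. A small sketch of the difference between the two heredoc styles (variable names are just for illustration):

```ruby
# Unquoted heredoc: escapes are processed, as in a double-quoted string.
cooked = <<EOS
\\
EOS

# Single-quoted heredoc: the body is taken verbatim, no escape processing.
raw = <<'EOS'
\\
EOS

puts cooked.chomp.size  # number of characters left after escape processing
puts raw.chomp.size     # number of characters as typed
```

The quotes around the opening identifier are what make the difference; the body of the two heredocs is identical.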

I expect the \ to be lost in puts
tho.

No, puts *never* converts two backslashes into one. If your string
contains two backslashes, puts will show two backslashes.

Using the dump I see the double slash is still there.

No, this is the opposite. String#inspect turns a raw string into a
quoted string for display purposes, and as part of this quoting a single
backslash is displayed as two backslashes.

Look at this:

irb(main):001:0> s = 92.chr
=> "\\"
irb(main):002:0> s.size
=> 1
irb(main):003:0> puts s
\
=> nil
irb(main):004:0> s2 = s + s
=> "\\\\"
irb(main):005:0> s2.size
=> 2
irb(main):006:0> puts s2
\\
=> nil

Hopefully it's clear from the above that string s has one character (a
single backslash), and s2 has two backslashes. But these are displayed
in quoted form in irb as

"\\"
"\\\\"

respectively. puts displays them correctly.

Similarly, a single newline character is displayed as backslash-n when
inspect gives the quoted form; whereas puts actually prints a newline.

irb(main):009:0> nl = 10.chr
=> "\n"
irb(main):010:0> nl.size
=> 1
irb(main):011:0> puts nl

=> nil
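
The same comparison can be run as a script rather than in irb. This is only a sketch; it prints the size and byte values next to the inspect form so the two are not confused:

```ruby
s  = 92.chr   # a single backslash, built without any string literal
s2 = s + s    # two backslashes
nl = 10.chr   # a single newline

[s, s2, nl].each do |str|
  # size and bytes describe the real contents; inspect is only a display form
  puts "size=#{str.size} bytes=#{str.bytes.inspect} inspect=#{str.inspect}"
end
```

The byte lists show what is really in each string, while the inspect column is the quoted form that irb prints after `=>`.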

So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use 'puts' and not 'dump' to see clearly what you have

Phil_Cooper-king · 8 January 2010 10:20

Hopefully it's clear from the above

yes, thank you.

So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use 'puts' and not 'dump' to see clearly what you have

[code]
require 'rubygems'
require 'redcloth'

data_read = DATA.read
string = "\\"

puts RedCloth.new(string).to_html
puts RedCloth.new(data_read).to_html

__END__
\\
[/code]

yields
\
\\

although I have no idea how to treat a string as a file.
is all this to do with encoding? (sorry if that was a dense question)

erb results are similar, which I would have thought would be happening in
rails anyway

ERB.new("\\").src

=> "_erbout = ''; _erbout.concat \"\\\\\"; _erbout"

Brian_Candler · 8 January 2010 10:40

Phil Cooper-king wrote:

Hopefully it's clear from the above

yes, thank you.

So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use 'puts' and not 'dump' to see clearly what you have

[code]

...

data_read = DATA.read
string = "\\"

...

__END__
\\
[/code]

So in this program, 'data_read' contains two backslash characters; and
'string' contains a single backslash character.

yields
\
\\

That looks correct to me - HTML doesn't need a backslash to be escaped.
So now add Uv into your test to see if that is munging the backslashes.

although I have no idea how to treat a string as a file.

A string is just a string. In ruby 1.8 it's a sequence of bytes; in ruby
1.9 it's a sequence of characters. But that doesn't matter here; a
backslash is a backslash, and is both a single character and a single
byte in either ASCII or UTF-8.

However if you enter a string *literal* in a ruby program (or in IRB),
then it is parsed with backslash escaping rules to turn it into an
actual String object. For example:

a = "abc\ndef"
b = 'abc\ndef'

string 'a' contains 7 characters (a,b,c,newline,d,e,f), whereas string b
contains 8 characters (a,b,c,backslash,n,d,e,f). This is because there
are different escaping rules for double-quoted and single-quoted
strings.

In a single-quoted string literal, \' is a single quote, and \\ is a
backslash, and everything else is treated literally, so \n is two
characters \ and n.

In a double-quoted string literal, \" is a double quote, \n is a
newline, \\ is a backslash, and there's a whole load of other expansion
including #{...} for expression interpolation and #@... for instance
variable substitution.
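
A short script makes these rules easy to check for yourself. The variable names here are only illustrative:

```ruby
a = "abc\ndef"    # double quotes: \n becomes one newline character
b = 'abc\ndef'    # single quotes: \n stays as backslash followed by n
c = 'it\'s \\'    # single quotes still process \' and \\
name = "world"
d = "hi #{name}"  # interpolation happens in double quotes
e = 'hi #{name}'  # and is left alone in single quotes

puts a.size, b.size, c.size
puts d
puts e
```

Comparing the sizes is more reliable than looking at irb's `=>` output, which always shows the quoted form.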

erb results are similar, which I would have thought would be happening in
rails anyway

ERB.new("\\").src

=> "_erbout = ''; _erbout.concat \"\\\\\"; _erbout"

Now you're just scaring yourself with backslash escaping

Firstly, note that you passed a single backslash character to ERB.
That's what the string literal "\\" creates.

ERB compiled it to the following Ruby code:

_erbout = ''; _erbout.concat "\\"; _erbout

which just appends a single backslash to _erbout, which is what you
expect.

However, IRB displays the returned string from ERB.new using
String#inspect, so it is turned into a double-quoted string. This means:
1. A " is added to the start and end of the string
2. Any " within the string is displayed as \"
3. Any \ within the string is displayed as \\

In other words, String#inspect turns a string into a Ruby string literal
- something that you could paste directly into IRB. Try it:

str = "_erbout = ''; _erbout.concat \"\\\\\"; _erbout"
puts str

That will show you the actual contents of str, which is the Ruby code I
pasted above.
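
To see that ERB itself leaves the backslash alone, you can render the template and look at the result instead of the generated source. A minimal sketch; note that the exact text of `src` depends on the Ruby and ERB version, so on a current Ruby it will not match the 1.8 output quoted above character for character:

```ruby
require 'erb'

one = "\\"                  # the literal "\\" creates ONE backslash
out = ERB.new(one).result   # render the template
puts out.size               # the template text passes through unchanged

src = ERB.new(one).src      # generated Ruby source (format varies by version)
puts src                    # the actual contents of the generated code
p src                       # inspect form: every \ and " is escaped again
```

The `p` line is always longer than the `puts` line, because inspect adds the surrounding quotes and a second layer of escaping.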

HTH,

Brian.

Brian_Candler · 8 January 2010 10:58

Here's the kind of standalone test I was thinking of.

----- 8< -------------------------------------------------
require 'rubygems'
require 'uv'
require 'redcloth'

snip = DATA.read
code = Uv.parse(snip, 'xhtml', 'ruby', false, 'twilight')
code.insert(0, "<notextile>")
code.insert(code.length, "</notextile>")
puts RedCloth.new(code).to_html

__END__
puts "Hello world!\n"
puts "Hello\\one backslash"
----- 8< -------------------------------------------------

And for me the output it gives is:

<pre class="twilight">puts "Hello world!\n"
puts "Hello\\one backslash"
</pre>

This looks correct to me. So can you provide an example where it fails?
Otherwise you need to look elsewhere in your application to see if
you're providing the wrong input into Uv, or you're handling the output
wrongly.

Or maybe you have an old gem with a bug which has since been fixed. I'm
using:

ultraviolet (0.10.2)
RedCloth (4.2.2)

Phil_Cooper-king · 8 January 2010 12:06

thanks again

yep I have the same gems and the same result running your code.

I went nuts with the puts all over the place

fromdb: "##code(ruby)\r\n'\\\\'\r\n##code\r\n"

before: "##code(ruby)\n'\\\\'\n##code\n"

before parse: "\n'\\\\'\n"

after parse: "<pre class=\"twilight\">\n'\\\\'\n</pre>"

after insert: "<notextile><pre class=\"twilight\">\n'\\\\'\n</pre></notextile>"

after sub: "<notextile><pre class=\"twilight\">\n'\\'\n</pre></notextile>\n"

so after the sub section I lose two of the back slashes
[code]
text.sub!(/##code\((\S+)\)(.+?)##code/m, code)
[/code]

Brian_Candler · 8 January 2010 12:55

Phil Cooper-king wrote:

so after the sub section I lose two of the back slashes
[code]
text.sub!(/##code\((\S+)\)(.+?)##code/m, code)
[/code]

Ah yes, backslashes have a special interpretation in the
string-replacement part of a (g)sub too: \1 means the first capture, \2
means the second capture etc, so \\ means a single backslash.

Note that the replacement string here is two backslashes:

puts "abc".sub(/b/, "\\\\")

a\c
=> nil

The easy solution is to use the block form of sub instead.

puts "abc".sub(/b/) { "\\\\" }

a\\c
=> nil

You could simplify your code if you rewrote to use the block form of
gsub anyway.

text.gsub!(/#>code\((\S+)\)(.+?)#>code/m) do |snip|
  ... make a string containing the marked-up code
end
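
Here is a small sketch of the same effect using a replacement that comes from a variable, which is closer to the situation in the application. The `SNIP` marker and the variable names are made up for this example:

```ruby
code = "'\\\\'"   # four characters: quote, backslash, backslash, quote
text = "before SNIP after"

# String replacement: sub re-reads backslash sequences, so \\ becomes \.
lossy = text.sub(/SNIP/, code)

# Block form: the block's return value is inserted verbatim.
exact = text.sub(/SNIP/) { code }

# Alternative: double each backslash first so the string form survives.
escaped = text.sub(/SNIP/, code.gsub("\\") { "\\\\" })

puts lossy
puts exact
puts escaped
```

The block form is usually the simpler choice, since it needs no extra escaping pass and also leaves sequences such as \1 in the inserted text untouched.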

Brian_Candler · 8 January 2010 13:04

You could simplify your code if you rewrote to use the block form of
gsub anyway.

Try this:

----- 8< -------------------------------------------------
require 'rubygems'
require 'uv'
require 'redcloth'

text = DATA.read
text.gsub!(/#>code\((\S+)\)(.+?)#>code/m) do
  "<notextile>" +
    Uv.parse($2, 'xhtml', $1, false, 'twilight') +
    "</notextile>"
end
puts RedCloth.new(text).to_html

__END__
h1. Some code

#>code(ruby)
puts "Hello world!\n"
puts "Hello\\one backslash"
#>code

h1. The end
----- 8< -------------------------------------------------

Output:

<h1>Some code</h1>
<pre class="twilight">
puts "Hello world!\n"
puts "Hello\\one backslash"
</pre><h1>The end</h1>
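
If you want to experiment with the block-form approach without Uv or RedCloth installed, the following sketch may help. The `highlight` method is a hypothetical stand-in for `Uv.parse` (it only escapes HTML and wraps the text in a pre tag), and the named captures are a stylistic choice for this example rather than anything the gems require:

```ruby
require 'erb'

# Hypothetical stand-in for Uv.parse: escapes HTML and wraps it in <pre>.
def highlight(source, syntax)
  %(<pre class="#{syntax}">#{ERB::Util.html_escape(source)}</pre>)
end

def snatch_code(text)
  # The block's return value is inserted as-is, so backslashes survive.
  text.gsub(/#>code\((?<syntax>\S+)\)(?<body>.+?)#>code/m) do
    m = Regexp.last_match
    "<notextile>#{highlight(m[:body], m[:syntax])}</notextile>"
  end
end

# A single-quoted heredoc keeps the sample text exactly as typed.
sample = <<'EOS'
h1. Some code

#>code(ruby)
puts "Hello\\one backslash"
#>code
EOS

result = snatch_code(sample)
puts result
```

Because gsub with a block handles every match in one pass, there is no need for a separate scan followed by a sub! per snippet.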

Phil_Cooper-king · 8 January 2010 13:07

I was just reading up on them. Well, I won't forget this mistake quickly.

The easy solution is to use the block form of sub instead.

puts "abc".sub(/b/) { "\\\\" }

a\\c

yep worked like a gem

You could simplify your code if you rewrote to use the block form of
gsub anyway.

I'm having to loop through the code blocks in order to parse the syntax
with Uv anyway. I thought that while I was in the loop I may as well replace
each code block as it's parsed.

thanks for your effort, you've been a great help.

Phil_Cooper-king · 8 January 2010 13:20

Try this:

require 'rubygems'
require 'uv'
require 'redcloth'

text = DATA.read
text.gsub!(/#>code\((\S+)\)(.+?)#>code/m) do
  "<notextile>" +
    Uv.parse($2, 'xhtml', $1, false, 'twilight') +
    "</notextile>"
end
puts RedCloth.new(text).to_html

__END__
h1. Some code

#>code(ruby)
puts "Hello world!\n"
puts "Hello\\one backslash"
#>code

h1. The end

thanks again, it worked like a treat, in 1/2 the lines
