JSON.parse and unicode escape?

The documentation for the Ruby JSON classes (http://json.rubyforge.org/)
implies that it handles Unicode escaping fine. But I'm having trouble
parsing JSON with a Unicode escape sequence in it. I am using the
'ext' parser (JSON::Ext::Parser), not the 'pure' parser, in version 1.1.2,
which appears to still be the latest.

Here is some test JSON, which is actually an excerpt from some JSON
returned to me by a third-party web service. I boiled it down to the
simplest demonstration case and saved it in a file; here is what the
text file contains:

=====
{ "key": 'something \x26 more' }

I believe that is valid JSON, containing an escaped Unicode character? But
JSON.parse on that string raises, complaining:

JSON::ParserError: unexpected token at '{ "key": 'something \x26 more' }

I have verified that it is the \x26 that's doing it. The parser doesn't
like \x-escaped characters.

Am I doing something wrong? Is the JSON I am receiving from the third
party bad somehow? This is such a widely used library that I'd be
surprised if it's broken and can't parse input including unicode escape
sequences... but that's what it looks like to me. Feedback?
--
Posted via http://www.ruby-forum.com/.
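
A note on the question above: the JSON grammar defines only the escapes
\" \\ \/ \b \f \n \r \t and \uXXXX, and strings must be double-quoted, so
\x26 is not a JSON escape. If the feed cannot be fixed at the source, one
possible workaround is to rewrite \xNN as \u00NN before parsing. This is
only a sketch, assuming the values are otherwise valid double-quoted JSON
strings; normalize_hex_escapes is a hypothetical helper name, not part of
the json library.

```ruby
require 'json'

# Hypothetical helper (not part of the json library): rewrites the
# non-JSON escape \xNN as the JSON escape \u00NN. Each backslash consumes
# the character after it, so an escaped backslash followed by "x26"
# is left untouched.
def normalize_hex_escapes(json_text)
  json_text.gsub(/\\(?:x(\h{2})|.)/m) do |escape|
    hex = Regexp.last_match(1)
    hex ? "\\u00#{hex}" : escape
  end
end

# Single-quoted Ruby literal, so the backslash reaches the parser as-is.
raw   = '{ "key": "something \x26 more" }'
fixed = normalize_hex_escapes(raw)

puts fixed
puts JSON.parse(fixed)['key']
```

The rewrite is purely textual, so it should be applied to the raw response
body before it is handed to JSON.parse.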

I am running into what seems to be a related problem with the
following code:

$ irb
irb> require 'json'
=> true
irb> JSON.parse('{"s":"\uddb0"}')
JSON::ParserError: source sequence is illegal/malformed near uddb0"}
  from /Library/Ruby/Gems/1.8/gems/json-1.1.3/lib/json/common.rb:122:in `parse'
  from /Library/Ruby/Gems/1.8/gems/json-1.1.3/lib/json/common.rb:122:in `parse'
  from (irb):2
  from :0

I don't know enough about unicode to really understand what is being
escaped here, but the following unicode characters, very close in
range (I assume) do not throw an error:
"\ucdb0", "\uedb0", "\ud7b0"

I also validated the JSON string ('{"s":"\uddb0"}') successfully at
http://www.jsonlint.com/ and in Python.

Any ideas of what might be the problem?
Are there any alternative JSON parsers for ruby?

Thank you very much // pascal

That's not valid Unicode. See:

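U+DDB0 falls in the surrogate range (U+D800 to U+DFFF). You can only have
that code point in UTF-16, as one half of a surrogate pair; on its own,
\uddb0 does not encode a character.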

-Rob
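
To make the range point concrete, here is a small sketch that classifies
the four escapes from the original post. The helper names surrogate_kind
and combine_surrogates are illustrative only; the ranges and the pairing
formula are the standard UTF-16 ones.

```ruby
HIGH_SURROGATES = 0xD800..0xDBFF   # first half of a UTF-16 pair
LOW_SURROGATES  = 0xDC00..0xDFFF   # second half of a UTF-16 pair

# Illustrative helper: reports whether a 16-bit value is a surrogate.
def surrogate_kind(code_unit)
  if HIGH_SURROGATES.cover?(code_unit)
    :high
  elsif LOW_SURROGATES.cover?(code_unit)
    :low
  else
    :none
  end
end

# Illustrative helper: the code point encoded by a high + low pair.
def combine_surrogates(high, low)
  0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
end

%w[ddb0 cdb0 edb0 d7b0].each do |hex|
  puts "\\u#{hex}: #{surrogate_kind(hex.hex)}"
end

# A low surrogate only means something after a high one, for example:
puts format('U+%X', combine_surrogates(0xD83D, 0xDE00))
```

Of the four values, only \uddb0 is reported as a (low) surrogate, which
matches the observation that the neighbouring escapes parse without error.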

On Oct 1, 2008, at 12:18 AM, pwever wrote:

I am running into what seems to be a related problem with the
following code:

$ irb
irb> require 'json'
=> true
irb> JSON.parse('{"s":"\uddb0"}')
JSON::ParserError: source sequence is illegal/malformed near uddb0"}
  from /Library/Ruby/Gems/1.8/gems/json-1.1.3/lib/json/common.rb:122:in `parse'
  from /Library/Ruby/Gems/1.8/gems/json-1.1.3/lib/json/common.rb:122:in `parse'
  from (irb):2
  from :0

I don't know enough about unicode to really understand what is being
escaped here, but the following unicode characters, very close in
range (I assume) do not throw an error:
"\ucdb0", "\uedb0", "\ud7b0"

I also validated the JSON string ('{"s":"\uddb0"}') successfully at
http://www.jsonlint.com/ and in Python.

Any ideas of what might be the problem?
Are there any alternative JSON parsers for ruby?

Thank you very much // pascal

That makes a lot of sense. Thanks for the clarification regarding the
unicode range.

Since I don't have control over the JSON source, I would like to try to
parse the JSON even if it results in a malformed unicode string. So
today I tried switching from 'json' to the 'ruby-json' library. After
some searching online, I didn't find any documentation on how to use it
though. Primarily I don't know how to include or require it.

require 'ruby-json'
require 'rubyjson'

don't seem to work.
Any ideas are appreciated.
Thank you very much
// pascal
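
If the goal is just to get the document parsed, one option that avoids
switching libraries might be to replace unpaired surrogate escapes with
\ufffd (the Unicode replacement character) before calling JSON.parse.
The following is a rough sketch under that assumption;
scrub_lone_surrogates is a hypothetical helper, not something the json
library provides, and the original code unit is lost in the process.

```ruby
require 'json'

# Tokenizes backslash escapes so that valid pairs and escaped backslashes
# are skipped, and only unpaired surrogate escapes land in the "lone" group.
SURROGATE_ESCAPES = /
    \\u[dD][89abAB]\h{2}\\u[dD][c-fC-F]\h{2}   # high + low pair: valid, keep
  | (?<lone>\\u[dD][89a-fA-F]\h{2})            # any other surrogate escape
  | \\.                                        # every other escape: keep
/mx

# Hypothetical helper: swaps each unpaired surrogate escape for the
# escape of U+FFFD and leaves everything else untouched.
def scrub_lone_surrogates(json_text)
  json_text.gsub(SURROGATE_ESCAPES) do |match|
    Regexp.last_match[:lone] ? '\ufffd' : match
  end
end

doc = scrub_lone_surrogates('{"s":"\uddb0"}')
puts doc
p JSON.parse(doc)
```

Properly paired escapes pass through unchanged, so text outside the Basic
Multilingual Plane is not affected by the scrub.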
