Convert \uXXXX to character

I have a string: '\u041f\u0440\u0438\u0432\u0435\u0442!' and I need to
convert it to a string such as 'Привет!'.
I can convert the string to '041f 0440 0438 0432 0435 0442', then convert each
code to decimal, and finally convert each code to a character with:

str.scan(/[0-9]+/).each {|x| result_str << x.to_i}

but I don't think that is the most sensible way.

···

--
Posted via http://www.ruby-forum.com/.

I recommend you:

···

2010/6/27 born in USSR <psixxx@bk.ru>:

> I have a string: '\u041f\u0440\u0438\u0432\u0435\u0442!' and I need to
> convert it to a string such as 'Привет!'.
> [...]

irb(main):001:0> RUBY_VERSION
=> "1.9.1"
irb(main):002:0> puts '\u041f\u0440\u0438\u0432\u0435\u0442!'
\u041f\u0440\u0438\u0432\u0435\u0442!
=> nil
irb(main):003:0> puts "\u041f\u0440\u0438\u0432\u0435\u0442!"
Привет!
=> nil

Note the difference between single quotes and double quotes: \u escapes
are only interpreted inside double quotes.

-Justin
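
To make the difference concrete, here is a small sketch (the variable names are only illustrative, and Ruby 1.9 or later is assumed). It compares the lengths of the two literals:

```ruby
# Single quotes keep the backslash and the "u" as ordinary characters.
single = '\u041f'
# Double quotes let the parser turn the escape into one character.
double = "\u041f"

puts single.length   # counts the backslash, the "u" and four hex digits
puts double.length   # counts a single Cyrillic letter
puts double
```

This only helps when the text is a literal in the source file. If the escapes arrive as data at run time, the string already contains real backslashes and has to be decoded explicitly.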

···

On 06/27/2010 05:33 AM, born in USSR wrote:

> I have a string: '\u041f\u0440\u0438\u0432\u0435\u0442!' and I need to
> convert it to a string such as 'Привет!'.
> [...]

> I have a string: '\u041f\u0440\u0438\u0432\u0435\u0442!' and I need to
> convert it to a string such as 'Привет!'.
> [...]

If I understand you correctly you can leverage Ruby's parser to
interpret your string literal:

x = '\u041f\u0440\u0438\u0432\u0435\u0442!'

=> "\\u041f\\u0440\\u0438\\u0432\\u0435\\u0442!"

eval("\"#{x}\"")

=> "Привет!"

Be careful with eval, though: make sure the string being evaluated doesn't contain any untrusted code.

Gary Wright
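
As an illustration of that warning, a minimal sketch with a made-up input (the string below is hypothetical, not from the original question). It shows that anything inside #{...} is executed along with the \u escape:

```ruby
# Hypothetical input: the data carries an interpolation, not just \u escapes.
x = '\u041f#{1 + 1}'

# eval parses the text as a double-quoted literal, so the interpolation
# is executed as Ruby code along with the \u escape.
result = eval("\"#{x}\"")
puts result
```

Here the expression is harmless arithmetic, but it could be any Ruby code, which is why this approach is only reasonable for strings you fully control.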

···

On Jun 27, 2010, at 8:33 AM, born in USSR wrote:

I think the JSON parser is able to decode these Unicode escapes
correctly!

The JSON parser will not decode a bare string, so you have to wrap the
string in array syntax and extract the element after parsing:

mbj@mbj ~ $ irb
irb(main):001:0> require 'json'
=> true
irb(main):002:0> x = '\u041f\u0440\u0438\u0432\u0435\u0442!'
=> "\\u041f\\u0440\\u0438\\u0432\\u0435\\u0442!"
irb(main):003:0> JSON.parse('["'+x+'"]')[0]
=> "Привет!"
irb(main):004:0>

IMHO better than eval ;)
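
The same idea wrapped in a method, as a rough sketch. The name decode_with_json is hypothetical; JSON.parse and JSON::ParserError come from Ruby's standard json library:

```ruby
require 'json'

# Hypothetical helper: wraps the text in a JSON array so the parser
# accepts it, then takes the single element back out.
def decode_with_json(text)
  JSON.parse(%(["#{text}"])).first
end

puts decode_with_json('\u041f\u0440\u0438\u0432\u0435\u0442!')

# A raw double quote in the input ends the JSON string early,
# so the parser rejects the text instead of running anything.
begin
  decode_with_json('abc"def')
rescue JSON::ParserError => e
  puts "rejected: #{e.class}"
end
```

Unlike eval, malformed or hostile input ends in a parse error rather than in code execution. Input that may contain double quotes or control characters would need escaping first.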

···

On Mon, Jun 28, 2010 at 01:22:33PM +0900, Gary Wright wrote:

> If I understand you correctly you can leverage Ruby's parser to
> interpret your string literal:
>
>   x = '\u041f\u0440\u0438\u0432\u0435\u0442!'
>   eval("\"#{x}\"")
>
> Be careful though with eval, make sure your string to be evaluated
> doesn't contain any untrusted code.

str = '\u041f\u0440\u0438\u0432\u0435\u0442!'
p str.gsub(/\\u(\h{4})/) {
  $1.to_i(16).chr('UTF-8')
}

What do you think of this?
I was also looking for something along the lines of String#unpack, like

p str.gsub(/\\u(\h{4})/) {
  [$1.to_i(16)].pack('U')
}

but since we are converting the code points one at a time, it is not very
interesting and needs an extra array, as in the JSON approach (though it is
1.8 compatible).

B.D.
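
A quick side-by-side check of the two variants above, as a sketch (the variable names with_chr and with_pack are only illustrative). Text that is not an escape, such as the trailing "!", passes through gsub unchanged:

```ruby
str = '\u041f\u0440\u0438\u0432\u0435\u0442!'

# Integer#chr with an explicit encoding builds a one-character string.
with_chr  = str.gsub(/\\u(\h{4})/) { $1.to_i(16).chr('UTF-8') }
# Array#pack with 'U' encodes the code point as UTF-8.
with_pack = str.gsub(/\\u(\h{4})/) { [$1.to_i(16)].pack('U') }

puts with_chr
puts with_chr == with_pack
```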

···

On 28 June 2010 07:39, Markus Schirp <mbj@seonic.net> wrote:

> IMHO better than eval ;)

Don’t forget that Unicode code points are not limited to the BMP and can be up
to 6 hex digits long (see the Wikipedia article on Unicode).

What do you do if the string contains escaped backslashes, as in str = '\u041f\\u0440'? Does it contain surrogates?

– Matthias
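
One possible way to cover those two cases with the gsub approach, as a sketch only. The names U_ESCAPE and decode_u_escapes are hypothetical, and the code assumes the input uses JSON-style escaping: a doubled backslash stands for one literal backslash, and characters outside the BMP are written as a UTF-16 surrogate pair.

```ruby
# Alternatives, tried in order: an escaped backslash, a high/low
# surrogate pair, or a single \uXXXX escape.
U_ESCAPE = /\\\\|\\u([dD][89abAB]\h{2})\\u([dD][c-fC-F]\h{2})|\\u(\h{4})/

def decode_u_escapes(str)
  str.gsub(U_ESCAPE) do
    if $1 && $2
      # Combine the surrogate pair into one code point above U+FFFF.
      high = $1.hex - 0xD800
      low  = $2.hex - 0xDC00
      [0x10000 + (high << 10) + low].pack('U')
    elsif $3
      # Plain BMP escape. Lone surrogates are not treated specially here.
      [$3.hex].pack('U')
    else
      # Escaped backslash: emit one backslash and leave the rest as text.
      '\\'
    end
  end
end

puts decode_u_escapes('\u041f\u0440\u0438\u0432\u0435\u0442!')
puts decode_u_escapes('\ud83d\ude00')
puts decode_u_escapes('\\\\u0440')
```

Matching the doubled backslash first is what keeps the following "u0440" from being read as an escape. Ruby's own \u{...} form with up to 6 hex digits is not handled by this sketch.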

···

On 28.06.2010 15:24, Benoit Daloze wrote:

> str = '\u041f\u0440\u0438\u0432\u0435\u0442!'
> p str.gsub(/\\u(\h{4})/) {
>   $1.to_i(16).chr('UTF-8')
> }