A Code Point's Tale: There and Back Again

This is probably obvious in the docs and I'm just missing it, but here
goes: So, I see there is str.each_codepoint, which I want to use in a
function to convert Unicode Strings to a list of Unicode code points.
But what can I do if I have a list of Unicode code points and want to
convert them back into a String?

--
Posted via http://www.ruby-forum.com/.

I hope this is what you are looking for:
http://ruby-unicode.rubyforge.org/doc/

Hi,

I think you can use Array#pack for that:

$ irb
ruby-1.9.2-p180 :001 > "f뀀oöbß".each_codepoint.to_a
=> [102, 45056, 111, 246, 98, 223]
ruby-1.9.2-p180 :002 > "f뀀oöbß".each_codepoint.to_a.pack("U*")
=> "f뀀oöbß"
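If it helps, the two directions can be wrapped in a pair of small helpers. This is only a sketch: the method names to_codepoints and from_codepoints are made up for illustration (they are not part of Ruby), and it assumes the input string is valid UTF-8.

```ruby
# Hypothetical helper names, not part of Ruby's core API.

# String -> Array of Integer code points.
def to_codepoints(str)
  str.each_codepoint.to_a
end

# Array of Integer code points -> UTF-8 String.
# The "U*" template tells Array#pack to encode every element
# as one UTF-8 character.
def from_codepoints(codes)
  codes.pack("U*")
end

original = "f\u00F6\u00DF"            # "f", o-umlaut, sharp s
codes    = to_codepoints(original)
restored = from_codepoints(codes)

p codes                               # => [102, 246, 223]
puts restored == original             # => true
```

Array#pack("U*") always produces a UTF-8 string, so the round trip only compares equal when the original was UTF-8 to begin with.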

cheers

On 30.04.2011 06:12, Terry Michaels wrote:

Terry Michaels wrote in post #995906:

This is probably obvious in the docs and I'm just missing it, but here
goes: So, I see there is str.each_codepoint, which I want to use in a
function to convert Unicode Strings to a list of Unicode code points.
But what can I do if I have a list of Unicode code points and want to
convert them back into a String?

# encoding: UTF-8
# The comment above tells Ruby to treat string literals in this source file,
# like the one below, as UTF-8 encoded.

str = "\xE2\x82\xAC\xE2\x82\xAC"

codes = str.each_codepoint.to_a

p codes
puts codes.map {|code| code.chr(Encoding::UTF_8) }.join(" ")

--output:--
[8364, 8364]
€ €

(You should see two euro symbols as the last line of output.)
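The encoding argument to chr matters here. A small sketch of why, assuming Encoding.default_internal is not set (which is the usual default): without an argument, Integer#chr refuses values above 255.

```ruby
code = 8364  # U+20AC, the euro sign

# With an explicit encoding, the conversion works for any valid code point.
euro = code.chr(Encoding::UTF_8)
puts euro.encoding                    # => UTF-8

# Without an argument, values above 255 raise RangeError
# unless Encoding.default_internal happens to be set.
begin
  code.chr
rescue RangeError => e
  puts "chr without an encoding failed: #{e.message}"
end

# Array#pack("U*") builds the same string in one call.
puts [code].pack("U*") == euro        # => true
```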

I don't know where you are getting your string, but you can always do
this:

str = "\xE2\x82\xAC\xE2\x82\xAC"
str.force_encoding("UTF-8")

codes = str.each_codepoint.to_a

p codes
puts codes.map {|code| code.chr(Encoding::UTF_8) }.join(" ")

--output:--
[8364, 8364]
€ €

(You should see two euro symbols as the last line of output.)
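One thing to keep in mind is that force_encoding only relabels the bytes; it does not convert or check them. The sketch below (variable names are just for illustration) shows what each_codepoint returns before and after relabeling, and how valid_encoding? can catch bytes that are not really UTF-8.

```ruby
# The same three bytes, explicitly tagged as binary (ASCII-8BIT).
raw = "\xE2\x82\xAC".dup.force_encoding(Encoding::BINARY)

# In a binary string every byte counts as its own "code point".
p raw.each_codepoint.to_a             # => [226, 130, 172]

# Relabel a copy as UTF-8: same bytes, different interpretation.
utf8 = raw.dup.force_encoding(Encoding::UTF_8)
p utf8.each_codepoint.to_a            # => [8364]

# A truncated euro sign is not valid UTF-8; each_codepoint would
# raise ArgumentError on it, so check first.
broken = "\xE2\x82".dup.force_encoding(Encoding::UTF_8)
puts broken.valid_encoding?           # => false
```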

Maybe each_char() will work for you? Take a look at the following code.

str = "\xE2\x82\xAC\xE2\x82\xAC"
puts str.encoding

str.force_encoding("UTF-8")
puts str.encoding

chars = str.each_char.to_a
p chars

puts chars[0].encoding

puts chars.join

--output:--
ASCII-8BIT
UTF-8
["\u20AC", "\u20AC"]
UTF-8
€€

(You should see two euro symbols as the last line of output.)
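If you do go through each_char, String#ord gets you from each character back to its code point. A quick sketch, assuming a valid UTF-8 string:

```ruby
str = "\u20AC\u20AC"                  # two euro signs

chars = str.each_char.to_a

# String#ord returns the code point of the first character in the string.
codes_from_chars = chars.map(&:ord)
codes_direct     = str.each_codepoint.to_a

p codes_from_chars                    # => [8364, 8364]
puts codes_from_chars == codes_direct # => true
```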

The output suggests that a string literal written with Unicode escapes
(\uXXXX) is given a UTF-8 encoding by default. And that seems to be the case:

str = "\u20AC\u20AC"
puts str.encoding

--output:--
UTF-8

7stud -- wrote in post #996022:

Terry Michaels wrote in post #995906:

This is probably obvious in the docs and I'm just missing it,

You will never learn Ruby's Unicode handling by reading the docs. Head over
to James Edward Gray II's website for some lessons:

Gray Soft / Not Found

Someone else blogged in great detail about all the intricacies of Ruby's
Unicode handling and its problems, but I can't find the link now.
