Individual char values in a Unicode string

Tim_Bray · 2 September 2006 04:45

I'm trying to figure out how to use String or jconv or something to figure out the actual code-point values in a Unicode/UTF-8 string. For example, how can I write f such that

f('tö中') ==> [ 0x74, 0xf6, 0x4e2d ]

(hex just for clarity of course, I want numbers).

-Tim

Paul_Lutus · 2 September 2006 05:10

Tim Bray wrote:

I'm trying to figure out how to use String or jconv or something
to figure out the actual code-point values in a Unicode/UTF-8
string. For example, how can I write f such that

f('tö中') ==> [ 0x74, 0xf6, 0x4e2d ]

(hex just for clarity of course, I want numbers).

Hex numbers are numbers.
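A quick irb-style sketch of that point: a hex literal in Ruby is just an ordinary Integer, and hex only matters when you read or print it. The variable name `n` is illustrative.

```ruby
# 0x4e2d is an ordinary Integer; "hex" is only a notation.
n = 0x4e2d
p n                 # => 20013
p n == 20013        # => true
puts n.to_s(16)     # prints "4e2d"
puts "0x%04x" % n   # prints "0x4e2d"
```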

To answer your question, you can extract bytes from a string:

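#!/usr/bin/ruby

s = "this is a test"

# In Ruby 1.9 and later, s[i] returns a one-character String,
# so use getbyte/bytesize to get the byte values as numbers.
i = 0
while i < s.bytesize
  puts s.getbyte(i) # emits numbers, not characters
  i += 1
end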

But I don't think Ruby recognizes characters, Unicode or otherwise, so it may
not be able to interpret a mixture of Unicode and UTF-8 without explicit
code from the programmer.
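To make the byte/code-point distinction concrete, here is a minimal sketch of what that "explicit code from the programmer" could look like: a hand-written UTF-8 decoder that turns the raw bytes back into code points. It assumes Ruby 1.9 or later and well-formed UTF-8 input; the helper name `utf8_codepoints` is hypothetical, and it does not reject overlong forms, surrogates, or truncated sequences.

```ruby
# encoding: utf-8

# Hypothetical helper: decode a UTF-8 string's bytes into code points by hand.
# Assumes well-formed UTF-8; raises ArgumentError only on an invalid lead byte.
def utf8_codepoints(str)
  bytes = str.bytes.to_a
  result = []
  i = 0
  while i < bytes.size
    b = bytes[i]
    if b < 0x80                 # 0xxxxxxx: 1-byte sequence (ASCII)
      len, cp = 1, b
    elsif b >> 5 == 0b110       # 110xxxxx: 2-byte sequence
      len, cp = 2, b & 0x1F
    elsif b >> 4 == 0b1110      # 1110xxxx: 3-byte sequence
      len, cp = 3, b & 0x0F
    elsif b >> 3 == 0b11110     # 11110xxx: 4-byte sequence
      len, cp = 4, b & 0x07
    else
      raise ArgumentError, "invalid UTF-8 lead byte 0x%02x at %d" % [b, i]
    end
    # Each continuation byte (10xxxxxx) contributes 6 more bits.
    (1...len).each do |k|
      cp = (cp << 6) | (bytes[i + k] & 0x3F)
    end
    result << cp
    i += len
  end
  result
end

s = "tö中"
p s.bytes.to_a         # => [116, 195, 182, 228, 184, 173]  (raw UTF-8 bytes)
p utf8_codepoints(s)   # => [116, 246, 20013]
```

The bytes show why byte-level access alone does not answer the question: "ö" occupies two bytes and "中" three, and only decoding them yields 0xf6 and 0x4e2d.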


--
Paul Lutus
http://www.arachnoid.com

Daniel_Berger2 · 2 September 2006 05:39

Tim Bray wrote:

I'm trying to figure out how to use String or jconv or something to figure out the actual code-point values in a Unicode/UTF-8 string. For example, how can I write f such that

f('tö中') ==> [ 0x74, 0xf6, 0x4e2d ]

(hex just for clarity of course, I want numbers).

-Tim

'tö中'.unpack("U*") => [116, 246, 20013]
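Wrapping that one-liner as the `f` from the question gives something like the sketch below. The `f19` variant is an illustrative alternative that assumes Ruby 1.9 or later, where `String#codepoints` decodes according to the string's own encoding; `pack("U*")` goes the other way, from code points back to a UTF-8 string.

```ruby
# encoding: utf-8

# f from the question: UTF-8 string -> array of Integer code points.
def f(str)
  str.unpack("U*")
end

# Hypothetical Ruby 1.9+ alternative using String#codepoints.
def f19(str)
  str.codepoints.to_a
end

cps = f('tö中')
p cps                                       # => [116, 246, 20013]
puts cps.map { |c| "0x%x" % c }.join(", ")  # prints "0x74, 0xf6, 0x4e2d"
p cps.pack("U*") == 'tö中'                  # => true (round trip)
```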

Regards,

Dan
