[ruby-talk:444454] codepoints

Information · 21 April 2024 09:01

Hi,

How to reverse str.codepoints?

How can I convert codepoints between UTF-8 and UTF-16?
e.g.: cp1 = [0xf0, 0x9f, 0x8f, 0xb3]; cp2 = [0xe2, 0x82, 0xac] #=> 0x20ac
Could someone add UTF-8 to pack/unpack?
# The output seems to be UTF-16 instead of UTF-8, as the docs say!

How to show the US flag (the Stars and Stripes as one graphical symbol),
composed of [0x1f1fa, 0x1f1f8]?
# How to compose Unicode symbols consisting of several codepoints?


______________________________________________
ruby-talk mailing list -- ruby-talk@ml.ruby-lang.org
To unsubscribe send an email to ruby-talk-leave@ml.ruby-lang.org
ruby-talk info -- Info | ruby-talk@ml.ruby-lang.org - ml.ruby-lang.org

phluid61 · 22 April 2024 08:05

Hi,

I'm not sure if you've checked, but there is plenty of online
documentation for Ruby.

   Hi,

   How to reverse str.codepoints?

str.codepoints.reverse
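
If "reverse" was meant as "invert" (getting the string back from its
codepoints) rather than reversing the order, a minimal sketch could look
like the one below. The variable names and the sample string are only
illustrative; Array#pack with 'U*' and Integer#chr are standard Ruby.

```ruby
# Round trip: String -> codepoints -> String
str = "h\u20AC!"             # "h€!", a UTF-8 string literal
cps = str.codepoints         # => [0x68, 0x20AC, 0x21]

# 'U' packs each Integer as one UTF-8 encoded character
restored = cps.pack('U*')

# An equivalent approach, building the string one character at a time
restored2 = cps.map { |cp| cp.chr(Encoding::UTF_8) }.join

puts restored == str         # prints "true"
puts restored.encoding       # prints "UTF-8"
```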

   How can I convert codepoints between UTF-8 and UTF-16?
   e.g.: cp1 = [0xf0, 0x9f, 0x8f, 0xb3]; cp2 = [0xe2, 0x82, 0xac] #=> 0x20ac
   Could someone add UTF-8 to pack/unpack?
         # The output seems to be UTF-16 instead of UTF-8, as the docs say!

Those are arrays of bytes, not codepoints. A codepoint is usually
transported as a single integer. The most logical way to deal with
characters and codepoints is with strings, so I'd start by getting it
back from an array of bytes to a string with the correct encoding
metadata:

str1 = cp1.pack('C*').force_encoding('UTF-8') #=> "🏳"
str2 = cp2.pack('C*').force_encoding('UTF-8') #=> "€"

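Then you can convert to UTF-16 (the leading 0xFEFF is the byte order mark
that Ruby prepends for the endianness-neutral 'UTF-16' encoding):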

str1.encode('UTF-16').codepoints #=> [0xFEFF, 0x1F3F3]
str2.encode('UTF-16').codepoints #=> [0xFEFF, 0x20AC]
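
Note that String#codepoints returns Unicode codepoints whatever the string's
encoding, which may be why the results look alike. If the goal is to see the
actual 16-bit UTF-16 code units (surrogate pairs), something like the
following sketch might help. It assumes big-endian UTF-16 without a BOM; the
variable names are only illustrative.

```ruby
bytes = [0xf0, 0x9f, 0x8f, 0xb3]                  # UTF-8 bytes of U+1F3F3
str   = bytes.pack('C*').force_encoding('UTF-8')

# An explicit endianness ('UTF-16BE') means no byte order mark is added
utf16 = str.encode('UTF-16BE')

# 'n' unpacks 16-bit unsigned big-endian integers, i.e. UTF-16 code units
units = utf16.unpack('n*')                        # => [0xD83C, 0xDFF3]

# The codepoint itself is the same in either encoding
same = utf16.codepoints == str.codepoints         # => true

# And back again: UTF-16 -> UTF-8 bytes
back = utf16.encode('UTF-8').bytes                # => [0xF0, 0x9F, 0x8F, 0xB3]

puts units.map { |u| format('0x%04X', u) }.join(', ')
```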

My question is: why? What are you trying to do? You seem to be partway
down a rabbit hole and have maybe lost track of the actual goal you're
trying to achieve?

   How to show the US flag (the Stars and Stripes as one graphical symbol),
   composed of [0x1f1fa, 0x1f1f8]?
   # How to compose Unicode symbols consisting of several codepoints?

Emoji sequences (and other character sequences) are sequences of
codepoints, so you have to transmit them as a sequence. I don't know
what you're asking.

[0x1f1fa, 0x1f1f8].pack 'U*' #=> "🇺🇸"
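
To check what pack 'U*' actually produces, a small sketch like this one may
be useful. It assumes Ruby 2.5 or later for String#grapheme_clusters; whether
the pair is drawn as a single flag depends on the terminal and font, not on
Ruby.

```ruby
flag = [0x1f1fa, 0x1f1f8].pack('U*')   # two regional indicator symbols: U, S

puts flag.encoding                     # prints "UTF-8"

# Each codepoint is encoded as four UTF-8 bytes
flag_bytes = flag.bytes
# => [0xF0, 0x9F, 0x87, 0xBA, 0xF0, 0x9F, 0x87, 0xB8]

cp_count      = flag.codepoints.length         # 2 codepoints ...
cluster_count = flag.grapheme_clusters.length  # ... forming 1 grapheme cluster

puts "#{cp_count} codepoints, #{cluster_count} grapheme cluster(s)"
```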

Cheers


On Sun, 21 Apr 2024 at 19:02, Information via ruby-talk <ruby-talk@ml.ruby-lang.org> wrote:
--
Matthew Kerwin [he/him]
https://matthew.kerwin.net.au/