Utf8 -> latin2

Bedo_Sandor1 · 14 November 2003 14:46

Hi,

How can I convert utf-8 encoded strings to latin-2?
I have tried it using libuconv with little success:

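require 'uconv'

class String
  # UTF-8 -> UTF-16LE via uconv, then drop the NUL high bytes.
  # Only correct for characters below U+0100.
  def un_utf8
    Uconv.u8tou16(self).gsub(/\000/, '')
  end

  # Treat each byte as a code point below U+0100, build UTF-16LE
  # by appending a zero high byte, then convert that to UTF-8.
  def to_utf8
    tmp = ""
    self.each_byte { |b|
      tmp += b.chr + "\000"
    }
    Uconv.u16tou8(tmp)
  end
end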

This program is ugly, and it does not do exactly what I want.
u8tou16 generates a string of 16-bit characters,
for example Uconv.u8tou16("test") == "t\000e\000s\000t\000".
gsub removes the unnecessary "\000" bytes from
the string. But some Hungarian characters (for example
ő, U+0151, and ű, U+0171) have a non-zero second byte
in the output of u8tou16, so they fail to convert.
Anyway, this is an ugly hack.
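To see why the gsub(/\000/, '') trick breaks, it helps to look at the UTF-16LE bytes directly. The sketch below assumes Ruby 1.9 or later and uses String#encode, which postdates this thread and is not part of uconv; the helper name utf16le_bytes is hypothetical. It shows that ASCII letters have a zero high byte while ő (U+0151) does not.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+ (String#encode); uconv is not used here.

# Hypothetical helper: returns the UTF-16LE byte values of a UTF-8 string.
def utf16le_bytes(str)
  str.encode("UTF-16LE").bytes.to_a
end

# ASCII: every second byte is zero, so deleting "\000" appears to work.
p utf16le_bytes("test")  # => [116, 0, 101, 0, 115, 0, 116, 0]

# o with double acute is U+0151: its high byte is 0x01, not 0x00.
p utf16le_bytes("ő")     # => [81, 1]
```

After the NUL bytes are deleted, ő becomes 0x51 0x01 ("Q" followed by a control byte) rather than the single Latin-2 byte 0xF5, which is why these characters come out wrong.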

How is it done nicely?

--
bSanyI

Wesley_J_Landaker · 14 November 2003 15:31

I think the iconv module handles this nicely:

require 'iconv'
# Iconv.conv(to, from, str): convert from UTF-8 to Latin-2
Iconv.conv("latin2", "utf-8", "this is a test")
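Iconv.conv takes the target encoding first and the source encoding second. In Ruby 1.9 and later the same conversion is also available through String#encode, and the iconv extension was moved out of the standard library in Ruby 2.0. Below is a minimal sketch of the asker's two helpers rebuilt on String#encode; the method names to_latin2 and latin2_to_utf8 are hypothetical, and replacing characters that Latin-2 cannot represent with "?" is an assumption.

```ruby
# encoding: utf-8
# Sketch assuming Ruby 1.9+; uses String#encode instead of uconv/iconv.
class String
  # Hypothetical helper: UTF-8 -> ISO-8859-2 (Latin-2).
  # Characters with no Latin-2 equivalent (and invalid UTF-8 bytes) become "?";
  # drop the options to get an Encoding error raised instead.
  def to_latin2
    encode("ISO-8859-2", "UTF-8",
           :invalid => :replace, :undef => :replace, :replace => "?")
  end

  # Hypothetical helper: reinterpret the bytes as ISO-8859-2, convert to UTF-8.
  def latin2_to_utf8
    dup.force_encoding("ISO-8859-2").encode("UTF-8")
  end
end

latin2 = "árvíztűrő tükörfúrógép".to_latin2
p latin2.encoding         # => #<Encoding:ISO-8859-2>
p latin2.bytesize         # => 22 (one byte per character)
puts latin2.latin2_to_utf8  # prints the original UTF-8 text again
```

Leaving out the :undef/:replace options is the safer choice when silently losing characters is not acceptable.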

Wesley J. Landaker - wjl@icecavern.net
OpenPGP FP: 4135 2A3B 4726 ACC5 9094 0097 F0A9 8A4C 4CD6 E3D2
