UTF-8 -> Latin-2

Hi,

How can I convert utf-8 encoded strings to latin-2?
I have tried it using libuconv with little success:

require 'uconv'

class String
    def un_utf8
        Uconv.u8tou16(self).gsub(/\000/, '')
    end

    def to_utf8
        tmp = ""
        self.each_byte { |b|
            tmp += b.chr + "\000"
        }
        Uconv.u16tou8(tmp)
    end
end

This program is ugly, and does not do exactly what I want.
u8tou16 generates a string of 16-bit characters,
for example Uconv.u8tou16("test") == "t\000e\000s\000t\000".
gsub then strips the unnecessary "\000" bytes from
the string. But there are Hungarian characters
that have a non-zero second byte in the output of
u8tou16, so they fail to convert. Anyway, this is an
ugly hack.
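
To make the failure concrete, here is a small sketch of the byte layout. It assumes Ruby 1.9 or later and uses the built-in String#encode as a stand-in for Uconv.u8tou16 (the helper name utf16le_bytes is hypothetical); the little-endian byte order matches the "t\000e\000..." output described above.

```ruby
# Assumption: Ruby 1.9+, with String#encode standing in for Uconv.u8tou16.
# utf16le_bytes is a hypothetical helper used only for illustration.
def utf16le_bytes(str)
  str.encode("UTF-16LE").bytes
end

# ASCII text: every second byte is zero, so stripping "\000" happens to work.
ascii_bytes = utf16le_bytes("test")

# U+0151 (o with double acute, common in Hungarian): the second byte is 0x01,
# so stripping zero bytes leaves two bytes that are not valid Latin-2 for it.
hungarian_bytes = utf16le_bytes("\u0151")
```

Stripping zero bytes only drops the high byte when that byte is zero, which holds for code points below U+0100 and for nothing else.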

How is it done nicely?

bSanyI

I think the iconv module handles this nicely:

require 'iconv'
# Iconv.conv takes (to, from, string), so the target encoding comes first
Iconv.conv("latin2", "utf-8", "this is a test")
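
For readers on a newer Ruby: iconv was removed from the standard library in Ruby 2.0, so the same conversion can be sketched with the built-in String#encode instead. This is only a sketch under assumptions: it needs Ruby 1.9 or later, it takes "latin2" to mean ISO-8859-2, and the helper name utf8_to_latin2 is hypothetical.

```ruby
# Hypothetical helper; assumes Ruby 1.9+ and that "latin2" means ISO-8859-2.
def utf8_to_latin2(str)
  # String#encode(destination, source) transcodes the bytes. It raises
  # Encoding::UndefinedConversionError for characters that have no
  # ISO-8859-2 equivalent.
  str.encode("ISO-8859-2", "UTF-8")
end

# U+0151 (o with double acute) takes two bytes in UTF-8
# and a single byte in Latin-2.
latin2 = utf8_to_latin2("\u0151")
```

Plain ASCII text such as "this is a test" comes out byte-for-byte unchanged, since both encodings share the ASCII range.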


Wesley J. Landaker - wjl@icecavern.net
OpenPGP FP: 4135 2A3B 4726 ACC5 9094 0097 F0A9 8A4C 4CD6 E3D2

On Friday 14 November 2003 7:46 am, Bedo Sandor wrote:
