Hi,
How can I convert utf-8 encoded strings to latin-2?
I have tried it using libuconv with little success:
require 'uconv'

class String
  def un_utf8
    Uconv.u8tou16(self).gsub(/\000/, '')
  end

  def to_utf8
    tmp = ""
    self.each_byte { |b|
      tmp += b.chr + "\000"
    }
    Uconv.u16tou8(tmp)
  end
end
This program is ugly, and does not do exactly what I want.
u8tou16 generates a string of 16-bit characters,
for example Uconv.u8tou16("test") == "t\000e\000s\000t\000".
gsub strips the unnecessary "\000" bytes from
the string. But some Hungarian characters have a
non-zero second byte in the output of u8tou16,
so they fail to convert. Anyway, this is an
ugly hack.
How is it done nicely?
--
bSanyI
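The failure described in the question can be reproduced without Uconv. The following is only a sketch: it assumes a modern Ruby (2.0 or later) and uses String#encode, which was not available when the question was posted, to produce the same low-byte-first 16-bit layout the post describes. The sample word and variable names are made up for illustration.

```ruby
# Assumption: Ruby 2.0+ with String#encode; Uconv itself is not used here.
word = "f\u0151"   # "fő": 'f' is U+0066, 'ő' is U+0151

# Two bytes per character, low byte first, as in the post's "t\000e\000..." example.
utf16_bytes = word.encode("UTF-16LE").bytes

# The hack from the question: drop every zero byte and hope the rest is Latin-2.
hacked_bytes = utf16_bytes.reject { |b| b.zero? }

# A real conversion maps each code point to its Latin-2 byte instead.
latin2_bytes = word.encode("ISO-8859-2").bytes

puts utf16_bytes.inspect   # 'ő' contributes a non-zero second byte (0x01)
puts hacked_bytes.inspect  # three bytes left over, none of them Latin-2 'ő'
puts latin2_bytes.inspect  # two bytes: 'f' and the Latin-2 code for 'ő' (0xF5)
```

Stripping zero bytes only works for code points below U+0100, and even then it yields Latin-1 byte values, which merely happen to coincide with Latin-2 for many characters.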
I think the iconv module handles this nicely:
require 'iconv'
# Iconv.conv(to, from, str): the target encoding comes first
Iconv.conv("latin2", "utf-8", "this is a test")
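On current Ruby versions, where the iconv library is no longer bundled with the interpreter, the same conversion can be sketched with the built-in String#encode. The helper names below are hypothetical, and "ISO-8859-2" is used as the encoding name for Latin-2. Like Iconv.conv, String#encode takes the target encoding first and the source encoding second.

```ruby
# Assumption: Ruby 1.9+ (String#encode). Helper names are illustrative only.

# UTF-8 -> Latin-2; raises Encoding::UndefinedConversionError if a
# character has no Latin-2 equivalent.
def utf8_to_latin2(text)
  text.encode("ISO-8859-2", "UTF-8")
end

# Same conversion, but substitutes "?" for characters Latin-2 cannot represent.
def utf8_to_latin2_lossy(text)
  text.encode("ISO-8859-2", "UTF-8", undef: :replace, replace: "?")
end

# Latin-2 -> UTF-8; the bytes are interpreted as ISO-8859-2.
def latin2_to_utf8(bytes)
  bytes.encode("UTF-8", "ISO-8859-2")
end

original = "t\u0171zolt\u00F3"           # "tűzoltó"
latin2   = utf8_to_latin2(original)
puts latin2.bytesize                     # one byte per character in Latin-2
puts latin2_to_utf8(latin2) == original  # round trip restores the UTF-8 text
```

Whether to raise or to substitute on unconvertible characters depends on the data; the strict variant is the safer default when silent data loss would be a problem.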
Wesley J. Landaker - wjl@icecavern.net
OpenPGP FP: 4135 2A3B 4726 ACC5 9094 0097 F0A9 8A4C 4CD6 E3D2
On Friday 14 November 2003 7:46 am, Bedo Sandor wrote:
> How can I convert utf-8 encoded strings to latin-2?