Conversion between utf-8 and iso8859-1?

Hi,

I need to convert between different character sets,
but didn’t find any library to do so except for
ruby-gnome’s glib.convert

Is there any character conversion library which
doesn’t come with a complete graphical library?

regards
Hadmut

“Hadmut Danisch” spamblock@danisch.de wrote in message
news:c0j99t$t9p$04$1@news.t-online.com

Hi,

I need to convert between different character sets,
but didn’t find any library to do so except for
ruby-gnome’s glib.convert

Is there any character conversion library which
doesn’t come with a complete graphical library?

I’m not aware of any but that isn’t to say there isn’t one :)
I assume you have checked out “iconv” which I have no experience with.

There is a good code page tutorial here - follow a few links if you need.

http://www.cs.tut.fi/~jkorpela/chars.html

UTF-8 is easily decoded into UCS-2, and from there it is fairly easy to go
to 8859-1, because it has only 256 characters and all of them sit at code
points U+0000 to U+00FF, i.e. in the low 8 bits of UCS-2.
You should, by the way, also consider 8859-9 (I think it is); it’s basically
8859-1 with the euro sign.
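
To make the "decode to code points, then keep the low byte" idea concrete, here is a rough sketch in Ruby. The helper names `utf8_to_codepoints` and `codepoints_to_latin1` are made up for illustration, the input is assumed to be well-formed UTF-8 given as an array of byte values, and anything above U+00FF is simply replaced with `?`.

```ruby
# Hypothetical helper: decode an array of UTF-8 byte values into code points.
# No validation of malformed or truncated input is attempted.
def utf8_to_codepoints(bytes)
  codepoints = []
  i = 0
  while i < bytes.length
    b = bytes[i]
    if b < 0x80                 # 0xxxxxxx: single byte, plain ASCII
      cp, extra = b, 0
    elsif (b & 0xE0) == 0xC0    # 110xxxxx: lead byte of a 2-byte sequence
      cp, extra = b & 0x1F, 1
    elsif (b & 0xF0) == 0xE0    # 1110xxxx: lead byte of a 3-byte sequence
      cp, extra = b & 0x0F, 2
    else                        # 11110xxx: lead byte of a 4-byte sequence
      cp, extra = b & 0x07, 3
    end
    extra.times do
      i += 1
      cp = (cp << 6) | (bytes[i] & 0x3F)  # each continuation byte adds 6 bits
    end
    codepoints << cp
    i += 1
  end
  codepoints
end

# Hypothetical helper: code points up to U+00FF are their own ISO-8859-1
# byte value; everything else is replaced with "?" (0x3F).
def codepoints_to_latin1(codepoints)
  codepoints.map { |cp| cp <= 0xFF ? cp : 0x3F }
end

utf8_bytes = [0x47, 0x72, 0xC3, 0xBC, 0xC3, 0x9F, 0x65]  # "Grüße" in UTF-8
cps = utf8_to_codepoints(utf8_bytes)
latin1_bytes = codepoints_to_latin1(cps)
```

Here `cps` and `latin1_bytes` end up identical, because every character in the sample is below U+0100; that is exactly the property that makes this particular conversion easy.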

Mikkel

Between these two encodings, you can use, without any external library:

    utf8string.unpack("U*").pack("C*") # => latin1 string

    latin1string.unpack("C*").pack("U*") # => utf8 string

I’m not aware of any but that isn’t to say there isn’t one :)
I assume you have checked out “iconv” which I have no experience with.

iconv sounds like the tool to me.

You should btw. also consider 8859-9 (I think it is) it’s basically 8859-1
with the euro sign.

That would be ISO-8859-15, which adds the euro sign and a few missing
French and Finnish letters. ISO-8859-9 is the Turkish variant (Latin-5).
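
A quick way to see the difference between these character sets, assuming a Ruby new enough to have `String#encode` built in (1.9 or later, so not the Ruby most people in this thread are running), is something like the sketch below. The variable names are just for illustration.

```ruby
# Assumes Ruby 1.9+ where String#encode is available without extra libraries.
euro = "\u20AC"                        # EURO SIGN as a UTF-8 string

# ISO-8859-15 has the euro sign, at byte 0xA4.
latin9_bytes = euro.encode("ISO-8859-15").bytes

# ISO-8859-1 has no euro sign, so the conversion raises.
begin
  euro.encode("ISO-8859-1")
  latin1_ok = true
rescue Encoding::UndefinedConversionError
  latin1_ok = false
end

# ISO-8859-9 swaps a few Icelandic letters for Turkish ones,
# e.g. dotless i (U+0131) at byte 0xFD.
turkish_bytes = "\u0131".encode("ISO-8859-9").bytes
```

So for text that may contain the euro sign, ISO-8859-15 is the target to ask for, not ISO-8859-1 or ISO-8859-9.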

Ari

Aredridel wrote:

I need to convert between different character sets,
but didn’t find any library to do so except for
ruby-gnome’s glib.convert

iconv sounds like the tool to me.

Or, if all you’re doing is converting between ISO-8859-1 and UTF-8, you
could use String#unpack and Array#pack:

    # ISO-8859-1 -> UTF-8:
    string.unpack("C*").pack("U*")
    # UTF-8 -> ISO-8859-1:
    string.unpack("U*").pack("C*")

Of course, the conversion from UTF-8 to ISO-8859-1 won’t work in all cases,
because UTF-8 can encode far more characters than ISO-8859-1: any code point
above U+00FF has no ISO-8859-1 equivalent. Going from ISO-8859-1 to UTF-8
should always work, though.
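
One way to guard against that failure case is to check each code point before packing, since `pack("C*")` only keeps the low 8 bits of a larger value and would otherwise produce a wrong byte without complaining. This is only a sketch: `utf8_to_latin1` and `latin1_to_utf8` are hypothetical helper names, the `?` replacement is an arbitrary choice, and `String#ord` assumes Ruby 1.9 or later.

```ruby
# Hypothetical helper: UTF-8 -> ISO-8859-1, replacing any code point above
# U+00FF instead of letting pack("C*") reduce it to its low byte.
def utf8_to_latin1(utf8, replacement = "?")
  codepoints = utf8.unpack("U*").map do |cp|
    cp <= 0xFF ? cp : replacement.ord
  end
  codepoints.pack("C*")
end

# Hypothetical helper: ISO-8859-1 -> UTF-8. Every byte value is a valid
# code point, so this direction needs no check.
def latin1_to_utf8(latin1)
  latin1.unpack("C*").pack("U*")
end

converted  = utf8_to_latin1("caf\u00E9 \u20AC5")   # "café €5"
round_trip = latin1_to_utf8(converted)
```

The euro sign is lost (it becomes `?`), but the é survives the round trip. Raising an exception instead of substituting would be an equally reasonable choice, depending on how strict you need to be.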

— SER