Hello,
I searched the internet and the mailing list and realised this question has
been asked many times, but I never found an answer that satisfied me.
I have an application that will read data (web pages) in different encodings
(at least latin1 and latin2, possibly cp1250; ideally I would like to
support all possible encodings, letting the user give me the encoding name
as a string… maybe I’m dreaming).
For now I only support latin2, but when I add utf-8 and other encodings,
I’ll end up mixing all those encodings in one file…
I would like to convert these strings to unicode and from then on deal only
with unicode.
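
To make concrete what I am after, here is a minimal sketch. It assumes a newer Ruby (1.9 or later, 2.0 for String#b) in which strings carry an encoding and offer String#force_encoding and String#encode; that is an assumption, not what I am running. The helper name to_unicode is made up for illustration.

```ruby
# Hypothetical helper: treat raw bytes as text in the encoding the user
# named, and return the same text as UTF-8.
def to_unicode(raw_bytes, source_encoding)
  # dup so the caller's string is not relabelled in place;
  # force_encoding only changes the label, encode does the conversion
  raw_bytes.dup.force_encoding(source_encoding).encode("UTF-8")
end

latin2_bytes = "\xE8".b   # the single byte for "č" in ISO-8859-2 (latin2)
text = to_unicode(latin2_bytes, "ISO-8859-2")

puts text            # the character "č"
puts text.encoding   # UTF-8
puts text.bytesize   # 2, because "č" takes two bytes in UTF-8
```

The encoding name is just a string, so the same call should work with any name that Ruby build knows, for example "ISO-8859-1" or "Windows-1250"; an unknown name raises an error instead of guessing.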
From what I understand:
a/ Character set conversion is not supported by ruby proper. There are
libraries (where???). I found uconv, but it seems to be more about utf8 and
Japanese encodings than latin1/2. I didn’t find anything else in
http://raa.ruby-lang.org/cat.rhtml?category_major=Library;category_minor=I18N
and
http://raa.ruby-lang.org/cat.rhtml?category_major=Library;category_minor=Text
Maybe http://raa.ruby-lang.org/list.rhtml?name=codeconv but I see only binary,
euc (what’s that?), sjis and utf8 support.
I don’t think EUC is latin1/2, but I may be wrong.
For this conversion I could call “recode” on the command line, although it’s
not very elegant :O(
b/ Anyway, it’s a moot point, because ruby doesn’t handle unicode strings… at
least that’s what I understood. Am I wrong? Well, I read that it should work,
but then from what I understood capitalize, size etc. don’t work. Once more,
if I understood correctly, there are libraries for this (one is at 0.1, one
is at 0.2 and "may work").
I’m also worried after seeing this in an xterm -u8:
[emmanuel@emmanuels output]$ irb
irb(main):001:0> "a".size
=> 1
irb(main):002:0> "č".size
=> 2
irb(main):003:0>
Maybe you can’t see the second character; it’s a latin2 character, and it’s
just one character…
So, is it as bad as it seems, doctor?
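
The 2 above looks like a byte count rather than a character count. A minimal sketch of the difference, assuming a Ruby whose strings know their encoding (1.9 or later; an assumption, since size counts bytes in the irb session above). The unpack("U*") line decodes UTF-8 into code points and does not depend on that assumption.

```ruby
s = "\u010D"   # "č", written as an escape so the source file encoding does not matter

puts s.size                 # characters: 1
puts s.bytesize             # bytes in UTF-8: 2
puts s.unpack("U*").size    # code points decoded from the UTF-8 bytes: 1
```
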
emmanuel
PS: Thank you for reading this long mail. As I said, I’m aware this has been
asked many times, only I didn’t find anything clear yet…
–
“If there is any kind of God, it’s not in you or in me,
but in the space between us”
– Celine, “Before Sunrise”
(It’s not about what you do, it’s about what you give.)