I'm reading a binary file in my program. It contains strings in the
Windows Unicode format, which the specification says is stored as
little-endian. I'm loading the strings and trying to convert them using
Iconv, but I get an invalid character exception on every string. For now
I'm just stripping the \000 characters, which works, but I know it's not
an ideal solution and it only works in some cases.
So, how can I get the strings into a format Ruby can understand? By the
way, I'll display these strings in GTK (with Ruby bindings). Does anyone
know whether it can show Unicode strings?
--
Posted via http://www.ruby-forum.com/.
Do you mean stripping the BOM (byte order mark)?
That should be fine. Unicode text works just as well without a BOM, arguably better.
The first thing to check, though, is whether a BOM is present and, if it is, to read it to find out the byte order.
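As a rough sketch of that check: the helper below looks at the first two bytes for a UTF-16 BOM and reports the byte order it implies. The name `split_utf16_bom` and the sample bytes are made up for illustration, and the code uses `String#force_encoding` and `String#encode` from Ruby 1.9 and later rather than Iconv, which is not part of the standard library in current Ruby.

```ruby
# Hypothetical helper (not from the thread): detect a UTF-16 BOM.
# Returns [encoding_name_or_nil, bytes_without_the_bom].
def split_utf16_bom(bytes)
  data = bytes.dup.force_encoding(Encoding::BINARY)
  if data.start_with?("\xFF\xFE".b)      # U+FEFF stored little-endian
    ["UTF-16LE", data.byteslice(2..-1)]
  elsif data.start_with?("\xFE\xFF".b)   # U+FEFF stored big-endian
    ["UTF-16BE", data.byteslice(2..-1)]
  else
    [nil, data]                          # no BOM found
  end
end

# Made-up sample: BOM followed by "Hi" in UTF-16LE.
enc, payload = split_utf16_bom("\xFF\xFEH\x00i\x00".b)

# Fall back to little-endian when there is no BOM, as the file's
# specification is said to require.
text = payload.force_encoding(enc || "UTF-16LE").encode("UTF-8")
```

When the format already fixes the byte order, the BOM (if any) only needs to be skipped; the fallback above assumes little-endian for that reason.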
On May 5, 2007, at 9:01 AM, Alexandre Rosenfeld wrote:
Hi,
At Sun, 6 May 2007 23:05:37 +0900,
Alexandre Rosenfeld wrote in [ruby-talk:250503]:
What confuses me is why Iconv couldn't convert it. Does Iconv expect
a BOM even when I specify UTF16LE, which would make the byte order
explicit?
A BOM is a "ZERO WIDTH NO-BREAK SPACE" (U+FEFF) at the beginning of
a text. Almost every iconv(3) implementation should be able to deal with it.
Can you show minimal data that reproduces the error?
--
Nobu Nakada
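
A minimal test case along the lines requested above might look like the following sketch. The sample bytes are invented, not taken from the original file, and the conversion uses `String#encode` (Ruby 1.9 and later) instead of Iconv; with the old Iconv library the roughly equivalent call would be `Iconv.conv("UTF-8", "UTF-16LE", raw)`. It also shows why stripping \000 only works in some cases.

```ruby
# Made-up sample: "Olá" encoded as UTF-16LE, without a BOM.
raw = "O\x00l\x00\xE1\x00".b

# UTF-16 uses two bytes per code unit, so an odd byte count means
# the string was cut in the wrong place when read from the file.
even_length = raw.bytesize.even?

# Tag the bytes with their real encoding, then transcode to UTF-8.
utf8 = raw.dup.force_encoding("UTF-16LE").encode("UTF-8")

# Stripping NUL bytes only looks right for ASCII-range characters.
# Here it leaves a lone 0xE1 byte, which is not valid UTF-8.
stripped = raw.delete("\x00")
stripped_is_valid = stripped.dup.force_encoding("UTF-8").valid_encoding?
```

In this sample `stripped_is_valid` is false, because the accented character does not survive the NUL stripping, while the transcoded `utf8` string keeps it intact.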