Hi,
I think you have it. Unicode is formally a 31 bit character set but the
first 16 bits are called the “basic multilingual plane” (BMP) and were
supposed to be adequate for representing most natural languages. Some of the
characters in the BMP are combining or ‘dead key’ characters.
I think “Unicode” here should be replaced by “ISO 10646”.
What I have learned here is that either some or all of the Japanese think
that it doesn’t work for their language. It’s not clear whether they think
that more bytes are required or just that the existing standard doesn’t
assign them correctly.
Historically, Japanese people supported the original 32-bit ISO 10646,
but the Unicode people brought in Han Unification to fit everything
into a 16-bit space, and ISO 10646 adopted it. That disappointed the
Japanese (and probably other Asian people). It requires a table lookup
for every code conversion.
But practically there’s no problem handling Japanese with Unicode.
It covers almost all Japanese characters used daily. It is already
used as the canonical encoding in major applications (for example,
Microsoft Word) without any problems.
The feeling behind the “noisy” Japanese people around Curt probably
comes from the “defeat” in the ISO 10646 discussion.
I personally don’t have any negative feeling toward Unicode, but I
don’t trust it wholeheartedly. I don’t use Unicode in my daily life.
Most of my Japanese text is in EUC-JP or ISO-2022-JP. I don’t want
to be forced to convert it back and forth from Unicode when I don’t
care about any non-Japanese text. When I want to treat Japanese,
Korean, and Chinese at the same time, I’d happily use Unicode. When I
want to handle characters not in Unicode, I’d use a bigger charset
like Mojikyo. I want to provide the freedom of choice. Unicode-centric
I18N is simpler, but the only choices are ASCII or Unicode. Ruby I18N
will provide a way to handle user-definable encodings. You will be
able to choose ASCII, EUC-JP, Shift_JIS, EUC-KR, ISO-8859-1, or
Unicode. It will be more complex, but the complex part will be hidden
behind the framework.
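
A rough sketch of what choosing an encoding per string can look like.
It uses String#encode and String#force_encoding from later Ruby
releases (1.9 and up), which did not exist when this message was
written, so treat it as an illustration and not as the design
described above.

```ruby
# Illustration only: String#encode and String#force_encoding are the
# Ruby 1.9+ API, an assumption relative to the 2002 discussion.

# "nihongo" (three kanji), written with \u escapes so the literal is
# UTF-8 regardless of the source file's encoding.
text = "\u65E5\u672C\u8A9E"

# Transcoding keeps the characters and changes the bytes.
euc  = text.encode("EUC-JP")
sjis = text.encode("Shift_JIS")

puts text.bytesize        # 9 (3 bytes per character in UTF-8)
puts euc.bytesize         # 6 (2 bytes per character in EUC-JP)
puts sjis.bytesize        # 6

# Each string carries its own encoding, so EUC-JP data can stay
# EUC-JP until a conversion is explicitly requested.
puts euc.encoding.name    # EUC-JP
puts euc.length           # 3 characters

# Raw bytes read from a legacy file can be labelled without converting.
raw   = [0xC6, 0xFC].pack("C*")          # EUC-JP bytes for U+65E5
first = raw.force_encoding("EUC-JP")
puts first.encode("UTF-8") == "\u65E5"   # true

# A round trip through Unicode gives back the original text.
puts euc.encode("UTF-8") == text         # true
```

Text that is only ever Japanese never leaves EUC-JP here; conversion
happens only at the points where `encode` is called.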
matz.
In message “Re: Deprecation and Unicode” on 02/08/04, Chris Gehlker gehlker@fastq.com writes: