Thu, 1 Aug 2002 19:53:07 +0900, Curt Sampson <cjs@cynic.net> writes:
> Not just political reasons, but practical reasons. Unicode is designed
> to work if you restrict yourself to using only 16-bit chars, and I
> expect most programs are going to limit themselves to that. So even if
> it were folded into the extension space, most people wouldn’t use it.
Unicode is not 16-bit. The code point range is 0…0x10FFFF.
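
As a quick illustration (a sketch only, assuming a Haskell
implementation whose Char covers the full Unicode range, as GHC's does):

    import Data.Char (ord)

    -- The largest code point is U+10FFFF = 1114111, well beyond 16 bits.
    main :: IO ()
    main = print (ord (maxBound :: Char))   -- prints 1114111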
We don’t need to do the stupid thing that Java and MS Windows do,
i.e. use UTF-16 internally, which combines the disadvantages of UTF-8
and UTF-32: it is variable-length and not compatible with ASCII.
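
To see the variable length concretely, here is a sketch of the standard
surrogate-pair split (the function name is made up for illustration):

    import Data.Bits (shiftR, (.&.))
    import Data.Word (Word16)

    -- A code point above U+FFFF does not fit in one 16-bit unit;
    -- UTF-16 splits it into two surrogates, so lengths vary anyway.
    toUTF16 :: Int -> [Word16]
    toUTF16 cp
      | cp < 0x10000 = [fromIntegral cp]
      | otherwise    = [ fromIntegral (0xD800 + (u `shiftR` 10))
                       , fromIntegral (0xDC00 + (u .&. 0x3FF)) ]
      where u = cp - 0x10000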
The most straightforward internal representation is UTF-32: each
character is stored in 4 bytes. All other encodings are either
variable-length or can’t represent all Unicode characters.
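
To make the point concrete, a fixed-width string can be an unboxed
array of code points (Data.Array.Unboxed is just one way to sketch it):

    import Data.Array.Unboxed

    -- Each Char occupies a fixed-size slot, so the n-th character sits
    -- at a fixed offset and indexing is O(1), unlike in UTF-8 or UTF-16.
    type UString = UArray Int Char

    fromString :: String -> UString
    fromString s = listArray (0, length s - 1) s

    charAt :: UString -> Int -> Char
    charAt = (!)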
If you need compactness or ASCII compatibility, use UTF-8. Most
characters (those up to U+FFFF) are encoded in 1, 2 or 3 bytes.
Good for data transmission and already widely known.
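
For reference, a minimal UTF-8 encoder sketch (the names are mine;
1 byte up to U+007F, 2 up to U+07FF, 3 up to U+FFFF, 4 beyond that):

    import Data.Bits (shiftR, (.&.), (.|.))
    import Data.Char (ord)
    import Data.Word (Word8)

    -- Encode one character as its UTF-8 byte sequence.
    encodeChar :: Char -> [Word8]
    encodeChar c
      | cp < 0x80    = [byte cp]
      | cp < 0x800   = [byte (0xC0 .|. (cp `shiftR` 6)), cont cp]
      | cp < 0x10000 = [byte (0xE0 .|. (cp `shiftR` 12)),
                        cont (cp `shiftR` 6), cont cp]
      | otherwise    = [byte (0xF0 .|. (cp `shiftR` 18)),
                        cont (cp `shiftR` 12), cont (cp `shiftR` 6), cont cp]
      where
        cp     = ord c
        byte   = fromIntegral
        cont x = fromIntegral (0x80 .|. (x .&. 0x3F))   -- continuation byte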
Anything else should be necessary only at the border with the outside
world.
···
--
__("< Marcin Kowalczyk
__/ qrczak@knm.org.pl
  ^^ Blog of a good-natured man.