> True that it would be nice if the user-agent could alert that the
> encoding is being guessed at and is actually unknown, but...

That's not a problem. The message that the page is broken in some way
is communicated.

> how many average users even know what an encoding is? Few. Many
> programmers don't even know about it.

That is the primary goal. The developer should see that his page is
broken. If the developer does not know about encodings, he is poorly
prepared for development in a world where various encodings are used.
Any decent web tutorial must at least mention encoding, because it is
required by the standard. And we are talking about the web here.

> It's a situation caused by history. In the past, systems were more
> expensive and less powerful. Compromises were frequently made. (The
> Y2K bug was a result of the same problem.) Now there is a sense of
> unlimited potential for storage and processing power, but there is
> still the legacy content, and still content being produced following
> many different standards.
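To make the guessed-encoding failure above concrete, here is a small, purely illustrative Python sketch; the sample string and encoding names are arbitrary examples of my own:

```python
# Bytes for "café" encoded as UTF-8.
data = "café".encode("utf-8")          # b'caf\xc3\xa9'

# A user-agent that wrongly guesses Latin-1 still "succeeds", but
# renders mojibake -- the visible sign that the page is broken.
print(data.decode("latin-1"))          # cafÃ©

# Decoding with the declared (correct) encoding round-trips cleanly.
print(data.decode("utf-8"))            # café
```

The user sees garbage either way; the point is only that *something* visibly wrong reaches the developer.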
> Some of the problems with web development are also a result of the
> first browser war between Netscape and Microsoft.

I do not see how the browser war caused the existence of the various
encodings; they existed even before the web. Yes, they were mainly
caused by the initial decision to fit a character into a single byte,
which could not accommodate all languages at once.

> At one time there were dozens of ASCII variants out there as well,
> hard-coded into machines. Eventually this stuff will all be replaced
> by unicode. It has the momentum.

But it is not a thing of the past. Documents in various encodings
exist. Unicode is good enough for most uses, but new non-unicode
encodings exist for special purposes not served by unicode.

> The current problem is still partial implementations and different
> implementations. Even at the OS level.
> I don't know about Windows, but much of this functionality is a
> service in OS X; it is provided freely to the programmer by the
> system. Still, there are programmers on OS X who just avoid the
> issue. Cocoa makes it easy for those developers to support all the
> languages it supports in unicode. TextMate, the premier Ruby/Rails
> editor on OS X, is a great example of this. It doesn't support CJK
> properly. Odd, because TextEdit does, because it receives all of
> that support free from the system.
New software should support unicode in the first place, and conversion
should be added if the need for different encodings is anticipated.
CJK input is a different issue, though. Sure, most systems (X11,
Windows, OS X) support CJK and other complex input methods. But they
only work if you do not touch certain input paths. If TextMate
implements something like identifier completion, it may well clash
with CJK input, which is a sort of completion as well. If the author
used an input method himself, he would certainly notice. But many
people do not.
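The "unicode inside, convert at the boundary" idea mentioned above can be sketched in Python; the encoding names and sample bytes here are just common examples I picked, not anything from this thread:

```python
# Hypothetical legacy input: "café" stored as ISO-8859-2 bytes.
legacy_bytes = b"caf\xe9"

# Decode once, at the edge; keep plain Unicode text internally.
text = legacy_bytes.decode("iso-8859-2")
assert text == "café"

# Re-encode only on output, e.g. as UTF-8.
utf8_bytes = text.encode("utf-8")         # b'caf\xc3\xa9'
```

Everything between the decode and the encode then works on one internal representation, regardless of which encodings appear at the edges.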
> Windows must surely have a similar offering to its developers.
> Still there are lots of crusty developers out there who don't see an
> issue with this.
Yes, many projects developed exclusively by English-speaking
developers (or people who only use languages that fit in the Latin-1
range) completely ignore internationalization issues, or try to bolt
them on once the project becomes popular enough that the issue becomes
apparent.
> So we continue to get a lot of *accepted* problems with language
> support.
Not necessarily accepted. But sure, they exist and are slow to weed
out.
> The first scripting language to really start to push unicode will be
> the winner in the future.
Not only unicode. You can get that with Python, and people still
complain. The ability to work in different encodings seamlessly is
needed.
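For what it is worth, a minimal sketch of the kind of seamless
multi-encoding handling meant here, in Python; the particular
encodings are only common examples:

```python
# The same text should survive a round trip through any encoding
# that can represent it; complaints usually start where a library
# hides one of these two steps.
s = "naïve"
for enc in ("utf-8", "latin-1", "cp1252"):
    assert s.encode(enc).decode(enc) == s
```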
Thanks
Michal
···
On 10/05/07, John Joyce <dangerwillrobinsondanger@gmail.com> wrote: