[OT] A question for people with English OS

Browsers aren't supposed to guess. That IE guesses simply means that IE
has yet another bug, born out of Microsoft's typical arrogant refusal to
follow standards.

Not that I'm a Microsoft apologist in general, but this seems to fall
under the philosophy of "be strict in what you output, but be tolerant
in what you input."

Servers that do not identify the correct encoding are bugged, too.

Which this does not.

···

On 5/9/07, John W. Kennedy <jwkenne@attglobal.net> wrote:

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

True that it would be nice if the user-agent could alert that the encoding is being guessed at and is actually unknown, but...
how many average users even know what an encoding is? Few. Many programmers don't even know about it. It's a situation caused by history. In the past, systems were more expensive and less powerful. Compromises were frequently made. (Y2K bug was a result of the same problem.) Now there is a sense of unlimited potential for storage and processing power, but there is still the legacy content and still content being produced following many different standards.
Some of the problems with web development are also a result of the first browser war between Netscape and Microsoft.
At one time there were dozens of ASCII variants out there as well, hard-coded into machines. Eventually this stuff will all be replaced by Unicode. It has the momentum. The current problem is still partial implementations and differing implementations, even at the OS level. I don't know about Windows, but much of this functionality is a service in OS X; it is provided freely to the programmer by the system. Still, there are programmers on OS X who just avoid the issue. Cocoa makes it easy for those developers to support all the languages it supports in Unicode. TextMate, the premier Ruby/Rails editor on OS X, is a great example of this: it doesn't support CJK properly. Odd, because TextEdit does, since it receives all of that support free from the system.

Windows must surely have a similar offering to its developers.

Still, there are lots of crusty developers out there who don't see an issue with this.

So we continue to get a lot of *accepted* problems with language support.

The first scripting language to really start to push Unicode will be the winner in the future.

···

On May 10, 2007, at 7:13 PM, Michal Suchanek wrote:

It may be viewed as a refusal to follow standards and as encouraging bad
webmaster practices (using some proprietary Windows encoding and
relying on Explorer to guess right). On the other hand, it could be
seen as an attempt to remove some burden from the users. A web browser
developer may implement a scheme for guessing the encoding on sites
that do not specify it, but cannot fix the sites.

However, the right implementation would also include a big fat warning
about the encoding being guessed. This serves both to let the user
know that the site is deficient and may be displayed incorrectly and
to remind the web developer that it should be fixed.
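A minimal Python sketch of the behavior described above, assuming a short, hypothetical fallback list: use the declared charset when there is one; otherwise try a few candidate encodings in order and mark the result as guessed, so the user interface can show the warning. `decode_page`, `DecodedPage` and `FALLBACK_ENCODINGS` are illustrative names, not any browser's actual API, and real browsers use much more elaborate detection than trial decoding.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical guess order: strict UTF-8 first, then a common legacy code page.
FALLBACK_ENCODINGS = ("utf-8", "cp1252")


@dataclass
class DecodedPage:
    text: str
    encoding: str
    guessed: bool  # True means the UI should show the "encoding guessed" warning


def decode_page(body: bytes, declared: Optional[str]) -> DecodedPage:
    """Decode a page body, guessing only when no charset is declared."""
    if declared:
        # Trust the server's declaration; replace undecodable bytes instead of failing.
        return DecodedPage(body.decode(declared, errors="replace"), declared, guessed=False)
    for encoding in FALLBACK_ENCODINGS:
        try:
            return DecodedPage(body.decode(encoding), encoding, guessed=True)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
    # Latin-1 maps every byte to a character, so this last resort cannot fail.
    return DecodedPage(body.decode("latin-1"), "latin-1", guessed=True)


page = decode_page("café".encode("cp1252"), declared=None)
if page.guessed:
    print(f"Warning: no charset declared; guessed {page.encoding}")
```

The order matters: UTF-8 rejects most byte sequences that are not UTF-8, so a legacy page usually falls through to the next candidate, whereas cp1252 accepts almost any input and would mask a UTF-8 page if tried first.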

Thanks

Michal

True that it would be nice if the user-agent could alert that the
encoding is being guessed at and is actually unknown, but...
How many average users even know what an encoding is? Few. Many

That's not a problem. The message that the page is broken in some way
is communicated.

programmers don't even know about it. It's a situation caused by

That is the primary goal. The developer should see his page is broken.
If the developer does not know about encoding he is poorly prepared
for development in a world where various encodings are used. Any
decent web tutorial must at least mention encoding because it is
required by the standard. And we are talking about the web here.
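As a concrete illustration of declaring the encoding, here is a hedged Python sketch: the server side encodes the page and states the same charset both in the HTTP `Content-Type` header and in a `<meta>` tag, and the client side reads the declared charset back with the standard library's `email.message.Message.get_content_charset()`. The `make_response` and `declared_charset` helpers are hypothetical, written only for this example.

```python
from email.message import Message
from typing import Dict, Optional, Tuple


def make_response(html: str, charset: str = "utf-8") -> Tuple[Dict[str, str], bytes]:
    """Encode a page and declare the charset that was actually used."""
    content_type = f"text/html; charset={charset}"
    # Repeat the declaration inside the document for copies saved without headers.
    meta = f'<meta http-equiv="Content-Type" content="{content_type}">\n'
    return {"Content-Type": content_type}, (meta + html).encode(charset)


def declared_charset(headers: Dict[str, str]) -> Optional[str]:
    """Return the charset a client would use, or None if none is declared."""
    content_type = headers.get("Content-Type")
    if content_type is None:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased; None if no charset parameter


headers, body = make_response("<p>café</p>", charset="iso-8859-1")
print(headers["Content-Type"])
print(body.decode(declared_charset(headers)))
```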

history. In the past, systems were more expensive and less powerful,
so compromises were frequently made. (The Y2K bug was a result of the
same problem.) Now there is a sense of unlimited potential for storage
and processing power, but there is still legacy content, and content
is still being produced following many different standards.
Some of the problems with web development are also a result of the
first browser war between Netscape and Microsoft.
At one time there were dozens of ASCII variants out there as well,
hard-coded into machines. Eventually this stuff will all be replaced

I do not see how the browser war caused the existence of the various
encodings, which existed even before the web. Yes, they were mainly
caused by the initial decision to fit each character into a single
byte, which could not accommodate all languages at once.
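To make the single-byte limitation concrete, a small Python sketch (the encodings listed are just examples): the same byte value decodes to a different letter in each legacy code page, and a string mixing those letters fits in none of them, while UTF-8 holds all of them at once. `interpretations` and `fits_in` are hypothetical helper names.

```python
from typing import Dict

# A few single-byte legacy encodings, chosen only as examples.
LEGACY_ENCODINGS = ["latin-1", "cp1251", "koi8-r", "iso-8859-7"]


def interpretations(byte_value: int) -> Dict[str, str]:
    """Decode one byte under each legacy encoding."""
    raw = bytes([byte_value])
    return {enc: raw.decode(enc) for enc in LEGACY_ENCODINGS}


def fits_in(text: str, encoding: str) -> bool:
    """True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


letters = interpretations(0xE9)
print(letters)  # one byte, four different letters
mixed = "".join(letters.values())
print({enc: fits_in(mixed, enc) for enc in LEGACY_ENCODINGS + ["utf-8"]})
```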

by Unicode. It has the momentum. The current problem is still partial

But it is not a thing of the past. Documents in various encodings exist.
Unicode is good enough for most uses, but new non-Unicode encodings
exist for special purposes not served by Unicode.

implementations and differing implementations, even at the OS level.
I don't know about Windows, but much of this functionality is a
service in OS X; it is provided freely to the programmer by the
system. Still, there are programmers on OS X who just avoid the issue.
Cocoa makes it easy for those developers to support all the languages
it supports in Unicode. TextMate, the premier Ruby/Rails editor on
OS X, is a great example of this: it doesn't support CJK properly.
Odd, because TextEdit does, since it receives all of that support
free from the system.

New software should support Unicode in the first place, and conversion
if the need for different encodings is anticipated. CJK input is a
different issue, though. Sure, most systems (X11, Windows, OS X)
support CJK and other complex input methods, but it only works if you
do not touch certain input paths. If TextMate implements something
like identifier completion, it may well clash with CJK input, which is
a sort of completion as well. If the author used the input method
himself he would certainly notice. But many people do not.

Windows must surely have a similar offering to its developers.

Still, there are lots of crusty developers out there who don't see an
issue with this.

Yes, many projects developed exclusively by English-speaking developers
(or people who only use languages that fit in the Latin-1 range)
completely ignore internationalization issues, or try to bolt them on
once the project becomes popular enough that the issue becomes apparent.

So we continue to get a lot of *accepted* problems with language
support.

Not necessarily accepted. But sure, they exist and are slow to weed out.

The first scripting language to really start to push Unicode will be
the winner in the future.

Not only Unicode. You can get that with Python, and people still
complain. The ability to work in different encodings seamlessly is
needed.
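One common way to get that seamlessness (a sketch of one design, not the only one) is to treat Unicode as the internal form: decode bytes at input, work on text, and encode at output, so any pair of supported encodings can be converted through Unicode. The `transcode` helper below is hypothetical; it relies only on Python 3's built-in `bytes.decode` and `str.encode`.

```python
def transcode(data: bytes, src: str, dst: str, errors: str = "strict") -> bytes:
    """Convert bytes from encoding src to encoding dst by way of Unicode text."""
    text = data.decode(src)          # bytes -> Unicode (may raise UnicodeDecodeError)
    return text.encode(dst, errors)  # Unicode -> bytes (may raise UnicodeEncodeError)


sjis = "日本語".encode("shift_jis")
euc = transcode(sjis, "shift_jis", "euc_jp")
print(euc == "日本語".encode("euc_jp"))

# Characters the target cannot represent need an explicit policy,
# e.g. errors="replace" substitutes "?" for them.
lossy = transcode("日本".encode("utf-8"), "utf-8", "latin-1", errors="replace")
print(lossy)
```

Making the unrepresentable-character policy an explicit argument keeps lossy conversions visible instead of silent.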

Thanks

Michal

···

On 10/05/07, John Joyce <dangerwillrobinsondanger@gmail.com> wrote: