> David Masover wrote:
>> Java at least did this sanely -- UTF-16 is at least a fixed width. If
>> you're going to force a single encoding, why wouldn't you use
>> fixed-width strings?
>
> Actually, it's not.
Whoops, my mistake. I guess now I'm confused as to why they went with UTF-16
-- I always assumed it simply truncated things which can't be represented in
16 bits.
> You can produce corrupt strings and slice into a half-character in
> Java just as you can in Ruby 1.8.
Wait, how?
I mean, yes, you can deliberately build strings out of corrupt data, but if
you actually work with complete strings and string concatenation, and you
aren't doing crazy JNI stuff, and you aren't digging into the actual bits of
the string, I don't see how you can create a truncated string.
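For context, the behavior Jörg describes needs no JNI or bit-level access: UTF-16 represents characters above U+FFFF as two-code-unit surrogate pairs, and Java's index-based String methods count code units, not characters, so an innocent-looking slice can land mid-character. A minimal sketch (class and variable names are illustrative):

```java
public class SurrogateSlice {
    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic
        // Multilingual Plane, so UTF-16 encodes it as a surrogate pair:
        // two char code units for one character.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length());                         // 2
        System.out.println(clef.codePointCount(0, clef.length())); // 1

        // substring() indexes code units, so this slice stops halfway
        // through the clef and yields a corrupt one-char string holding
        // a lone high surrogate.
        String broken = clef.substring(0, 1);
        System.out.println(Character.isHighSurrogate(broken.charAt(0))); // true
    }
}
```

The same hazard applies to charAt() and length-based loops; the code-point-aware APIs (codePointAt, codePointCount, offsetByCodePoints) sidestep it.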
>> The whole point of having multiple encodings in the first place is that
>> other encodings make much more sense when you're not in the US.
>
> There's also a lot of legacy data, even within the US. On IBM systems,
> the standard encoding, even for greenfield systems that are being
> written right now, is still pretty much EBCDIC all the way.
I'm really curious why anyone would go with an IBM mainframe for a greenfield
system, let alone pick EBCDIC when ASCII is fully supported.
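For concreteness, the gulf between the two encodings is visible from Java itself, which ships an EBCDIC converter for IBM code page 037 (this sketch assumes the extended "Cp037" charset is present, which standard JDK builds include, so it probes first):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EbcdicDemo {
    public static void main(String[] args) {
        // The letter 'A' is 0x41 in ASCII (and UTF-8) but 0xC1 in
        // EBCDIC code page 037, so the encodings disagree even on
        // plain English text.
        byte ascii = "A".getBytes(StandardCharsets.US_ASCII)[0];
        System.out.printf("ASCII:  0x%02X%n", ascii & 0xFF);   // 0x41

        // Cp037 is an extended charset; probe before using it.
        if (Charset.isSupported("Cp037")) {
            byte ebcdic = "A".getBytes(Charset.forName("Cp037"))[0];
            System.out.printf("EBCDIC: 0x%02X%n", ebcdic & 0xFF); // 0xC1
        }
    }
}
```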
> And now there's a push for a One Encoding To Rule Them All in Ruby 2.
> That's *literally* insane! (One definition of insanity is repeating
> behavior and expecting a different outcome.)
Wait, what?
I've been out of the loop for a while, so it's likely that I missed this, but
where are these plans?
On Wednesday, November 24, 2010 08:40:22 pm Jörg W Mittag wrote: