Except that @top is guaranteed not to have an encoding -- at least it
damned well better not -- and @top.bytes is redundant in this case. I
see no reason to access #bytes unless I know I'm dealing with a
multibyte String.
You never know if you are; that's the problem. And no, it's NOT
redundant. You should just get used to the fact that _all_ strings
might become multibyte.
How can you continue to be so wrong? All strings will *not* become
multibyte. Matz seems pretty committed to the m17n String, which means
that you're not going to get a Unicode String. This is *good*.
When you're not getting a String that is limited to Unicode, you don't
need a separate ByteArray. This is also good.
Worse, why would "Not PNG." be treated as Unicode under your scheme
but "\x89PNG\x0d\x0a\x1a\x0a" not be? I don't think you're thinking
this through.
@top[0, 8] is sufficient when you can guarantee that sizeof(char) ==
sizeof(byte).
You can NEVER guarantee that. N e v e r. More languages and more
people use multibyte characters by default than all ASCII users
combined.
Again, you are wrong. Horribly so. I *can* guarantee that sizeof(char)
== sizeof(byte) if String#encoding is a single-byte encoding or is "raw"
(or "binary", whichever Matz uses).
It seems a pity, but you still approach multibyte strings as
something "special".
It seems very sad, but you still aren't willing to comprehend what I'm
saying.
On "raw" strings, this is always the case.
The only way to distinguish "raw" strings from multibyte strings is to
subclass (which sucks for you as a byte user and for me as a string
user).
Incorrect. I do not need to have:
UnicodeString
BinaryString
USASCIIString
ISO88591String
Never have. Never will.
What you're not understanding -- and at this point, I am *really*
thinking that it's willful -- is that I don't consider multibyte strings
"special." I consider *all encodings* special. But I also don't think I
need full *classes* to support them. (I know for a fact that I don't.)
What's special is the encoding, not the string. Any string -- including
a UTF-32 string -- is *merely* a sequence of bytes. The encoding tells
me how large my "characters" are in terms of bytes. The encoding can
tell me more than that, too. This means that an encoding is simply a
*lens* through which that sequence of bytes gains meaning.
Therefore, I can do:
s = b"Wh\xc3\xa4t f\xc3\xb6\xc3\xb6l\xc3\xafshn\xc3\xabss."
s.encoding = :utf8
s # "Whät föölïshnëss."
Gee. No subclass involved.
A substring of a "binary" (unencoded) string is simply the bytes
involved.
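Spelled out a little further (the exact spelling of these calls is,
again, illustrative rather than a spec):

  s = b"Wh\xc3\xa4t f\xc3\xb6\xc3\xb6l\xc3\xafshn\xc3\xabss."
  s.class                  # => String -- the one and only String class
  s.encoding               # => :raw
  s[2, 2]                  # => "\xc3\xa4" -- a raw substring is just bytes
  s.encoding = :utf8
  s[2, 1]                  # => "ä" -- one character (two bytes) under UTF-8
  s.encoding = :iso8859_1
  s[2, 2]                  # => "Ã¤" -- the same two bytes, different lens

Same object, same bytes, three different readings. The lens changes; the
data and the class do not.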
We're not talking rocket science here. We're talking being smart,
instead of being lemmings who apparently want Ruby to be more like Java.
On all strings, @top[0, 8] would return the appropriate number of
characters -- not the number of bytes. It just so happens on binary
strings that the number of characters and bytes is exactly the same.
This is a very leaky abstraction - you can never predict what you will
get. What's the problem with having bytes as an accessor?
What's the need, if I *know* that what I'm testing against is going to
be dealt with bytewise? You're expecting programmers to be stupid. I'm
expecting them to be smarter than that. Uninformed, perhaps, but not
stupid.
(And I would know in this case because the ultimate API that calls this
will have been given image data.)
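For the record, the behaviour I expect -- illustrative spellings once
more:

  u = "Whät föölïshnëss."     # tagged :utf8, via the pragma or a literal
  u[0, 8]                     # => "Whät föö" -- eight *characters*,
                              #    eleven bytes underneath
  raw = b"\x89PNG\x0d\x0a\x1a\x0a"
  raw[0, 8]                   # => eight characters that happen to be
                              #    exactly eight bytes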
What I'm arguing is that while the pragma may work for the
less-common encodings, both binary (non-)encoding and Unicode
(probably UTF-8) are going to be common enough that specific literal
constructors are probably a very good idea.
Python proved that to be wrong - both the subclassing part and the
literals part.
Python proved squat. Especially since you continue to think that I'm
talking about subclassing. Which I'm not and never have been.
The fact that you have to designate Unicode strings with literals is a
bad decision, and I can only suspect that it has to do with compiler
intolerance and the need to do preprocessing.
"Have to" nothing. You're simply not willing to understand anything that
doesn't bow to the god of Unicode. This has nothing to do with your
stupid assumptions, here. This has everything to do with being smarter
than you're apparently wanting Ruby to be.
The special literals are convenience items only. Syntax sugar. The real
magic is in the assignment of encodings. And those are *always* special,
whether you want to pretend such or not.
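That is, something like this (hypothetical spelling, as before):

  s = b"\x89PNG\x0d\x0a\x1a\x0a"
  # ...would be nothing more than sugar for:
  s = "\x89PNG\x0d\x0a\x1a\x0a"   # encoded per the file's pragma, whatever that is
  s.encoding = :raw               # the assignment that actually matters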
I'm through with trying to argue with you and a few others who aren't
listening and who keep suggesting the same backwards stuff over and over
again without considering that you might be wrong. Contrary to what you
might believe, I *have* looked at a lot of this stuff and have really
reached the point where Unicode-only strings and separate class
hierarchies are a waste of everyone's time and energy.
Argue for first-class Unicode support. But you should do so within the
framework which Matz has said he prefers (m17n String and no separate
byte array). Think about API changes that can make this valuable. I
think that Matz *has* settled on the basic data structure, though, and
it's a fight you probably won't win with him, since, as he pointed out
to Charles Nutter, he's in the percentage of humanity which needs to
deal with non-Unicode more than it needs to deal with Unicode.
-austin
On 6/28/06, Julian 'Julik' Tarkhanov <listbox@julik.nl> wrote:
On 28-jun-2006, at 20:36, Austin Ziegler wrote:
--
Austin Ziegler * halostatue@gmail.com * http://www.halostatue.ca/
* austin@halostatue.ca * You are in a maze of twisty little passages, all alike. // halo • statue
* austin@zieglers.ca