Strange behaviour of Strings in Range

As an outsider, and as an application programmer, if the language has to
resort to special add-on libraries or different method names to handle
different character data types, forget it.

Actually, it’s not really hard to hide almost all of that from the
application programmer. Windows does it very nicely with its TCHAR macro,
_T(…) preprocessing magic, and its renaming of API calls depending on
whether the target platform uses 8-bit or 16-bit characters.

A string is a string is a string. As an application programmer I want to be
able to use substring, split, character by positional index, and all the
other standard string methods and not have to worry about what kind of
string data I am handling. I’m prepared to pass a parameter to stream
handlers telling them what type of encoding to use between external streams
and internal representations if I have to - or even between different
internal representations if that’s the sort of thing that is a particular
programmer’s bag. I could even live with parameters like that on the
standard string methods if I had to.
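To make that concrete, here is a minimal sketch in plain Ruby of what passing an encoding parameter to a stream handler might look like. EncodedReader and the :ascii / :utf16be symbols are made-up names for illustration only, not anything in the core library; the conversion is done with pack/unpack so it works on ordinary byte-oriented strings.

```ruby
require 'stringio'

# Hypothetical reader: takes an encoding parameter and turns external bytes
# into an internal, encoding-independent list of character code numbers.
class EncodedReader
  def initialize(io, external_encoding)
    @io = io
    @external_encoding = external_encoding   # e.g. :ascii or :utf16be
  end

  def codepoints
    bytes = @io.read
    case @external_encoding
    when :ascii   then bytes.unpack("C*")   # one byte per character
    when :utf16be then bytes.unpack("n*")   # two big-endian bytes per character
    else raise ArgumentError, "unknown encoding #{@external_encoding}"
    end
  end
end

reader = EncodedReader.new(StringIO.new("Hi"), :ascii)
p reader.codepoints   # => [72, 105]
```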

But the original code that I write and test to work with &mylanguageofchoice;
MUST work with ANY other language, without change. All that users of other
languages should have to do to get my program to work in their language is
translate the fixed strings (and perhaps change GUI layouts because of
sizings). This is absolutely basic and should be in the Ruby core. I thought
this kind of thing was why we have OO.

How you do it, I don’t care. I don’t care what the internal representation
is; that’s an implementation detail, the sort of thing that OO should be
hiding from me anyway. All I know is I’m handling a string, and I want to do
string manipulations. I don’t care if it’s Unicode or ASCII, Kanji or Kanuck.

Sorry, if I can’t do that in Ruby, it’s broken and, I’m afraid, unusable.

It’s amazing (to me, anyway) that you wrote that, as that sentiment so
closely mirrors my own initial attitude towards Ruby. The first time I was
looking at Ruby and trying to decide whether or not I should invest time in
learning it (which was months ago, at the very least), as soon as I found
out that it didn’t handle strings as 16-bit characters internally, I was
completely put off. My opinion at the time was that any computer language
that didn’t support strings with 16-bit-wide characters could not be taken
seriously. (Yeah, a very snobby attitude, but people are like that sometimes.)

But, you know what? I was totally wrong. Since Ruby strings do permit
embedded ‘\0’ characters, they give me all the flexibility I need. A string
in Ruby is simply a sequence of bytes. If I want to create a ‘Unicode’-like
sequence of 16-bit characters I can - all I need to do is initialise an
array with the proper sequence of numbers and pack that into a string.
(And, of course, remember that I need to implement all operations on such a
sequence by accessing two bytes at a time.) If I want to do anything special
with strings, there is nothing stopping me from writing my own string
class… The beauty of the Ruby way is that it can deal with strings that are
internally 16 bits wide or whatever, while still implementing that logic
with what its compiler sees as 8-bit-wide character sequences (which means
even IMO ‘broken’ C/C++ compilers can build a working Ruby system).
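As a minimal sketch of that approach (wide_char_at is just an illustrative helper, not anything in the standard library): 16-bit codes packed into an ordinary byte string, with each “character” read back two bytes at a time.

```ruby
codes = [0x0053, 0x0074, 0x0072, 0x3042]   # 'S', 't', 'r', hiragana A
wide  = codes.pack("n*")                   # big-endian 16-bit units -> byte string
p wide.length                              # => 8 (bytes, not characters)

# Positional index on the 16-bit sequence: take the two bytes for character i.
def wide_char_at(str, i)
  str[i * 2, 2].unpack("n").first
end

p wide_char_at(wide, 3)                    # => 12354 (0x3042)
p wide.unpack("n*") == codes               # => true
```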

Vive la OOP difference! :wink:

Martin

···

Mike Calder ceo@phosco.com wrote: