Like many others, I would be happy to devote a large
amount of time to Ruby. In my particular case it
would be to i18n, since I can’t use Ruby without it.
But in practice, I have no way to find out whether
someone in Japan is already making an i18n effort,
whether any changes I made would be accepted, or
whether matz has decided what i18n should consist of,
so it doesn’t really make sense for me to do anything
yet.
You can tell me what you’d like to see in the future,
although I cannot
promise you anything (yet). I mean I’d like to hear
about the spec,
not about the implementation.
Well, since you asked for my Christmas list, here it
is! My wishes for the spec are very similar to those
you stated years ago:
There’s only one I18N policy for Ruby.
It should not cause me trouble handling Japanese.
I would simply like to amend it slightly, thus:
It should not cause me trouble handling text.
To meet this spec, I think the following features
would be needed.
Files in text mode should be read in to provide a
stream of characters, not bytes. It will sometimes be
necessary to specify the encoding explicitly, but most
common ones can be guessed. Ruby should NOT stop
reading a file when it comes to a 0x1a character!
Files in text mode should appear in Ruby as a stream
of characters, and be written out to disk as bytes in
the specified format.
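Concretely, something like this is what I have in mind; the file name and the encoding: option here are just an illustrative strawman for whatever syntax is finally chosen:

```ruby
# Strawman: write a UTF-8 text file, then read it back as characters.
# The encoding: option is illustrative, not a settled interface.
File.write("greeting.txt", "héllo\n", encoding: "UTF-8")

text = File.read("greeting.txt", encoding: "UTF-8")
text.length    # 6 characters ("héllo" plus the newline), even though
text.bytesize  # the file holds 7 bytes, since 'é' is 2 bytes in UTF-8
```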
*Consoles and other IO devices are like files in this
respect. To my Ruby program, it looks like I am just
sending and getting ‘characters’. In the Ruby engine
code, it will be necessary to translate them to
whatever encoding is specified for the device.
*Strings should be of characters. length() should
return character length, each() should iterate by
characters, and indexing (e.g. str[3]) should get me
the 4th character in the string. Bytes and encodings
are an implementation detail and I do not want to have
to think about them when I think of a ‘string’.
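In code, the wish is simply this; each_char is an invented name for character-wise iteration:

```ruby
s = "caffè"      # 5 characters; in UTF-8 that happens to be 6 bytes
s.length         # => 5, counting characters, not bytes
s[3]             # => "f", the 4th character
chars = []
s.each_char { |c| chars << c }  # iteration visits whole characters
chars.size       # => 5
```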
*Regular expressions should work, even if I am
searching for a Hangul followed by an
accent-independent ‘e’ in a Chinese document. They
should operate on characters, not bytes.
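For example, a pattern should consume whole characters, so that ‘.’ never splits a multibyte character. The sample text is invented for illustration:

```ruby
s = "한글 café 漢字"
m = s.match(/caf./)  # '.' must match the single character 'é', not one byte
m[0]                 # => "café"
s =~ /글/            # finding a Hangul syllable should be just as easy
```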
*All characters that exist in Unicode plane 0 should
be specifiable, handled identically, handled fast, and
handled in constant time in Ruby. Other characters
like Unicode surrogates and TRON characters are not
essential; they may require special syntax and slower
processing or may be unsupported totally.
*Source string literals should be able to contain any
Unicode character. There is no need for source to be
able to be in any arbitrary encoding, though. UTF-8
would probably be good.
*Finally, although generally I want to think of a
string as just characters, sometimes I need to deal
with software that thinks in terms of bytes and
INSISTS on EUC-KR or ASMO-708 or some other strange
encoding. For these cases, it would be necessary to
translate a string into a particular encoding like so:
a = "my string".get_encoded_bytes("EUCKR") # a is now an array of bytes...
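An equivalent spelling, if a conversion method plus a byte accessor were preferred over a single call (both method names below are strawmen just like get_encoded_bytes):

```ruby
# Strawman alternative: convert the string, then ask for its bytes.
a = "my string".encode("EUC-KR").bytes
# "my string" is pure ASCII, so the EUC-KR bytes are the same values
a == "my string".bytes  # => true
```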
*pauses for breath*
I would of course be willing to work on any of these
things if there were a plan.
For your information, you can get and
see my experimental M17N implementation from the CVS
repository.
I know, but I figured something must have changed
since then, even if there is no physical expression of
it in cvs.
Speaking of things in cvs, though, I should
congratulate Kosako-san on providing a non-GNU regular
expression library and thus removing a painful
licensing issue. Ah, how wonderful oniguruma is! How
yet more wonderful it could be if it worked on wide
characters!