I18n (was Re: Andy Roonie)

Benjamin_Peterson1 · 28 June 2002 16:31

Like many others, I would be happy to devote a large
amount of time to Ruby. In my particular case it
would be to i18n, since I can’t use Ruby without it.

But in practice, I have no way to find out whether
someone in Japan is already making an i18n effort,
or whether any changes I made would be accepted, or
whether matz has decided what i18n should consist
of, so it doesn’t really make sense for me to do
anything at all.

You can tell me what you like to see in the future,
although I cannot promise you anything (yet). I mean
I’d like to hear about the spec, not about the
implementation.

Well, since you asked for my Christmas list, here it
is! My wishes for the spec are very similar to those
you stated years ago:

There’s only one I18N policy for Ruby.
It should not cause me trouble handling Japanese.
([ruby-talk:02587])

I would simply like to amend it slightly, thus:

It should not cause me trouble handling text.

To meet this spec, I think the following features
would be needed.

*Files in text mode should be read in to provide a
stream of characters, not bytes. It will sometimes be
necessary to specify the encoding explicitly, but most
common ones can be guessed. Ruby should NOT stop
reading a file when it comes to a 0x1a character!
splutter

*Files in text mode should appear in Ruby as a stream
of characters, and be written out to disk as bytes in
the specified format.

*Consoles and other IO devices are like files in this
respect. To my Ruby program, it looks like I am just
sending and getting ‘characters’. In the Ruby engine
code, it will be necessary to translate them to
whatever encoding is specified for the
console/port/whatever.

*Strings should be of characters. length() should
return character length, each() should iterate by
characters, [4] should get me the 4th character in the
string. Bytes and encodings are an implementation
detail and I do not want to have to think about them
when I think of a ‘string’.

*Regular expressions should work, even if I am
searching for a hangul followed by an
accent-independent ‘e’ in a Chinese document. They
should operate on characters, not bytes.

*All characters that exist in Unicode plane 0 should
be specifiable, handled identically, handled fast, and
handled in constant time in Ruby. Other characters
like Unicode surrogates and TRON characters are not
essential; they may require special syntax and slower
processing or may be unsupported totally.

*Source string literals should be able to contain any
Unicode character. There is no need for source to be
able to be in any arbitrary encoding, though. UTF-8
would probably be good.

*Finally, although generally I want to think of a
string as just characters, sometimes I need to deal
with software that thinks in terms of bytes and
INSISTS on EUC-KR or ASMO-708 or some other strange
encoding. For these cases, it would be necessary to
translate a string into a particular encoding like so:

a = "my string".get_encoded_bytes("EUCKR")
#  a is now an array of bytes...
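
For comparison, here is a minimal sketch of how several of these wishes can be expressed in later Ruby releases, assuming Ruby 1.9 or newer (none of this existed when this message was written). The sample strings, variable names and temporary file name are illustrative assumptions, and `encode` followed by `bytes` only stands in for the proposed `get_encoded_bytes`, which is not a real method.

```ruby
require "tmpdir"

# Sketch only: every call below comes from Ruby 1.9 or later.

# A hangul syllable, then 'e' plus a combining acute accent. \u escapes are
# used so the example does not depend on the source file's own encoding.
text = "\u{D55C}e\u{0301}"

char_count = text.length     # counts characters (code points)
byte_count = text.bytesize   # counts bytes of the UTF-8 representation
second     = text[1]         # indexing is by character

# Regular expressions operate on characters: a hangul followed by an 'e'
# carrying any number of combining marks, inside a Chinese document.
doc   = "\u{4E2D}\u{6587}" + text
match = doc.match(/\p{Hangul}e\p{Mn}*/)

# Rough equivalent of the wished-for get_encoded_bytes("EUCKR").
euckr_bytes = "\u{D55C}".encode("EUC-KR").bytes

# Text-mode files: written to disk as EUC-KR bytes, read back as characters.
hangul       = "\u{D55C}\u{AE00}"
size_on_disk = nil
round_trip   = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "hangul.txt")   # hypothetical file name
  File.open(path, "w:EUC-KR") { |io| io.write(hangul) }
  size_on_disk = File.size(path)
  round_trip   = File.open(path, "r:EUC-KR:UTF-8") { |io| io.read }
end
```

Note that the regular expression only sees the accent because the sample text stores it as a separate combining mark; a precomposed é would need normalizing first.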

pauses for breath

I would of course be willing to work on any of these
things if there were a plan.

For your information, you can get and
see my experimental M17N implementation from the CVS
ruby_m17n branch.

I know, but I figured something must have changed
since then, even if there is no physical expression of
it in cvs.

Speaking of things in cvs, though, I should
congratulate Kosako-san on providing a non-gnu regular
expression library and thus removing a painful
licensing issue. Ah, how wonderful oniguruma is! How
yet more wonderful it could be if it worked on wide
chars!

Benjamin


Yukihiro_Matsumoto2 · 1 July 2002 05:50

Hi,

Thank you for the ideas.

Speaking of things in cvs, though, I should
congratulate Kosako-san on providing a non-gnu regular
expression library and thus removing a painful
licensing issue. Ah, how wonderful oniguruma is! How
yet more wonderful it could be if it worked on wide
chars!

I think it’s not too hard for oniguruma to support wide characters,
except for wide variable length characters like UTF-16.

						matz.
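
To illustrate why UTF-16 counts as a variable-length wide encoding, here is a minimal sketch, assuming Ruby 1.9 or newer rather than the Ruby discussed in this thread. The two sample characters and the variable names are assumptions chosen for illustration: one character lies inside Unicode plane 0 and one lies outside it.

```ruby
# Sketch only: String#encode and String#bytesize come from Ruby 1.9 or later.

bmp_char    = "\u{D55C}"    # hangul syllable, inside plane 0
astral_char = "\u{1D11E}"   # musical G clef, outside plane 0

# Number of 16-bit code units each character needs in UTF-16.
bmp_units    = bmp_char.encode("UTF-16LE").bytesize / 2
astral_units = astral_char.encode("UTF-16LE").bytesize / 2

# In UTF-32 every character occupies exactly one 32-bit unit.
astral_utf32_units = astral_char.encode("UTF-32LE").bytesize / 4

puts "plane 0 character:  #{bmp_units} UTF-16 unit(s)"
puts "other character:    #{astral_units} UTF-16 unit(s)"
puts "other character:    #{astral_utf32_units} UTF-32 unit(s)"
```

The character outside plane 0 takes two 16-bit units (a surrogate pair), so a matcher working on UTF-16 cannot assume one unit per character, whereas a fixed-width form such as UTF-32 avoids the problem.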


In message “i18n (was Re: Andy Roonie)” on 02/06/29, Benjamin Peterson <bjsp123@yahoo.com> writes:
