Strange behaviour of Strings in Range

Hi,

I’ve done some work with Unicode/Java/Ruby and these remarks might
interest you:

  • Java/Unicode is good and nice, but as far as GUIs are concerned, I18N
    only works with European languages. You simply cannot translate “just
    the strings” of an application whose whole layout becomes
    right-to-left. E.g. a classic OK/CANCEL dialog in Hebrew would be:

        +--------+ +-------+
        | CANCEL | |  OK   |
        +--------+ +-------+

    (in the Hebrew script, of course :-)
    … just because the whole system is geared to work right-to-left. This
    is also true of menu directions, the position of the ‘Start’ button on
    Windows, etc. (This is one reason I personally stick to the English
    Windows: I got unnerved by having to switch which button I click.) I
    would imagine top-to-bottom GUIs/scripts have even worse problems.

  • I have a multilingual application (English, Hebrew and Spanish)
    written in Java. It worked great (well, only half the functionality was
    there :).
    Then I learned Ruby and decided to do the data analysis in Ruby,
    just because it’s faster to program in. After a couple of days of
    getting my head around the Ruby Unicode libraries, I ended up using
    UTF-8, since the current regexp engine seems to handle it. I am still
    not sure whether this will bite me in the future, but since this is a
    personal app that’s OK; I can always fix it. However, I didn’t find a
    good Unicode GUI toolkit for Ruby.
    So I ended up with a data analysis layer written in Ruby, working on
    UTF-8 with complex regexps (because \w only matches Latin letters). The
    GUI layer was Java, as that let me keep the work I’d already done.
    Communication between the two was via files (the data is analysed
    once, as it’s not that big, and the pre-processed data is read from
    files later). I haven’t tried JRuby, as it’s still at 1.6.8, but I
    probably will at some point.
    See http://recipeindexer.sourceforge.net, although I haven’t uploaded
    the new Ruby stuff yet.
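To make the \w point concrete, here is a minimal sketch. It assumes a modern Ruby (1.9 or later, so newer than the Ruby discussed in this thread), where \w is still defined as [a-zA-Z0-9_] even on UTF-8 strings, but Unicode property classes such as \p{L} (any letter) and \p{N} (any number) are available. The sample string is made up for illustration.

```ruby
# Sample text mixing Latin letters, a Hebrew word and digits.
# The Hebrew word ("shalom") is written as \u escapes so the source stays ASCII.
text = "recipe \u05E9\u05DC\u05D5\u05DD 42"

# \w is ASCII-only in Ruby ([a-zA-Z0-9_]), so the Hebrew word is not matched.
ascii_words = text.scan(/\w+/)

# Unicode property classes: \p{L} matches any letter, \p{N} any number.
unicode_words = text.scan(/[\p{L}\p{N}_]+/)

p ascii_words          # prints ["recipe", "42"]
p unicode_words.length # prints 3 (the Hebrew word is now included)
```

Ruby’s POSIX bracket class [[:word:]] is another Unicode-aware option; it also covers combining marks and connector punctuation, which may or may not be what you want.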

Conclusions?

  • Does Ruby handle Unicode easily? No.
  • Does Ruby handle Unicode well enough? Yes.
  • Is text-data processing easier with Ruby than with Java? Yes.
  • Can I make them work together? Yes.
  • Is it good enough for production systems? Well, if you have a couple
    of programmers + support staff to handle internationalization ANYWAY
    (and in my experience there’s no way around this if you want a
    professional-level international app), you might as well use Ruby.
    Application development is faster and easier, leaving you more time to
    tackle the tough issues.

At the end of the day, there’s no built-in limitation that can’t be
solved. The question is what the limitations of your chosen system are,
and how much effort you need to solve them.
Ruby is great in this respect, at least.

HTH,
Assaph


-----Original Message-----
From: Mike Calder [mailto:ceo@phosco.com]
Sent: Wednesday, 5 May 2004 3:20 AM
To: ruby-talk ML
Subject: Re: Strange behaviour of Strings in Range

On Tuesday 04 May 2004 17:27, Hal Fulton wrote:

Can you do that with any language whatsoever? Don’t say Java, because
it’s limited to Unicode.

It’s not a flame. It’s an honest question. Is there any programming
language/environment anywhere where I18N issues are completely
abstracted away and everything is done “nicely”?

Hal

No flame understood, and none intended here either, but I didn’t ask for
I18N issues to be completely abstracted away. I said I wanted to be able
to write in one language, to not have to consider the string encoding
when writing code, and for users of any other language to be able to use
the code just by translating the strings. I probably misunderstand the
situation (I often do until people beat the details into my head with
hammers), but as far as I can see, and in my practical experience,
Unicode (and by transference Java) gives me an “abstracted-enough”
solution for my applications, working with new data generated by those
applications.

If I understood the nature of the problems with Unicode stated earlier in
this thread (and I must admit I skimmed a lot of it), they stem from there
not being a unique encoding for every possible glyph/character
representation in every language (e.g., one Unicode code point for a
glyph that represents different characters in different languages), from
legacy encodings and legacy-encoded data, and so on. So you tell me
Unicode doesn’t hack it and you want to use something else. I have no
problem with that; I’m not emotionally attached to Unicode (or Java, come
to that - that’s why I’m looking, after all).

These problems may cause operational difficulties, and a totally clean
solution probably has to handle all of them. Meantime, in the real world
(and on my plate), there is a need for I18N which ASCII just doesn’t
hack.

All I know is that, using Java and Unicode, I can write code in English,
have people translate it into any European language (including those
written in Cyrillic and Greek), into Hindi and into Japanese, use it in
those languages, and have it look and behave right in the target
language. I know, because it has been both required and done on code
I’ve written.

The only points I’m making are that:

  1. When you get round to implementing whatever it is (let’s call it
     SuperCode) in Ruby, it should be part of the core and transparent to
     the user of String methods, and

  2. Until Ruby does with SuperCode (or whatever) what Java does today
     with Unicode, it’s no use to me for production code and remains an
     (albeit pretty) toy as far as my practical applications are
     concerned.


Clear skies!
Mike Calder.