Message catalogs (I18N) overnight hack

Hi, everyone.

I’ve been thinking about message catalogs (as in I18N)
for the last day or so.

I’ve hacked something together that works like this:

require "msgcat"
MsgCat.load("de_DE")

puts "Hello, world!"
puts "Hello, Tom!"
puts "User adam has opened file garden.eden already."
puts "User foo has opened file bar already."
printf "User %s has opened file %s already.\n", "eve", "apple"

puts "Douglas Adams says that 'the meaning of life' is 42."
puts "This %1 symbol will be ignored."
printf "The cost of %s is $%5.2f.\n", "this item", 49.95

x = "Hello, world!"
y = x.xlate
lputs y                 # Hallo, Welt!

And the output is:

Hallo, Welt!
Hallo, Tom!
Datei garden.eden wird schon geoeffnet von adam.
Datei bar wird schon geoeffnet von foo.
Datei apple wird schon geoeffnet von eve.
Douglas Adams hat gesagt, dass 'the meaning of life' '42' ist.
This %1 symbol will be ignored.
$49.95 ist der Preis von 'this item'.
Hallo, Welt!
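Here is a minimal sketch of how that interception might work in plain Ruby. It assumes a flat Hash catalog keyed by the exact English string. CATALOG is a hypothetical name, and Hal's actual msgcat catalog format is not shown here. Following Hal's notes, it keeps the original Kernel methods reachable as lputs and lprintf. It also uses Ruby's %1$s-style absolute argument numbers, so a translated printf format can take its arguments in a different order.

```ruby
# Sketch only: CATALOG is a hypothetical flat Hash keyed by the exact
# English string; it is not Hal's msgcat catalog format.
module Kernel
  CATALOG = {
    "Hello, world!" => "Hallo, Welt!",
    # %1$s and %2$s are Ruby's absolute argument numbers, so the German
    # format string can use the arguments in a different order.
    "User %s has opened file %s already.\n" =>
      "Datei %2$s wird schon geoeffnet von %1$s.\n"
  }.freeze

  # Keep the original, untranslated methods reachable.
  alias lputs puts
  alias lprintf printf

  # Translate String arguments found in the catalog; pass others through.
  def puts(*args)
    lputs(*args.map { |a| a.is_a?(String) ? CATALOG.fetch(a, a) : a })
  end

  # Assumes the first argument is the format string (no IO argument).
  def printf(fmt, *args)
    lprintf(CATALOG.fetch(fmt, fmt), *args)
  end

  private :puts, :printf
end

puts "Hello, world!"        # Hallo, Welt!
puts "Not in the catalog."  # Not in the catalog.
printf "User %s has opened file %s already.\n", "eve", "apple"
lputs "Hello, world!"       # Hello, world!
```

Exact-string lookup only covers messages without variable parts. Lines such as "Hello, Tom!" or "User foo has opened file bar already." need matching against patterns instead.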

What do you think?

It’s just an overnight hack. I might clean it up and release it
if there is interest.

Here are some highlights and limitations:

  • Obviously I’m fiddling with Kernel#puts, #print, and #printf
  • I didn’t think #p should be changed
  • For the heck of it, I added a String#xlate
  • Anything that can’t be found in the message list remains
    untranslated
  • For “literal” (untranslated) output, I’m exposing the aliases
    lputs, lprint, and lprintf
  • Parameters may be ordered (since word order differs between
    languages)
  • There are always two catalogs in use – the primary one is
    called “native.cat” and it allows a lookup into the target
    language in the other catalog
  • The native.cat catalog doesn’t have to be in English
  • The message ordering is irrelevant (though they are sorted
    internally by number of parameters to avoid matching problems)
  • Message order does not have to be the same in corresponding
    catalogs (though the ids must correspond)
  • Currently I’m not dealing with character set issues at all
  • There’s no real compliance with anyone else’s way of doing
    things as yet – I’m reinventing the wheel until it seems
    better not to
  • The message catalogs are YAML-based (thanks, _why!)
  • I’m planning a little utility to create catalogs
  • Besides a utility, I think a little API for adding messages
    is appropriate
  • I’m thinking of a “logging mode” to aid in extracting message
    strings for translation
  • I’m also thinking of a “warning mode” when an untranslated
    string is found
  • There’s error checking I could do but am not doing yet
  • The MsgCat.detect method will read the $LANG environment
    variable (but so far it ignores the .utf8 or whatever)
  • Right now, message catalogs are searched for only in the
    current directory
  • Internally a message is a Symbol
  • I’m thinking of mputs/mprint/mprintf to take a first arg which
    would explicitly identify the message and bypass the lookup
  • Currently there are some ugly hacks in the code, and it’s
    possible to mislead or confuse the matching (e.g., a message
    containing parameters with missing numbers)
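The two-catalog lookup described in the list above can be sketched as follows. Everything here is an assumption for illustration: the YAML layout (message id mapped to a pattern), the %1/%2 parameter syntax in the catalogs, and the helper names compile_pattern and translate. The sketch tries patterns with fewer parameters first. That is one reading of "sorted internally by number of parameters": an exact message such as "Hello, world!" then wins over the more general "Hello, %1!".

```ruby
require "yaml"

# Hypothetical catalogs: message id => pattern, with %1, %2, ... marking
# numbered parameters. The ids must correspond; the order need not.
NATIVE = YAML.safe_load(<<~CAT)
  hello_world: "Hello, world!"
  hello_name: "Hello, %1!"
  file_open: "User %1 has opened file %2 already."
CAT

GERMAN = YAML.safe_load(<<~CAT)
  file_open: "Datei %2 wird schon geoeffnet von %1."
  hello_name: "Hallo, %1!"
  hello_world: "Hallo, Welt!"
CAT

# Build an anchored regexp with one capture group per parameter and
# remember which parameter number each group corresponds to.
def compile_pattern(pattern)
  numbers = []
  parts = pattern.split(/(%\d+)/).map do |part|
    if (m = part.match(/\A%(\d+)\z/))
      numbers << m[1].to_i
      "(.+?)"
    else
      Regexp.escape(part)
    end
  end
  [Regexp.new("\\A#{parts.join}\\z"), numbers]
end

# Try patterns with fewer parameters first, so an exact message such as
# "Hello, world!" is preferred over the more general "Hello, %1!".
RULES = NATIVE.map { |id, pattern| [id, *compile_pattern(pattern)] }
              .sort_by { |_id, _regexp, numbers| numbers.size }

def translate(text)
  RULES.each do |id, regexp, numbers|
    match = regexp.match(text) or next
    target = GERMAN[id] or next
    values = numbers.zip(match.captures).to_h
    # Substitute by number, so the target may reorder parameters.
    return target.gsub(/%(\d+)/) { values.fetch($1.to_i, $&) }
  end
  text # anything not found in the catalog stays untranslated
end

puts translate("Hello, Tom!")  # Hallo, Tom!
puts translate("User adam has opened file garden.eden already.")
# Datei garden.eden wird schon geoeffnet von adam.
```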

Cheers,
Hal


Hal Fulton
hal9000@hypermetrics.com

Hello!

  • Hal E. Fulton; 2003-06-28, 23:03 UTC:
puts "Hello, world!"
Hallo, Welt!

OK

puts "Hello, Tom!"
Hallo, Tom!

OK

puts "User adam has opened file garden.eden already."
Datei garden.eden wird schon geoeffnet von adam.

Datei garden.eden wurde schon von Benutzer adam geoeffnet.

puts "User foo has opened file bar already."
Datei bar wird schon geoeffnet von foo.

Datei bar wurde schon von Benutzer foo geoeffnet.

printf "User %s has opened file %s already.\n", "eve", "apple"
Datei apple wird schon geoeffnet von eve.

Datei apple wurde schon von Benutzer eve geoeffnet.

puts "Douglas Adams says that 'the meaning of life' is 42."
Douglas Adams hat gesagt, dass 'the meaning of life' '42' ist.

Okay, apart from the fact that it is the meaning of life, the universe,
and all that >;->

puts "This %1 symbol will be ignored."
This %1 symbol will be ignored.

Dieses %1-Symbol wird ignoriert.

For the above to be correct German, the hyphen (Bindestrich) has to be
present (although that is changing these days).

printf "The cost of %s is $%5.2f.\n", "this item", 49.95
$49.95 ist der Preis von 'this item'.

Where do the apostrophes around ‘this item’ come from? Cost is Kosten,
not Preis (that’s price). German has a parallel way of formulating the
above:

Die Kosten von this item betragen $49,95.
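Josef's rewrite also writes the amount as $49,95, with a decimal comma, whereas Hal's "%5.2f" produces 49.95. As far as I know, Ruby's format always uses a period as the decimal mark. Here is a minimal sketch of a formatter with a configurable decimal mark and thousands separator. format_number, US, and DE are hypothetical names, not part of msgcat or the standard library.

```ruby
# Hypothetical helper: format a number with a chosen decimal mark and
# thousands separator.
def format_number(value, decimals: 2, decimal_mark: ".", group_sep: ",")
  int_part, frac_part = format("%.#{decimals}f", value.abs).split(".")
  # Insert the separator every three digits, counting from the right.
  grouped = int_part.reverse.scan(/\d{1,3}/).join(group_sep).reverse
  sign = value.negative? ? "-" : ""
  frac_part ? "#{sign}#{grouped}#{decimal_mark}#{frac_part}" : "#{sign}#{grouped}"
end

US = { decimal_mark: ".", group_sep: "," }.freeze
DE = { decimal_mark: ",", group_sep: "." }.freeze

puts format_number(49.95, **DE)        # 49,95
puts format_number(100_000_000, **US)  # 100,000,000.00
puts format_number(100_000_000, **DE)  # 100.000.000,00
```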

FYI: The recently mentioned DECIMAL POINT IS COMMA statement of COBOL
reflects a difference between the German and US ways of writing
numbers:

US: 100,000,000.00
DE: 100.000.000,00

Precisely the other way round.

Note that I did not correct ae/oe/ue/ss (the ASCII spellings of ä, ö,
ü, and ß). In the case of ‘ss’, both pre- and post-Rechtschreibreform
(German spelling reform) spellings are presently correct. In the other
cases, I suppose you know how to spell these words correctly >;->

Technical side: In my opinion, redefining ‘print’ is an unacceptable
violation of the principle of least surprise. The command reads
‘print’ as in ‘print that book’, not as in ‘publish a translation of
that book’.

What about this?

String.locale('C', 'de_DE')
puts 'Alternatives exist'.l10n

The first argument of String.locale is the language the strings are
expected to be in (‘C’ is the default); the second is the target
language.

Bye,

Josef ‘Jupp’ Schugt
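Josef's alternative leaves puts alone and makes translation explicit. Here is a minimal sketch of what String.locale and String#l10n might look like, assuming an in-memory Hash catalog keyed by [source, target] language pairs. The CATALOGS constant, the source_locale/target_locale accessors, and the German sample translation are all illustrative, not an existing API.

```ruby
# Hypothetical sketch of the proposed API; nothing here is an existing
# library. Translation is opt-in: puts and print are left untouched.
class String
  # Illustrative catalogs: [source, target] => { text => translation }.
  CATALOGS = {
    ["C", "de_DE"] => { "Alternatives exist" => "Es gibt Alternativen" }
  }.freeze

  @source_locale = "C"
  @target_locale = "C"

  class << self
    attr_reader :source_locale, :target_locale

    # First argument: language the literals are written in ("C" by
    # default); second argument: target language.
    def locale(source = "C", target = source)
      @source_locale = source
      @target_locale = target
    end
  end

  # Look the string up in the active catalog; unknown strings come back
  # unchanged.
  def l10n
    catalog = CATALOGS.fetch([String.source_locale, String.target_locale], {})
    catalog.fetch(self, self)
  end
end

String.locale("C", "de_DE")
puts "Alternatives exist".l10n  # Es gibt Alternativen
puts "Something else".l10n      # Something else
```

Lookups happen only where .l10n is written, so a plain puts keeps its usual meaning. That is the point of the least-surprise objection.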