Message catalogs (I18N) overnight hack

HAL_9000 · 29 June 2003 22:46

[snip]

Thanks for these notes. But my feeble knowledge of
German is not the point here.

The decimal point and similar issues you mention
are things I had not thought of, however.

Technical side: According to me redefining ‘print’ is an unacceptable
violation of the principle of least surprise. The command reads
‘print’ as in ‘print that book’ not as in ‘publish a translation of
that book’.

I see what you mean. But if one can turn this on/off dynamically,
it may not be so bad.

What about this?

String.locale('C', 'de_DE')
puts 'Alternatives exist'.l10n

The first argument of String.locale is the language the strings are
expected to be in ('C' is the default); the second is the target language.
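
A minimal sketch of how this proposal could behave. Everything here is an assumption made for illustration: the in-memory `L10N_CATALOG` hash, the `String.locale_pair` helper, and the fallback to the untranslated string are not part of any library mentioned in the thread.

```ruby
# Hypothetical in-memory catalog: { [source, target] => { original => translation } }.
L10N_CATALOG = {
  ['C', 'de_DE'] => { 'Alternatives exist' => 'Es gibt Alternativen' }
}.freeze

class String
  # Remember the source and target locales for later lookups.
  def self.locale(source = 'C', target = source)
    @locale_pair = [source, target]
  end

  # The active pair; defaults to the 'C' locale on both sides.
  def self.locale_pair
    @locale_pair || ['C', 'C']
  end

  # Return the translation, or the receiver itself when none is known.
  def l10n
    L10N_CATALOG.fetch(String.locale_pair, {}).fetch(self, self)
  end
end

String.locale('C', 'de_DE')
puts 'Alternatives exist'.l10n   # looks up the German entry
String.locale                    # back to 'C' for source and target
puts 'Alternatives exist'.l10n   # no catalog, so the string is returned unchanged
```

The explicit `.l10n` call is exactly the per-string cost Hal objects to in his reply; the sketch only makes that trade-off concrete.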

I see the advantages of this. But one of my goals
was to be able to internationalize without too many
changes to the original program. And another is to
preserve readability and simplicity – I dislike
having to call a method every time I use a string.

It remains to be seen whether this will be released,
anyway. There are issues to resolve.

Thanks,
Hal

···

----- Original Message -----
From: “Josef ‘Jupp’ Schugt” jupp@gmx.de
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Sunday, June 29, 2003 2:30 PM
Subject: Re: Message catalogs (I18N) overnight hack

–
Hal Fulton
hal9000@hypermetrics.com

Guillaume_Marcais1 · 30 June 2003 14:13

I would side with Josef. Do not change print, puts, etc. It may have some
weird side effects.

For “literal” (untranslated) output, I’m exposing the aliases
lputs, lprint, and lprintf

I would do it the other way around. Keep print, etc. unchanged and add
tprint, etc. for translated printing.
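
A rough sketch of that alternative, leaving the built-ins alone. The `TRANSLATIONS` hash and the `translate` helper are hypothetical stand-ins for a real catalog lookup.

```ruby
# Hypothetical catalog for the active target language.
TRANSLATIONS = { 'Hello world' => 'Hallo Welt' }.freeze

# Look up a translation; fall back to the original text.
def translate(text)
  TRANSLATIONS.fetch(text, text)
end

# Translated counterparts of puts and print; the built-ins stay untouched.
def tputs(*lines)
  puts(*lines.map { |line| translate(line.to_s) })
end

def tprint(*parts)
  print(*parts.map { |part| translate(part.to_s) })
end

tputs 'Hello world'   # translated output
puts 'Hello world'    # literal output, exactly as before
```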

Guillaume.


Josef_Jupp_SCHUGT · 30 June 2003 20:40

Saluton!

Technical side: According to me redefining ‘print’ is an unacceptable
violation of the principle of least surprise. The command reads
‘print’ as in ‘print that book’ not as in ‘publish a translation of
that book’.

I see what you mean. But if one can turn this on/off dynamically,
it may not be so bad.

Suggestion:
Do not have the require statement redefine ‘puts’ but use a ‘locale’
statement that has this calling convention:

locale('de_AT', 'fr_FR')
locale('pt_BR')
locale

Version 1: Source is German as used in Austria (Austrians do not use
the same vocabulary as Germans), target is French as used
in France (differs from French as used in Canada)

Version 2: Source is Portuguese as used in Brazil and so is target.
This is to deal with l10n issues like numbers and the
like.

Version 3: Source is in ‘C’ locale and so is target.
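
The three call forms map naturally onto default arguments. The sketch below is one possible reading, not Josef's implementation: the `L10n` module, its `DECIMAL_MARK` table, and the `number` helper are invented here to show why source and target may usefully be the same locale.

```ruby
# Hypothetical module holding the active locale pair.
module L10n
  DEFAULT = 'C'
  # Decimal separators for a few locales (illustrative subset).
  DECIMAL_MARK = { 'C' => '.', 'pt_BR' => ',', 'fr_FR' => ',', 'de_AT' => ',' }.freeze

  # locale('de_AT', 'fr_FR') sets source and target separately;
  # locale('pt_BR') uses one locale for both;
  # locale with no arguments resets both to 'C'.
  def self.locale(source = DEFAULT, target = source)
    @source = source
    @target = target
    [@source, @target]
  end

  # Format a number with the target locale's decimal separator.
  def self.number(value)
    value.to_s.sub('.', DECIMAL_MARK.fetch(@target, '.'))
  end
end

p L10n.locale('de_AT', 'fr_FR')
p L10n.locale('pt_BR')
puts L10n.number(3.5)   # no text is translated, but the number is localized
p L10n.locale
puts L10n.number(3.5)
```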

And another is to preserve readability and simplicity – I dislike
having to call a method every time I use a string.

IMHO redefining a built-in command’s effect on a built-in data type
makes code unreadable.

A linear algebra teacher once proved that humans have severe
problems when their expectations are not met.

Usually the problems read ‘Sei K ein Koerper, sei V ein
Vektorraum…’, which means ‘Let K be a commutative field, let V be a
vector space’. The mathematical details only matter insofar as a
field and a vector space are completely different objects and the
choice of letters was mnemonic.

One day he wrote ‘Sei K ein Vektorraum, sei V ein Koerper’. You
think only a handful of guys managed to solve the problems? You are
right.

Gis,

Josef ‘Jupp’ Schugt

···

–
Someone even submitted a fingerprint for Debian Linux running on the
Microsoft Xbox. You have to love that irony :).
– Fyodor on nmap-hackers@insecure.org

HAL_9000 · 30 June 2003 21:10

I see your point, and someone else expressed the same opinion.

But I stand by this principle. I like it personally, whether
others do or not.

Personally I don’t want to sprinkle ‘tputs’ and such throughout
my code. I want to modify my code as little as possible.

Besides, a name like tputs will lead the reader subconsciously to
think that something unusual is happening, when it is really (in
effect) just a puts. Sure, it’s doing a translation, but I want
the programmer NOT to think about translation as he looks at
the code.

In other words, I want translation to be as transparent and
behind-the-scenes as possible.

It’s not true AOP, but it has an AOP-like flavor to me.

At any rate, the library may not ever be finished, as no one
has expressed real interest in it (other than in changing the
design). I don’t want to make fundamental changes in my
design without a reason that I agree with; if I did, it wouldn’t
be my project anymore. If someone likes the overall idea, but
dislikes my concept, then they can write their own code.

In fact, I have discovered there are similar libraries already
in the RAA – they differ from mine only in having a larger,
more intrusive footprint in the code (and obviously they have
more features).

Cheers,
Hal

···

----- Original Message -----
From: “Josef ‘Jupp’ Schugt” jupp@gmx.de
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Monday, June 30, 2003 3:40 PM
Subject: Re: Message catalogs (I18N) overnight hack…

And another is to preserve readability and simplicity – I dislike
having to call a method every time I use a string.

IMHO redefining a built-in command’s effect on a built-in data type
makes code unreadable.

Josef_Jupp_SCHUGT · 1 July 2003 00:01

Saluton!

Besides, a name like tputs will lead the reader subconsciously to
think that something unusual is happening, when it is really (in
effect) just a puts.

I can only speak for myself, but my concept of localization is ‘translate
the text and then output the result’, where translating is the hard
task that needs extra care and the output simply works. To reflect
this concept there has to be a function that translates the text.

Besides that intellectual problem there is also a practical one: if
the code that outputs localized text looks exactly like the code that
outputs unlocalized text, this can result in errors that are very hard
to find. If you use a function call for translation, a forgotten
‘require’ automatically results in an undefined function error.

In other words, I want translation to be as transparent and
behind-the-scenes as possible.

Maybe some day I will understand why ‘opaque’ and ‘transparent’ are
synonyms in the field of programming.

Gis,

Josef ‘Jupp’ Schugt

···

–
Someone even submitted a fingerprint for Debian Linux running on the
Microsoft Xbox. You have to love that irony :).
– Fyodor on nmap-hackers@insecure.org

Brian_Candler · 1 July 2003 10:01

Maybe such a low-level transparent translation belongs either in String, or
in the IO class, rather than overriding all the various Kernel#puts-type
methods.

Then you could make it explicit:

$defout = TranslatingIO.new("en", "de", STDOUT)
puts "Hello world" # >> "Hallo Welt" or whatever

It might then be more general - for example it could be used on StringIO
objects - but less intrusive.
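
A sketch of what such a wrapper might look like. `TranslatingIO` is Brian's hypothetical name; the `CATALOGS` hash and the whole-line lookup are assumptions made here. The example calls the wrapper directly and writes into a StringIO rather than reassigning a global output stream.

```ruby
require 'stringio'

# Hypothetical wrapper: translates whole strings before handing them to the
# wrapped IO. The catalog is a plain hash keyed by [source, target].
class TranslatingIO
  CATALOGS = {
    %w[en de] => { 'Hello world' => 'Hallo Welt' }
  }.freeze

  def initialize(source, target, io)
    @catalog = CATALOGS.fetch([source, target], {})
    @io = io
  end

  # Translate each argument if the catalog knows it, then delegate.
  def puts(*lines)
    @io.puts(*lines.map { |line| @catalog.fetch(line.to_s, line.to_s) })
  end

  def print(*parts)
    @io.print(*parts.map { |part| @catalog.fetch(part.to_s, part.to_s) })
  end
end

buffer = StringIO.new
out = TranslatingIO.new('en', 'de', buffer)
out.puts 'Hello world'
out.puts 'No translation yet'
print buffer.string
```

Because the translation lives in the IO object, the same wrapper works for a file, a socket, or a StringIO, which is the generality Brian is pointing at.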

But I take your point that it’s your project so it’s up to you to design it
how you like

I don’t think I’d use the proposed style of library. Firstly it would be
very difficult to ensure complete coverage of all strings having
translations [unless there was a utility to parse the Ruby source to extract
all strings, and tie them up against all translations, and highlight any
missing ones]

Also I’d be a bit concerned about phrases where the word order might need to
be different in different languages:

printf("I gave the %s to %s", thing, recipient)

It could perhaps use tags which were automatically stripped out in
translation:

printf("I gave the <THING:%s> to <RECIP:%s>", thing, recipient)
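
Ruby's `format` already accepts named references of the form `%<name>s`, which gives much the same effect as the tags suggested above without a stripping step. The sketch below assumes a hypothetical `GAVE_TEMPLATES` table with one template per language.

```ruby
# Hypothetical per-language templates; each may order the arguments freely.
GAVE_TEMPLATES = {
  'en' => 'I gave the %<thing>s to %<recipient>s.',
  'de' => 'Ich habe %<recipient>s das %<thing>s gegeben.'
}.freeze

# Fill the template for the given language from named values.
def gave(lang, thing:, recipient:)
  values = { thing: thing, recipient: recipient }
  format(GAVE_TEMPLATES.fetch(lang), values)
end

puts gave('en', thing: 'book', recipient: 'Anna')
puts gave('de', thing: 'Buch', recipient: 'Anna')
```

Note that the German template hardcodes the article ‘das’, so it only works for neuter nouns; named arguments solve word order, not word endings.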

Also, do we worry about languages where word endings change dependent on
function? Languages which require noun.capitalize ?

Regards,

Brian.


Nobuyoshi_Nakada · 1 July 2003 12:50

Hi,

···

At Tue, 1 Jul 2003 19:01:06 +0900, Brian Candler wrote:

Also I’d be a bit concerned about phrases where the word order might need to
be different in different languages:
printf("I gave the %s to %s", thing, recipient)
It could perhaps use tags which were automatically stripped out in
translation:
printf("I gave the <THING:%s> to <RECIP:%s>", thing, recipient)

printf("I gave the %1$s to %2$s", thing, recipient)

–
Nobu Nakada

HAL_9000 · 1 July 2003 13:25

Maybe such a low-level transparent translation belongs either in String,
or
in the IO class, rather than overriding all the various Kernel#puts-type
methods.

Well, that’s a thought. I’m not sure I see all of the implications
at this hour of the morning. Too much blood in my caffeine stream.

But I take your point that it’s your project so it’s up to you to design
it
how you like

Ha… well, not all design changes are created equal.

If I’m designing a horse, and someone says, “If you added a horn, you
could have a unicorn” – well, that is interesting. But if someone
says, “Drop the hooves and hair, skip the mammal bit, change the legs
and add four more, and make it ocean-living – you could have an
octopus!” – well, that is different.

I don’t think I’d use the proposed style of library. Firstly it would be
very difficult to ensure complete coverage of all strings having
translations [unless there was a utility to parse the Ruby source to
extract
all strings, and tie them up against all translations, and highlight any
missing ones]

What would partially address this would be the warning and logging
features I mentioned (not implemented).

Logging would capture all strings as they were output, for later
translation.
Warning would print an explicit warning when an untranslated string was
found. These would of course have to be turned on explicitly. Then you would
just need good code coverage, as from a set of testcases.
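
Since these features are described as not implemented, the following is only a guess at their shape. The `AuditingTranslator` class, its `missing` set, and the `warn_on_missing` switch are all invented for illustration.

```ruby
require 'set'

# Hypothetical translator that records every string it could not translate.
class AuditingTranslator
  attr_reader :missing

  def initialize(catalog, warn_on_missing: false)
    @catalog = catalog
    @warn_on_missing = warn_on_missing
    @missing = Set.new
  end

  # Return the translation; on a miss, log the string and pass it through.
  def translate(text)
    @catalog.fetch(text) do
      @missing << text
      warn "untranslated: #{text.inspect}" if @warn_on_missing
      text
    end
  end
end

translator = AuditingTranslator.new({ 'File not found' => 'Datei nicht gefunden' })
puts translator.translate('File not found')
puts translator.translate('Disk full')
p translator.missing.to_a   # strings still waiting for a translation
```

As Hal says, this only finds strings that are actually output, so its usefulness depends on how much of the program the test cases exercise.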

Also I’d be a bit concerned about phrases where the word order might need
to
be different in different languages:
printf("I gave the %s to %s", thing, recipient)

I address this issue. The prepared message can contain %n markers such as
%1, %2, %3…; during matching, these basically become (.*?) patterns.

Some of my contrived examples dealt with this issue, like the “User foo…”
example.
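
One way the marker idea could work is sketched below. This is a reconstruction from the description only, not Hal's code; the function names and the anchoring of the pattern are assumptions.

```ruby
# Turn a message containing %1, %2, ... into an anchored regex in which each
# marker becomes a non-greedy capture group. Also return the marker numbers
# in the order they appear.
def marker_regex(template)
  order = template.scan(/%(\d+)/).flatten.map(&:to_i)
  pattern = Regexp.escape(template).gsub(/%\d+/, '(.*?)')
  [Regexp.new("\\A#{pattern}\\z"), order]
end

# Match text against the source template and re-insert the captured values
# into the translated template, honouring the marker numbers.
def translate_with_markers(text, source, target)
  regex, order = marker_regex(source)
  match = regex.match(text)
  return text unless match

  values = {}
  order.each_with_index { |number, index| values[number] = match[index + 1] }
  target.gsub(/%(\d+)/) { values[Regexp.last_match(1).to_i] }
end

puts translate_with_markers(
  'I gave the Laptop to Anna.',
  'I gave the %1 to %2.',
  'Ich habe %2 den %1 gegeben.'
)
```

The program's own format string stays untouched; only the catalog entries carry the markers, which is the low-impact property Hal is after.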

Also, do we worry about languages where word endings change dependent on
function? Languages which require noun.capitalize ?

Word endings are an issue. Sometimes people store plurals separately, e.g.,
file=>Datei, files=>Dateien and so on. This is an area which pushes the
limits of my knowledge both of I18N programming and languages in general.

As for capitalizing… hmm. I don’t offhand see where anything would ever
have to be capitalized that was not hardcoded in the translated message.

But there are several little issues like that, that I’m not addressing yet.
Someone mentioned the decimal point issue. I hate to think about that.

Hal

···

----- Original Message -----
From: “Brian Candler” B.Candler@pobox.com
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Tuesday, July 01, 2003 5:01 AM
Subject: Re: Message catalogs (I18N) overnight hack…

Josef_Jupp_SCHUGT · 2 July 2003 11:03

Saluton!

Brian Candler; 2003-07-01, 12:05 UTC:

Also I’d be a bit concerned about phrases where the word order
might need to be different in different languages:
printf("I gave the %s to %s", thing, recipient)
It could perhaps use tags which were automatically stripped out in
translation:
printf("I gave the <THING:%s> to <RECIP:%s>", thing, recipient)
Also, do we worry about languages where word endings change
dependent on function? Languages which require noun.capitalize ?

You forgot something: Word order

printf('%s of %s', 'the house', 'a friend')
tomodachi no ie

I used a Latin transcription of the Japanese in order to make it
readable. The translation involves two changes:

a) dropping articles ‘the’ and ‘a’
b) reversing word order

‘tomodachi’ means ‘friend’ while ‘ie’ means ‘house’.

This problem is the reason why Microsoft did introduce %1, %2, … in
C#. Ruby’s “#{}” is even better.

German does use capitalization of nouns and there are quite a number
of native speakers of that language.

Gis,

Josef ‘Jupp’ Schugt

···

–
Someone even submitted a fingerprint for Debian Linux running on the
Microsoft Xbox. You have to love that irony :).
– Fyodor on nmap-hackers@insecure.org

HAL_9000 · 1 July 2003 13:42

Also I’d be a bit concerned about phrases where the word order might
need to
be different in different languages:
printf("I gave the %s to %s", thing, recipient)
It could perhaps use tags which were automatically stripped out in
translation:
printf("I gave the <THING:%s> to <RECIP:%s>", thing, recipient)
printf("I gave the %1$s to %2$s", thing, recipient)
–
Nobu Nakada

Yes, that is how I have done it in C on AIX.

But I simplified it in my code – it does not require
any change to printf or to its format string. I use
markers in the translated strings themselves:

“I gave the %1 to %2.” →
“Ich habe den %2 zu %1 gegeben.” # Word ending issues!

Apparently you know something about I18N. How are issues
like word endings usually handled? With different messages??

For example, “I saw the %1” in English – in German,
“Ich sah den Wagen” (I saw the car) but “Ich sah die Bruecke”
(I saw the bridge).

My German is flawed, but you see what I am asking about
den/die I think.

Hal

···

----- Original Message -----
From: nobu.nokada@softhome.net
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Tuesday, July 01, 2003 7:50 AM
Subject: Re: Message catalogs (I18N) overnight hack…

At Tue, 1 Jul 2003 19:01:06 +0900, Brian Candler wrote:

–
Hal Fulton
hal9000@hypermetrics.com

Brian_Candler · 1 July 2003 13:48

And then you get into date formats (US middle-endian), three-letter
abbreviations for month names, … ugh.

I’m sure these problems must have been worked through before. I noted a while
ago under FreeBSD that gmake had some strange dependencies:

pkg_add gmake-3.79.1_1.tgz

pkg_add: could not find package libiconv-1.7_5 !
pkg_add: could not find package expat-1.95.2 !
pkg_add: could not find package gettext-0.11.1_3 !

It seemed strange to me that a ‘make’ utility would have a dependency on an
XML parser. It turns out that gettext is the GNU way to deal with this:

[man gettext]

DESCRIPTION
The gettext program translates a natural language message
into the user’s language, by looking up the translation in
a message catalog.

I imagine that the source format for these message catalogues is XML, and
hence the requirement on an XML parser (although IMO gettext should be split
into two: a client side which just reads the message catalogues, which
appear to be in a binary format, and a -devel package which includes the
XML-to-message-catalogue tools. But I digress).

I note there’s a ruby-gettext library already. A quick browse and it seems
to require indexing by message-ID rather than the original “untranslated”
text.

That approach makes logical sense to me - decouple all language text from
the source, rather than have language A in the source and languages A,B,C,D
in the translation database. (Otherwise, whenever you change a message in
the source you’d have to update the corresponding language A entry - a
violation of the DRY principle)

Cheers,

Brian.

···

On Tue, Jul 01, 2003 at 10:25:51PM +0900, Hal E. Fulton wrote:

But there are several little issues like that, that I’m not addressing yet.
Someone mentioned the decimal point issue. I hate to think about that.

HAL_9000 · 2 July 2003 18:20

No, I think you misunderstood. Word order was
the reason he suggested the labels in the first
place.

Hal

···

----- Original Message -----
From: “Josef ‘Jupp’ Schugt” jupp@gmx.de
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Wednesday, July 02, 2003 6:03 AM
Subject: Re: Message catalogs (I18N) overnight hack…

It could perhaps use tags which were automatically stripped out in
translation:
printf("I gave the <THING:%s> to <RECIP:%s>", thing, recipient)
Also, do we worry about languages where word endings change
dependent on function? Languages which require noun.capitalize ?
You forgot something: Word order

Nobuyoshi_Nakada · 1 July 2003 15:35

Hi,

printf("I gave the %1$s to %2$s", thing, recipient)
Yes, that is how I have done it in C on AIX.

But I simplified it in my code – it does not require
any change to printf or to its format string. I use
markers in the translated strings themselves:

Change? Although I may not correctly understand what you mean,
it’s already even in 1.6.

“I gave the %1 to %2.” →
“Ich habe den %2 zu %1 gegeben.” # Word ending issues!

$ ruby-1.6 -v -e 'thing="thing";recipient="recipient";
printf("I gave the %1$s to %2$s\n", thing, recipient);
printf("Ich habe den %2$s zu %1$s gegeben.\n", thing, recipient)'
ruby 1.6.8 (2003-06-28) [i686-linux]
I gave the thing to recipient
Ich habe den recipient zu thing gegeben.

Apparently you know something about I18N. How are issues
like word endings usually handled? With different messages??

For example, “I saw the %1” in English – in German,
“Ich sah den Wagen” (I saw the car) but “Ich sah die Bruecke”
(I saw the bridge).

I only know that gettext can handle plural forms, but nothing about
this case. How do Germans solve this issue in gettext?

···

At Tue, 1 Jul 2003 22:42:13 +0900, Hal E. Fulton wrote:

–
Nobu Nakada

HAL_9000 · 1 July 2003 17:39

It seemed strange to me that a ‘make’ utility would have a dependency on
an
XML parser. It turns out that gettext is the GNU way to deal with this:

[man gettext]

I know about this, but I still like the low-impact
approach. Obviously the message catalog prep takes
some time/effort; but I like the fact that most/many
apps will only require two extra lines of code,
and no other changes in the code itself.

That approach makes logical sense to me - decouple all language text
from
the source, rather than have language A in the source and languages
A,B,C,D
in the translation database. (Otherwise, whenever you change a message in
the source you’d have to update the corresponding language A entry - a
violation of the DRY principle).

Now THAT is a good point.

Ooh, a violation of DRY. Don’t tell Dave!!

On the other hand, does gettext have the notion of “default”
text? I think it must. (What is used if no message catalog
can be found?) That would still have to be kept synchronized
between the source and the catalogs.

Hal

···

----- Original Message -----
From: “Brian Candler” B.Candler@pobox.com
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Tuesday, July 01, 2003 8:48 AM
Subject: Re: Message catalogs (I18N) overnight hack…

Masao_Mutoh · 1 July 2003 18:08

Hi,

···

On Tue, 1 Jul 2003 22:48:21 +0900 Brian Candler B.Candler@pobox.com wrote:

pkg_add gmake-3.79.1_1.tgz

pkg_add: could not find package libiconv-1.7_5 !
pkg_add: could not find package expat-1.95.2 !
pkg_add: could not find package gettext-0.11.1_3 !

It seemed strange to me that a ‘make’ utility would have a dependency on an
XML parser. It turns out that gettext is the GNU way to deal with this:

[man gettext]

DESCRIPTION
The gettext program translates a natural language message
into the user’s language, by looking up the translation in
a message catalog.

I imagine that the source format for these message catalogues is XML, and
hence the requirement on an XML parser (although IMO gettext should be split
into two: a client side which just reads the message catalogues, which
appear to be in a binary format, and a -devel package which includes the
XML-to-message-catalogue tools. But I digress).

expat is only used in xgettext, for Glade files.
xgettext is the tool that extracts translatable strings from the given source code.

–
.:% Masao Mutoh mutoh@highway.ne.jp

HAL_9000 · 1 July 2003 17:42

Hi,
printf("I gave the %1$s to %2$s", thing, recipient)
Yes, that is how I have done it in C on AIX.

But I simplified it in my code – it does not require
any change to printf or to its format string. I use
markers in the translated strings themselves:
Change? Although I may not correctly understand what you mean,
it’s already even in 1.6.

Forgive my ignorance. I didn’t know that printf
supported numbered parameters.

I know just gettext can handle plural forms, but nothing about
this case. How do German solve this issue in gettext?

I have no idea. Perhaps just by careful wording.

Hal

···

----- Original Message -----
From: nobu.nokada@softhome.net
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Tuesday, July 01, 2003 10:35 AM
Subject: Re: Message catalogs (I18N) overnight hack…

At Tue, 1 Jul 2003 22:42:13 +0900, Hal E. Fulton wrote:

Masao_Mutoh · 1 July 2003 18:36

Hi,

From: “Brian Candler” B.Candler@pobox.com
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Tuesday, July 01, 2003 8:48 AM
Subject: Re: Message catalogs (I18N) overnight hack…

On the other hand, does gettext have the notion of “default”
text? I think it must. (What is used if no message catalog
can be found?) That would still have to be kept synchronized
between the source and the catalogs.

Hal

Usually, gettext uses an English message as the msgid,
and the msgid is written in the source code.

An example using Ruby-GetText-Package:

puts _("Hello World") # "Hello World" is a msgid

If the localized message can’t be found,
the msgid is used as the message.

puts _("Hello World") #=> "KONNICHIWA SEKAI" # found (Japanese)
puts _("Hello World") #=> "Hello World" # not found
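
The fallback rule can be imitated in a few lines. This is a stand-in written for illustration, not Ruby-GetText-Package itself: the `TinyGettext` module and its `catalog` accessor are hypothetical, and only the idea of `_()` returning the msgid on a miss comes from the description above.

```ruby
# Hypothetical miniature of the msgid fallback; not the real gettext binding.
module TinyGettext
  @catalog = {}

  class << self
    attr_accessor :catalog
  end

  # Return the localized message, or the msgid itself when none is found.
  def _(msgid)
    TinyGettext.catalog.fetch(msgid, msgid)
  end
end

include TinyGettext

TinyGettext.catalog = { 'Hello World' => 'KONNICHIWA SEKAI' }
puts _('Hello World')   # catalog hit

TinyGettext.catalog = {}
puts _('Hello World')   # no entry, so the msgid is printed as-is
```

This also answers the ‘default text’ question: the default is simply the msgid in the source, so there is no separate default catalog to keep in sync.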

To synchronize the sources and catalogs,
GNU gettext provides tools such as msgmerge.

BTW,
The manual of GNU GetText may help you.
http://www.gnu.org/manual/gettext/html_chapter/gettext_toc.html

···

On Wed, 2 Jul 2003 02:39:59 +0900 “Hal E. Fulton” hal9000@hypermetrics.com wrote:

----- Original Message -----

–
.:% Masao Mutoh mutoh@highway.ne.jp
