Recently the encoding of the ruby-talk web interface changed to
EUC-JP. My Mozilla has only one font (fixed) for this encoding. I can
change the encoding by hand, but UTF-8 etc. would be more convenient.
···
–
Fritz Heinrichmeyer mailto:fritz.heinrichmeyer@fernuni-hagen.de
FernUniversitaet Hagen, LG ES, 58084 Hagen (Germany)
tel:+49 2331/987-1166 fax:987-355 http://www-es.fernuni-hagen.de/~jfh
Scripsit ille Fritz Heinrichmeyer fritz.heinrichmeyer@fernuni-hagen.de:
Recently the encoding of the ruby-talk web interface changed to
EUC-JP. My Mozilla has only one font (fixed) for this encoding. I can
change the encoding by hand, but UTF-8 etc. would be more convenient.
Is UTF-8 a superset of EUC-JP, that is, including not only the Japanese
characters but also the mathematical and Greek ones as well as the
full-width ASCII and half-width Katakana characters? At least the ones I
tried are preserved upon converting euc-jp → utf-8 and then utf-8 → euc-jp.
Are there any kterm+w3m users here - who cannot read UTF-8?
···
–
Another hint: one question mark per sentence is enough; more looks a
bit silly. And please don't put a space before it. Did I hear a "why"
? Here is your answer.
[Volker Gringmuth in de.newusers.questions]
Rudolf Polzer AntiATField_adsgohere@durchnull.de writes:
Is UTF-8 a superset of EUC-JP,
UTF-8 is an encoding for the Unicode character set. EUC-JP is an
encoding for at least 4 character sets, of which all are subsets of
the Unicode character set. So, the character set representable by
UTF-8 is a superset of the character sets representable by EUC-JP.
Converting an EUC-JP string to an equivalent UTF-8 string is a
beast, as one has to use a conversion table. The good thing is that I
think I've seen at least one Ruby module that does such conversion: iconv,
which interfaces with the iconv*() functions found on modern Unix. The
downside is that iconv may be a Unix-only thing.
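[A minimal round-trip sketch; it uses String#encode from later Ruby versions as a stand-in for the iconv module mentioned above, and assumes the characters exist in both character sets:]

```ruby
# Round-trip a Japanese string: EUC-JP -> UTF-8 -> EUC-JP.
euc  = "日本語".encode("EUC-JP")   # "Japanese language" in EUC-JP
utf  = euc.encode("UTF-8")         # table-driven conversion under the hood
back = utf.encode("EUC-JP")
puts back == euc                   # => true: lossless for these characters
```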
YS.
Scripsit ille aut illa Yohanes Santoso ysantoso@jenny-gnome.dyndns.org:
Rudolf Polzer AntiATField_adsgohere@durchnull.de writes:
Is UTF-8 a superset of EUC-JP,
UTF-8 is an encoding for the Unicode character set. EUC-JP is an
encoding for at least 4 character sets,
“at least”…
of which all are subsets of
the Unicode character set. So, the character set representable by
UTF-8 is a superset of the character sets representable by EUC-JP.
So why choose EUC-JP over UTF-8 then?
Converting an EUC-JP string to an equivalent UTF-8 string is a
beast, as one has to use a conversion table.
Is it different with EUC-JP vs. iso-2022-jp? Well, parsing escape
sequences might be evil, too, but since there are only two of them…
I can imagine what the effect of a transmission closed while in
multi-byte mode would be…
Or are these just different encodings for the same character set, with
only the numbers encoded differently?
The good thing is that I
think I've seen at least one Ruby module that does such conversion: iconv,
which interfaces with the iconv*() functions found on modern Unix. The
downside is that iconv may be a Unix-only thing.
Since I haven’t seen a web browser written in Ruby yet: why is that
a reason for the website to choose EUC-JP?
···
–
Rudolf Polzer AntiATField_adsgohere@durchnull.de writes:
So why choose EUC-JP over UTF-8 then?
Familiarity, habit, and the lack of compelling incentives to
switch. Users of any locale-specific encoding probably remember some
commonly used values, just as people who use the ASCII
encoding remember that 'A' is encoded as the number 65. If there
were a new ASCII-like standard, we (users of the ASCII encoding)
would probably be just as hesitant or inconvenienced to switch as
users of locale-specific encodings. Characteristic differences between
locale-specific character sets and Unicode would also cause some
people to delay switching; e.g. an ASCII user would probably be
hesitant if the new ASCII-like standard did not preserve this
property: '9'-'0'==9.
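[The property in question, expressed in Ruby:]

```ruby
# Digits are contiguous in ASCII, so character arithmetic works:
puts ('9'.ord - '0'.ord) == 9   # => true
# Unicode keeps the digits at the same, contiguous code points,
# so the same holds for UTF-8 strings.
```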
Is it different with EUC-JP vs. iso-2022-jp?
Or are these just different encodings for the same character set and
only the numbers are encoded differently?
Conversion between encodings that can be used to represent the same
character sets is rather painless, as it can be achieved by simply
performing some linear transformations. I am not sure whether this is
always true, but so far my experience has not shown me otherwise. For
example, iso-2022-jp → EUC-JP is encoded value + 128, and EUC-JP →
iso-2022-jp is encoded value - 128 (I looked the transformation
formulae up somewhere, as they are not something I usually bother
to remember).
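[A small sketch of that +128 relation; it holds for the two-byte JIS X 0208 portion, and modern Ruby's String#encode is used here merely to produce both byte sequences for comparison:]

```ruby
# "日" in ISO-2022-JP: ESC $ B, then the two JIS bytes, then ESC ( B.
jis = "日".encode("ISO-2022-JP").bytes
# "日" in EUC-JP: the same two JIS bytes with the high bit set.
euc = "日".encode("EUC-JP").bytes

payload = jis[3, 2]                     # skip the 3-byte "switch to JIS X 0208" escape
puts payload.map { |b| b + 128 } == euc # => true
```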
Since I haven’t seen a web browser written in Ruby yet: why is that
a reason for the website to choose EUC-JP?
Perhaps some authors' favourite tools (editor, fonts, information-processing
tools such as spell checkers and search engines, etc.) are still not
UTF-8-enabled. But what I suspect most is that "old habits are difficult to
break".
YS.
Since I haven’t seen a web browser written in Ruby yet: why is that
a reason for the website to choose EUC-JP?
BTW, ruby accepts UTF-8 strings. Its regex can also operate on UTF-8
as well as it can on EUC-JP (since both are non-modal encodings).
YS.
Scripsit ille aut illa Yohanes Santoso ysantoso@jenny-gnome.dyndns.org:
Since I haven’t seen a web browser written in Ruby yet: why is that
a reason for the website to choose EUC-JP?
BTW, ruby accepts UTF-8 strings.
As does every 8-bit-clean language.
Its regex can also operate on UTF-8
as well as it can on EUC-JP (since both are non-modal encodings).
EUC-JP: I don't think it's a good idea if a hiragana "desu"
matches a "nin/tou" kanji (I know, that's the most improbable
example I could find… - I don't know the language, just the IME,
kakasi and EDICT). But I don't think such things occur often,
since you wouldn't use a single character in a regex match. And a Japanese
user might already be used to such "weird" regex behaviour from other
tools like grep, which might do the same. Or does grep handle character
sets correctly instead of just applying regexes to byte arrays?
I think the main problem with Ruby is that Kconv cannot convert between
UTF-8 and the rest.
···
–
Your password must be at least 18770 characters and cannot repeat any of
your previous 30689 passwords. Please type a different password. Type a
password that meets these requirements in both text boxes. [M$]
(Fix: http://support.microsoft.com/support/kb/articles/q276/3/04.ASP)
Hi,
BTW, ruby accepts UTF-8 strings.
As does every 8-bit-clean language.
More than that: Ruby's regex handles characters, so the
problem you mentioned never happens in Ruby (if you specify the proper
character encoding).
I think the main problem with Ruby is that Kconv cannot convert between
UTF-8 and the rest.
Kconv is based on an encoding converter developed before Unicode was
defined. But you have the iconv and uconv modules.
By the way, I will tell the blade maintainer about the EUC-JP issue.
matz.
···
In message “Re: euc-jp coding of ruby talk web interface” on 03/01/29, Rudolf Polzer AntiATField_adsgohere@durchnull.de writes: