Saluton!
RFC 2047 is the MIME standard that describes how to use non-ASCII
character sets in internet mail. This library takes the approach
that if you give it a string that might have RFC 2047 encoded words
in it, and tell it what character set you’d like the string to be
in, it will convert it (using iconv).
The library seems to work fine with western character sets but it has
problems with Japanese (and possibly other languages as well).
I used Mew (written in Japan) to write a message with an ISO-2022-JP
encoded header. The subject read ‘hiragana katakana jôyô-kanji’,
where each word was written in the script it names (e.g. hiragana
in hiragana). Displaying the message header left my kanji
terminal (rxvt) in an unusable state; even issuing the ‘reset’
command didn’t help. I’ll send a message entitled “RFC 2047 to
hiragana, katakana to jôyô-kanji” to the list so that you have
something for testing purposes and the Japanese among us can tell
us if my message happens to be broken in their mailer (I would be
surprised if that is the case).
That’s how life is at the b?leeding edge: You are b?leeding >;->
Gis,
Josef ‘Jupp’ Schugt
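The approach described at the top of this message (find RFC 2047 encoded words, decode them, convert to the requested character set) can be sketched roughly as follows. This is only an illustration, not the library's code: the helper name decode_rfc2047 and the regular expression are assumptions, and Ruby's built-in String#encode stands in for iconv.

```ruby
# Matches one RFC 2047 encoded word: =?charset?encoding?payload?=
ENCODED_WORD = /=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=/

# Hypothetical helper (not the library's API): decode every encoded word
# in +header+ and convert it to the +target+ character set.
def decode_rfc2047(header, target = "UTF-8")
  # Whitespace between two adjacent encoded words is not part of the text.
  joined = header.gsub(/(\?=)\s+(=\?)/, '\1\2')
  joined.gsub(ENCODED_WORD) do
    charset, enc, payload = $1, $2, $3
    raw = if enc.casecmp?("b")
            payload.unpack1("m")                 # "B" encoding is Base64
          else
            payload.tr("_", " ").unpack1("M")    # "Q" encoding: "_" is a space
          end
    # Label the bytes with the declared charset, then transcode.
    raw.force_encoding(charset).encode(target)
  end
end

puts decode_rfc2047("Subject: =?ISO-8859-1?Q?caf=E9?=")
```

An unknown charset name makes force_encoding raise ArgumentError; a real implementation would need to decide how to handle that case.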
Thanks for the feedback. I’ll run your example (thanks) through iconv,
and see if I can figure out what is happening.
What character set did you convert to? I assume it was the character set
that your terminal is using?
Sam
Quoting jupp@gmx.de, on Wed, Apr 16, 2003 at 01:39:17AM +0900:
···
Saluton!
What character set did you convert to? I assume it was the character
set that your terminal is using?
Correct. The terminal uses iso-2022-jp as its native charset. It does
not seem to be a terminal problem, since I used the same terminal to
create the message. More precisely, I first started a kanji
terminal, then issued ‘export LC_CTYPE=ja_JA’ and started ‘emacs -nw’
in that terminal. In Emacs I then started Mew and began composing the
message. To enter the text I used the ‘japanese’ input mode
(C-\ ENTER japanese). All characters displayed correctly while
being entered. I then delivered the message locally. It
displayed correctly in Mew. The problem seems to result from the word
‘Katakana’ written in katakana; all other parts of the Subject
displayed as expected.
BTW: For practical reasons it would be very nice if you used a
state-of-the-art quoting style. It is highly uneconomical if one has to
skip back and forth between the answer and what is being answered. Dômo
arigatô gozaimasu.
Gis,
Josef ‘Jupp’ Schugt
(Nobu, this question relates to using iconv to convert from iso-2022-jp
to iso-2022-jp, which I would think would be an identity transform but
appears not to be. Perhaps you have some insight?)
The decoder is correctly pulling the base64 out of your example and
then decoding it to binary, correctly I assume, unless there is a bug
in the ruby b64 code :-). However, Iconv.iconv() is dropping some of the
trailing bytes when converting from what is (claimed to be) iso-2022-jp
to iso-2022-jp. Any idea what those characters are? Some sort of terminal
reset sequence, perhaps? Maybe iso-2022-jp is stateful in some way?
I’m afraid I don’t know anything about iso-2022-jp, and I’m just
learning about iconv(), so I don’t know that I can help right now.
Sam
Input strings:
GyRCJEgbKEIgGyRCJFIkaSQsJEobKEI= (b64)
"\e$B$H\e(B \e$B$R$i$,$J\e(B" (binary, iso-2022-jp)
"\e$B$H\e(B \e$B$R$i$,$J" (Iconv.iconv from iso-2022-jp to iso-2022-jp)
The other strings all have the same behaviour, and the same final three
bytes (the trailing \e(B) missing from the iconv output.
GyRCISIbKEIgGyRCJSslPyUrJUobKEI=
"\e$B!\"\e(B \e$B%+%?%+%J\e(B"
"\e$B!\"\e(B \e$B%+%?%+%J"
IBskQiRIGyhCIBskQj5vTVE0QRsoQg==
" \e$B$H\e(B \e$B>oMQ4A\e(B"
" \e$B$H\e(B \e$B>oMQ4A"
GyRCO3obKEI=
"\e$B;z\e(B"
"\e$B;z"
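For anyone who wants to reproduce the byte-level comparison above, here is a minimal sketch. It Base64-decodes the four payloads from this message and checks whether each one ends with ESC ( B, the sequence that RFC 1468 defines as the switch back to ASCII in ISO-2022-JP. Iconv is no longer bundled with current Ruby, so String#encode is used here to render the text as UTF-8; that is a stand-in, not the call under discussion. The names SAMPLES, ASCII_SHIFT and inspect_sample are made up for the example.

```ruby
# Base64 payloads copied from the message above.
SAMPLES = %w[
  GyRCJEgbKEIgGyRCJFIkaSQsJEobKEI=
  GyRCISIbKEIgGyRCJSslPyUrJUobKEI=
  IBskQiRIGyhCIBskQj5vTVE0QRsoQg==
  GyRCO3obKEI=
].freeze

# ESC ( B selects ASCII in ISO-2022-JP (RFC 1468).
ASCII_SHIFT = "\e(B".b

# Hypothetical helper: decode one payload and report on its trailing bytes.
def inspect_sample(b64)
  raw = b64.unpack1("m") # Base64-decode to a binary string
  {
    raw: raw,
    ends_in_ascii_shift: raw.end_with?(ASCII_SHIFT),
    # String#encode stands in for iconv here (an assumption of this sketch).
    utf8: raw.dup.force_encoding("ISO-2022-JP").encode("UTF-8")
  }
end

SAMPLES.each do |b64|
  info = inspect_sample(b64)
  puts "#{info[:raw].inspect} ends with ESC ( B: #{info[:ends_in_ascii_shift]}"
end
```

In every sample the bytes reported missing from the Iconv output are exactly this trailing ESC ( B, which is consistent with the guess that the encoding is stateful.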
Quoting jupp@gmx.de, on Thu, Apr 17, 2003 at 12:35:07AM +0900:
···
The library seems to work fine with western character sets but it has
problems with Japanese (and possibly other languages as well).
I did use Mew (written in Japan) to write a message with ISO-2022-jp
encode header. The subject did read ‘hiragana katakana jôyô-kanji’
where all words were written in the way they indicate (e.g. hiragana
in hiragana). Displaying the message header did leave my kanji
terminal (rxvt) in an unusable state - even issuing the ‘reset’
command didn’t help. I’ll send a message entitled: "RFC 2047 to