Pen Ttt wrote:
On my computer (Ubuntu 9.1 + Ruby 1.9):
pt@pt-laptop:~$ irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> str = Iconv.iconv('GBK', 'UTF-8', '我说').to_s
=> "[\"��˵\"]"
On my friend's computer (Ubuntu 9.1 + Ruby 1.9):
$ irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> str = Iconv.iconv('GBK', 'UTF-8', '我说').to_s
=> "\316\322\313\265"
irb(main):003:0> puts Iconv.iconv('UTF-8', 'GBK', str).to_s
我说
=> nil
What's wrong with my system?

One of the joys of ruby 1.9 is that the same program, run on two
different machines, can behave differently. That's true even if the two
machines have identical versions of ruby and the OS *and* you are feeding in
the same input data.
My advice is to stick with ruby 1.8.x, where the behaviour is both sane
and predictable. However there are other people who will vociferously
tell you that I am doing the entire ruby community a disservice by
recommending this to you. It's up to you whose advice to follow.
If you want to persevere with ruby 1.9, I suggest the following:
* Check that you have exactly the same version of 1.9 on both machines
(compare the RUBY_DESCRIPTION constant). The behaviour is subtle, and a
lot of it has changed between 1.9 releases.
* Look at str.bytes.to_a to see whether the byte sequence is correct.
Whether irb displays the string correctly or not means nothing; don't
trust what you see.
* Instead of using irb, write a .rb script, and run it from the command
line directly.
* Check that the environments are the same on both. You could try
experimenting with setting the LANG and/or LC_ALL environment variables
before starting ruby.
* I tried to understand how this all works, and I documented what I
found in string19.rb in the candlerb/string19 repository on GitHub.
There are about 200 cases of encoding behaviour described there.
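Several of the checks in the list above can be gathered into one script that you run from the command line on both machines and then compare line by line. This is a minimal sketch, assuming Ruby 1.9 or later; the file name check_env.rb and the method encoding_report are hypothetical, not something from this thread. If both machines are set up the same way, the "bytes of literal" line should read [230, 136, 145, 232, 175, 180] on each (the UTF-8 encoding of 我说).

```ruby
# encoding: utf-8
# Hypothetical diagnostic script (check_env.rb). Run it with
#   ruby check_env.rb
# on both machines and diff the output. Assumes Ruby 1.9 or later.

# Collects the settings that influence how Ruby 1.9 treats strings.
def encoding_report
  {
    "RUBY_DESCRIPTION"          => RUBY_DESCRIPTION,
    "LANG"                      => ENV["LANG"].inspect,
    "LC_ALL"                    => ENV["LC_ALL"].inspect,
    "Encoding.default_external" => Encoding.default_external.to_s,
    # default_internal is often nil, so inspect it rather than to_s
    "Encoding.default_internal" => Encoding.default_internal.inspect,
    # encoding of this source file's literals (set by the magic comment)
    "source encoding"           => __ENCODING__.to_s,
    # the raw bytes, which is what actually matters
    "bytes of literal"          => "我说".bytes.to_a.inspect
  }
end

encoding_report.each { |key, value| puts "#{key}: #{value}" }
```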
Also, it's possible to do what you're trying to do in ruby 1.9 without
using Iconv: tag str with its correct encoding, and then use encode! to
convert it to another. Whether the result then appears correctly on the
terminal, especially within irb, is still not something to trust. Again,
use str.bytes.to_a to check that it is the expected sequence of bytes in
the new encoding.
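A minimal sketch of that approach, assuming Ruby 1.9 or later. The GBK byte values are taken from the friend's irb output above ("\316\322\313\265" in octal is 206, 210, 203, 181 in decimal); the variable names are only illustrative. It converts UTF-8 to GBK with encode, then goes the other way by tagging untagged bytes with force_encoding (which changes only the label, not the bytes) and converting them with encode!, checking bytes at each step instead of the terminal display.

```ruby
# encoding: utf-8
# Sketch: convert between UTF-8 and GBK with String#encode / #encode!
# instead of Iconv, and check raw bytes rather than the terminal display.

utf8 = "我说"
p utf8.bytes.to_a                # prints [230, 136, 145, 232, 175, 180]

# UTF-8 -> GBK: encode returns a new string tagged as GBK.
gbk = utf8.encode("GBK")
puts gbk.encoding                # prints GBK
p gbk.bytes.to_a                 # prints [206, 210, 203, 181]

# GBK -> UTF-8 for bytes that arrive untagged: pack("C*") yields a binary
# (ASCII-8BIT) string; force_encoding only relabels it as GBK, and
# encode! then converts it in place to UTF-8.
str = [206, 210, 203, 181].pack("C*")
str.force_encoding("GBK")
str.encode!("UTF-8")
p str.bytes.to_a                 # prints [230, 136, 145, 232, 175, 180]
p str == utf8                    # prints true
```

The important distinction is that force_encoding never touches the bytes, while encode and encode! do; using one where the other is needed is a common source of the "wrong" output seen in irb.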
Good luck,
Brian.
--
Posted via http://www.ruby-forum.com/.