I ran into several puzzling behaviors with string encodings. Here is what I see:
[root@mars mysql]# irb -E ascii
# I start irb with the default external encoding set to ASCII.
irb(main):014:0> String.new.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):015:0> "".encoding
=> #<Encoding:US-ASCII>
# I get different encodings depending on how I create an empty string. Why?
irb(main):023:0> "\x80".encoding
=> #<Encoding:ASCII-8BIT>
irb(main):024:0> "\x7F".encoding
=> #<Encoding:US-ASCII>
# It looks like a string literal containing a byte greater than 0x7F gets
the ASCII-8BIT encoding. That is OK.
irb(main):005:0> new_str = "\xF1\xF2"
=> "\xF1\xF2"
irb(main):006:0> new_str.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):007:0> s ="%c%c%c%c%c%s" % [49, 5, 245, 225, 1, new_str]
Encoding::CompatibilityError: incompatible character encodings: US-ASCII
and ASCII-8BIT
from (irb):7:in `%'
from (irb):7
from /bin/irb:12:in `<main>'
# Now I try to use an ASCII-8BIT string as a format argument, and it raises
an exception. Why?
irb(main):008:0> s ="%c%c%c%c%c%s" % [49, 5, 45, 25, 1, new_str]
=> "1\x05-\x19\x01\xF1\xF2"
# I am very surprised that it works when none of the %c values is greater
than 0x7F.
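
The difference between the failing and the working call seems to match the general rule Ruby applies when it combines two strings, which Encoding.compatible? exposes. Here is a minimal sketch, not a statement about how String#% is implemented. The names bytes_as, ascii_low, ascii_high and binary_high are made up for illustration, and the strings are built from raw bytes so the result does not depend on the source encoding or on the -E option:

```ruby
# Hypothetical helper: build a string from raw bytes and tag it with an encoding.
bytes_as = lambda do |bytes, encoding|
  bytes.pack("C*").force_encoding(encoding)
end

ascii_low   = bytes_as.call([49, 5, 45, 25, 1], "US-ASCII")   # all bytes < 0x80
ascii_high  = bytes_as.call([49, 5, 245, 225, 1], "US-ASCII") # contains 0xF5, 0xE1
binary_high = bytes_as.call([0xF1, 0xF2], "ASCII-8BIT")

# A 7-bit-only string can be combined with a string in any
# ASCII-compatible encoding; the other string's encoding is used.
compat_low = Encoding.compatible?(ascii_low, binary_high)

# Both strings contain bytes above 0x7F and carry different encodings,
# so no common encoding exists and nil is returned.
compat_high = Encoding.compatible?(ascii_high, binary_high)

# Plain concatenation fails for the same pair.
begin
  ascii_high + binary_high
  concat_error = nil
rescue Encoding::CompatibilityError => e
  concat_error = e
end

puts compat_low.inspect
puts compat_high.inspect
puts concat_error.class
```

If this rule is what applies here, the first call fails because the partial result already holds high bytes tagged US-ASCII when the ASCII-8BIT argument is appended, while the second call only ever holds 7-bit bytes before that point.
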
irb(main):012:0> s ="%c%c%c%c%c" % [49, 5, 245, 225, 1]
=> "1\x05\xF5\xE1\x01"
irb(main):013:0> s.encoding
=> #<Encoding:US-ASCII>
# If I leave the ASCII-8BIT string out of the format arguments, it also
works. But I am very surprised that the encoding is US-ASCII even though
the string contains non-ASCII characters. Why?
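
A string tagged US-ASCII that holds bytes above 0x7F can be detected with String#valid_encoding?. If the goal is simply a raw byte string, one possible approach is to build it with Array#pack so that every piece is ASCII-8BIT from the start. This is a sketch under that assumption. The names header, payload, packet and mislabeled are made up for illustration:

```ruby
# Byte values taken from the transcript above.
header  = [49, 5, 245, 225, 1]
payload = [0xF1, 0xF2].pack("C*")   # two raw bytes, tagged ASCII-8BIT

# The "C" directive packs each integer as one unsigned byte, and the
# result is tagged ASCII-8BIT, so appending another binary string works.
packet = header.pack("C*") + payload

# The same header bytes tagged US-ASCII are not a valid US-ASCII string.
mislabeled = header.pack("C*").force_encoding("US-ASCII")

puts packet.encoding
puts packet.bytes.inspect
puts mislabeled.valid_encoding?
```
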
--
Posted via http://www.ruby-forum.com/.