Hello
The docs for the Iconv class (in the stdlib iconv package, rdoc:
http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/classes/Iconv.html)
lead one to believe that Iconv.conv and Iconv.iconv are equivalent. As I'm going
to demonstrate, this is not the case. Furthermore, the behavior of
both Iconv.iconv and Iconv#iconv seems to be strange, surprising, and
quite possibly also buggy.
Here is an IRB demonstration of this (ruby 1.8.2 (2005-04-11) [i386-linux]):
irb(main):007:0> Iconv.conv('utf-8', 'windows-1255', "\xe0")
"\327\220" # appropriate output for the given input
irb(main):008:0> Iconv.conv('utf-8', 'windows-1255', "\xe0\xe1")
"\327\220\327\221" # appropriate output for the given input
irb(main):009:0> Iconv.iconv('utf-8', 'windows-1255', "\xe0")
["", "\327\220"] # strange output. why an array? why the empty-string first element?
irb(main):010:0> Iconv.iconv('utf-8', 'windows-1255', "\xe0\xe1")
["\327\220", "\327\221"] # again, why an array? and why split the string?
irb(main):011:0> Iconv.iconv('utf-8', 'windows-1255', "\xe0\xe1\xe2")
["\327\220\327\221", "\327\222"]
irb(main):012:0> Iconv.iconv('utf-8', 'windows-1255', "\xe0\xe1\xe2\xe3")
["\327\220\327\221\327\222", "\327\223"]
irb(main):016:0> Iconv.new('utf-8', 'windows-1255').iconv("\xe0")
"" # last character of the string dropped
irb(main):017:0> Iconv.new('utf-8', 'windows-1255').iconv("\xe0\xe1")
"\327\220"
irb(main):018:0> Iconv.new('utf-8', 'windows-1255').iconv("\xe0\xe1\xe2")
"\327\220\327\221"
irb(main):019:0> Iconv.new('utf-8', 'windows-1255').iconv("\xe0\xe1\xe2\xe3")
"\327\220\327\221\327\222"
Adding a newline character at the end of the input text solves the
problem with Iconv#iconv, and partly solves the problem with
Iconv.iconv:
irb(main):020:0> Iconv.iconv('utf-8', 'windows-1255', "\xe0\xe1\xe2\xe3" + "\n")
["\327\220\327\221\327\222\327\223\n"] # still an array, but at least it contains the correct, non-split conversion.
irb(main):021:0> Iconv.new('utf-8', 'windows-1255').iconv("\xe0\xe1\xe2\xe3" + "\n")
"\327\220\327\221\327\222\327\223\n" # correct output, including the last character.
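For anyone who wants to cross-check the expected bytes independently of Iconv, here is a minimal sketch using String#encode. Note the assumptions: String#encode only exists in Ruby 1.9 and later (so it is not available in the 1.8.2 session above), and to_utf8 is a hypothetical helper name of my own, not part of any library. It only shows what the full, unsplit conversion should look like.

```ruby
# Hypothetical helper (not part of Iconv): convert raw bytes in a
# single-byte source encoding to UTF-8 using String#encode (Ruby 1.9+).
def to_utf8(bytes, from = 'Windows-1255')
  # dup so the caller's string is untouched; force_encoding only
  # relabels the bytes, encode performs the actual transcoding.
  bytes.dup.force_encoding(from).encode('UTF-8')
end

converted = to_utf8("\xe0\xe1\xe2\xe3")

# Print the result as octal escapes, in the same style IRB 1.8 uses above.
puts converted.bytes.map { |b| '\\%o' % b }.join
# prints \327\220\327\221\327\222\327\223
```

All four characters come out in a single string, with nothing held back, which matches what Iconv.conv returns in the session above.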
Hope that helps,
Tirkal