Iconv problems with different machines

Hi,

I have the following piece of code:

ic = Iconv.new('US-ASCII//TRANSLIT', 'UTF-8')
puts ic.iconv("Aüthor")

1. On my local machine (OS X 10.5), when I run this, I get the output:
A"uthor

2. When I run the same code on my Debian server (via rake, executed
through a Capistrano task), I get the output: A?thor

3. When I run the same code on my Debian server (via irb), I get:
Author

Both 1 and 3 are acceptable output to me. However, I can't figure out how
to get my program to output the correct result on my server when I run
it through a Capistrano task. Is there some environment variable I need
to set? From reading other posts, I've tried adding at the top of my
file:
$KCODE = "u"
require 'jcode'
ENV['LANG'] = 'en_US.UTF-8'
ENV['LC_CTYPE'] = 'en_US.UTF-8'

That still doesn't fix the issue. Any help would be greatly appreciated.
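
One way to check whether the irb session and the Capistrano-run rake task really see different settings is to print the locale-related variables each process inherited. This is only a diagnostic sketch; the helper name `locale_environment` is hypothetical, and it does not by itself change what Iconv does.

```ruby
# Hypothetical diagnostic helper: collects the locale-related variables
# from an environment hash (defaults to this process's ENV).
LOCALE_VARS = %w[LANG LC_ALL LC_CTYPE].freeze

def locale_environment(env = ENV)
  LOCALE_VARS.each_with_object({}) do |name, seen|
    seen[name] = env[name] # nil when the variable is not set
  end
end

# Print one NAME=value line per variable, e.g. from inside the rake task.
locale_environment.each do |name, value|
  puts "#{name}=#{value.inspect}"
end
```

Running this once from irb and once from the rake task shows whether the two environments differ.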

Thanks,
Ray

···

--
Posted via http://www.ruby-forum.com/.

Actually, I found some other posts about this same issue from a while
ago... It appears there's no solution.

I stopped using the iconv library and instead switched to the iconv
system command and that seems to work. Not the best solution, but at
least it works....
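
For anyone wanting to try the same workaround, here is a minimal sketch of calling the system command from Ruby. It assumes an `iconv` binary is on the PATH and a Ruby with `Open3.capture2`; the helper name and the default locale are illustrative assumptions, not what was actually deployed.

```ruby
require 'open3'

# Hypothetical helper: pipes UTF-8 text through the system `iconv` command.
# Assumption: `iconv` is on PATH. The locale name is also an assumption and
# only has an effect if that locale is installed on the machine.
def ascii_via_iconv_command(text, locale = 'en_US.UTF-8')
  env = { 'LC_ALL' => locale } # passed to the child process only
  out, status = Open3.capture2(env, 'iconv', '-f', 'UTF-8', '-t', 'ASCII//TRANSLIT',
                               stdin_data: text)
  raise "iconv exited with status #{status.exitstatus}" unless status.success?
  out
end
```

The exact transliteration of a character such as ü still depends on the iconv implementation installed on the machine, so the result can differ between systems just as it did with the library.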

···


I've found a lot of bugs in MRI's Iconv and now only use it with
JRuby, which, I suspect, uses the Java SE converters.

···

--
Cheers,
Marc

I have not been able to pin down exactly where the difference lies, but it looks like the transliteration tables simply differ depending on the system, the version, or something else. At ASPgems we wrote this hand-crafted normalizer, which we know for sure is portable (note that it uses Rails #chars and does a bit more than transliterate, but you get the idea):

   # Hand-written, portable normalizer. Relies on Rails' String#chars
   # (multibyte proxy) for Unicode-aware downcasing.
   def self.normalize(str)
     return '' if str.nil?
     n = str.chars.downcase.strip.to_s
     n.gsub!(/[àáâãäåāăą]/, 'a')
     n.gsub!(/æ/, 'ae')
     n.gsub!(/[ďđ]/, 'd')
     n.gsub!(/[çćčĉċ]/, 'c')
     n.gsub!(/[èéêëēęěĕė]/, 'e')
     n.gsub!(/ƒ/, 'f')
     n.gsub!(/[ĝğġģ]/, 'g')
     n.gsub!(/[ĥħ]/, 'h')
     n.gsub!(/[ìíîïīĩĭįı]/, 'i')
     n.gsub!(/ĳ/, 'ij')                 # the single-character ij ligature
     n.gsub!(/ĵ/, 'j')
     n.gsub!(/[ķĸ]/, 'k')
     n.gsub!(/[łľĺļŀ]/, 'l')
     n.gsub!(/[ñńňņŉŋ]/, 'n')
     n.gsub!(/[òóôõöøōőŏ]/, 'o')
     n.gsub!(/œ/, 'oe')
     n.gsub!(/[ŕřŗ]/, 'r')
     n.gsub!(/[śšşŝș]/, 's')
     n.gsub!(/[ťţŧț]/, 't')
     n.gsub!(/[ùúûüūůűŭũų]/, 'u')
     n.gsub!(/ŵ/, 'w')
     n.gsub!(/[ýÿŷ]/, 'y')
     n.gsub!(/[žżź]/, 'z')
     n.gsub!(/\s+/, ' ')                # collapse runs of whitespace
     n.delete!('^ a-z0-9_/\\-')         # keep only space, a-z, 0-9, _, / and -
     n
   end
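
As a complement to the hand-written table, here is a sketch of a different technique: decompose the string to Unicode NFD and drop the combining marks. It assumes Ruby 2.2 or later, where `String#unicode_normalize` is available, so it does not apply to the Ruby version discussed in this thread. The name `strip_diacritics` is made up for illustration.

```ruby
# Sketch of an alternative technique (not the normalizer above):
# NFD decomposition splits "ü" into "u" plus a combining diaeresis,
# and \p{Mn} matches such nonspacing combining marks.
# Assumption: Ruby >= 2.2 and a UTF-8 encoded string.
def strip_diacritics(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts strip_diacritics("A\u00FCthor") # prints "Author"
```

Letters with no canonical decomposition, such as ł, ø, đ or æ, pass through unchanged, so a small explicit table like the one above is still needed for those.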

-- fxn

···

On Dec 5, 2007, at 12:14 PM, Raymond O'Connor wrote:

Actually, I found some other posts about this same issue from a while
ago... It appears there's no solution.

I stopped using the iconv library and instead switched to the iconv
system command and that seems to work. Not the best solution, but at
least it works....

Hi Xavier,

I like that solution even better. Thanks for sharing!

Best,
Ray
