I'm running into a weird problem with iconv. I would like to transliterate
UTF-8 Western European strings (like 'pan carrè') to ASCII using
ASCII//TRANSLIT -> 'pan carre'.
My system is: Linux bacedifo 2.6.18-6-xen-686 #1 SMP Fri Dec 12 20:13:49
UTC 2008 i686 GNU/Linux
with
ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]
My script is:
"
require 'iconv'
$KCODE = 'UTF-8'
require 'jcode'
resume = 'résumé'
puts resume.inspect
puts "Internal iconv: #{Iconv.conv("ASCII//TRANSLIT", "UTF8", resume)}"
puts "System iconv: #{%x{echo "#{resume}" | iconv -f utf-8 -t ascii//translit}.strip}"
"
Output is:
francesco@bacedifo:~/tmp$ ruby ruby-talk.rb
"résumé"
Internal iconv: r?sum? # wrong: should be: resume
System iconv: resume # correct
Some more weirdness: with irb it just works:
francesco@bacedifo:~/tmp$ irb -r iconv
irb(main):001:0> resume = 'résumé'
=> "r\303\251sum\303\251"
irb(main):002:0> Iconv.conv("ASCII//TRANSLIT", "UTF8", resume)
=> "resume"
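The octal escapes irb prints above are just the UTF-8 bytes of the string: 'é' is two bytes, \303 \251. A minimal sketch of how to check that, assuming Ruby 1.9 or later (where String#bytes and "\u" escapes exist); `octal_escape` is a hypothetical helper name, not part of iconv or the standard library:

```ruby
# Hypothetical helper: render a string's bytes the way Ruby 1.8's inspect did,
# with every non-ASCII byte shown as a three-digit octal escape.
def octal_escape(str)
  str.bytes.map { |b| b < 128 ? b.chr : format("\\%03o", b) }.join
end

resume = "r\u00E9sum\u00E9"   # "résumé", written with escapes so the file stays ASCII
puts octal_escape(resume)     # prints r\303\251sum\303\251
puts resume.bytesize          # 8 bytes for 6 characters
```

If the bytes come out like this, the string itself is valid UTF-8, which suggests the input is not what differs between the irb run and the script run.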
With JRuby it fails in a different way:
francesco@bacedifo:~/tmp$ jruby ruby-talk.rb
"résumé"
Internal iconv: résumé # wrong
System iconv: rA(C)sumA(C) # wrong
Is it an Emacs issue? Maybe, but vim does the same. It's more likely I'm
doing something wrong. Can you help me?
Thank you!
Francesco
I know how you feel, your problem really sucks.
To create tests that cope with this kind of issue in Emacs, back when my
problem was with ISO-8859 strings, I wrote code inserting the non-ASCII
(8-bit) chars by octal insertion (use quoted-insert: C-q followed by the
octal character code, see
http://www.cs.cmu.edu/cgi-bin/info2www?(emacs)Inserting%20Text).
Example: to insert the ISO 8859-1 string:
física é linda
(which is Portuguese for "Physics is beautiful"), I would use
*f\355sica \351 linda*
where each \*** is really a single 8-bit char (in ISO 8859-1, í is octal
355 and é is octal 351).
With this approach I could test iconv from a file edited in Emacs.
DO NOT type \*** literally; it is just how Emacs displays the char at that
position when you insert it with the C-q combination.
I hope this is useful to you, because when I went through the same
problem I lost a lot of time searching for this solution.
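The same idea can also be applied in the Ruby source itself, so the test file stays pure ASCII no matter which editor saved it. This is only a sketch, assuming Ruby 1.9 or later (String#b, String#force_encoding and String#encode are not available on 1.8.7); `latin1` is a hypothetical helper name:

```ruby
# Hypothetical helper: build an ISO 8859-1 string from a literal that uses
# octal escapes for the 8-bit characters.
def latin1(bytes)
  # String#b returns a binary copy, so the literal itself is not modified.
  bytes.b.force_encoding(Encoding::ISO_8859_1)
end

# In ISO 8859-1, "í" is byte 0xED (octal 355) and "é" is byte 0xE9 (octal 351).
fisica = latin1("f\355sica \351 linda")

puts fisica.bytesize          # 14: one byte per character
puts fisica.encode("UTF-8")   # física é linda
```

Because the escapes are plain ASCII text in the file, the editor's own encoding settings no longer affect what the test feeds to the converter.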
On Wed, Jul 15, 2009 at 7:01 AM, Francesco Malvezzi <francesco.malvezzi@unimore.it> wrote:
--
Everton J. Carpes
Mobile: +55 53 9129.4593
MSN: maskejc@gmail.com
UIN: 343716195
Jabber: everton.carpes@jabber.org
Twitter: http://twitter.com/everton_carpes
Blog: http://www.geek.com.br/blogs/832697633-My-Way
Feeds: https://www.google.com/reader/shared/05200358440305782625
"If art interprets our dreams, the computer executes them in the guise of
programs!" - Alan J. Perlis