Mac OS Roman to UTF-8 (Kconv | Iconv)

I've got a small test using rubyaeosa-0.2.3. It works great if I leave the output
"as is".

In that case, according to SubEthaEdit (a Mac OS X editor), the output
string is encoded in Mac OS Roman.

Because I'll use the output in an XML file, I need to convert it to UTF-8.

If I use Kconv#toutf8, I get (presumably) Japanese characters:

René (true char "é", output "as is" = Mac OS Roman)
Ren (if I use Kconv#toutf8)
Anaïs (true char "ï", output "as is" = Mac OS Roman)
Ana<"ï" and "s" replaced by a single "Japanese" char> (if I use Kconv#toutf8)

Also, if I use Iconv.new('MACROMAN', 'UTF-8').iconv(str)

I get an error message:
AddressBook2vCardXml.rb:32:in `iconv': "\216" (Iconv::IllegalSequence)

for the first accented string (the "é" of René).

Here is my script:
<code>
require 'osx/aeosa'
require 'kconv'
require 'iconv'

def album_list
  result = OSX.do_osascript %{
        tell application "Address Book"
          set a to first name of people
          set b to last name of people
          {a,b}
        end tell
      }
  firstName = result[0].map {|i| i.to_rbobj }
  lastName = result[1].map {|i| i.to_rbobj }
  return firstName.map {|i| [ i,lastName.shift ] }
end

aFile = File.new("AddressBook.xml", "w")
album_list.each do |f,l|
  aFile.puts "#{f} #{l}" # output "as is"
  # aFile.puts "#{f.toutf8} #{l.toutf8}" # use Kconv#toutf8
  # fu = Iconv.new('MACROMAN', 'UTF-8').iconv(f) # use of Iconv
  # lu = Iconv.new('MACROMAN', 'UTF-8').iconv(l) # use of Iconv
  # aFile.puts "#{fu} #{lu}" # use of Iconv
end
aFile.close
</code>
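For comparison, here is a minimal sketch of the same output loop with the conversion done in Ruby itself, assuming Ruby 1.9 or later (where strings carry an encoding and String#encode is available). The AppleScript call is replaced by a hypothetical hard-coded `names` list holding raw Mac Roman bytes, so the sketch runs without rubyaeosa; the last names and the helper `mac_roman_to_utf8` are made up for illustration.

```ruby
# Hypothetical sample data standing in for album_list: [first, last]
# name pairs as raw Mac Roman bytes (0x8E is "é", 0x95 is "ï").
names = [
  ["Ren\x8E".b, "Dupont".b],
  ["Ana\x95s".b, "Martin".b]
]

# Hypothetical helper: label the bytes as Mac Roman, then transcode to UTF-8.
def mac_roman_to_utf8(str)
  str.dup.force_encoding("macRoman").encode("UTF-8")
end

# Open the output file with an explicit UTF-8 external encoding.
File.open("AddressBook.xml", "w:UTF-8") do |file|
  names.each do |first, last|
    file.puts "#{mac_roman_to_utf8(first)} #{mac_roman_to_utf8(last)}"
  end
end
```

Converting each field before writing keeps the file consistently UTF-8, so an XML prolog declaring `encoding="UTF-8"` would then be accurate.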

Notice also that if I do the encoding conversion on the command line with:

iconv -f MACROMAN -t UTF-8 AddressBook.xml > AddressBook-UTF-8.xml

where "AddressBook.xml" is the output of my Ruby script, I get a correctly
encoded "AddressBook-UTF-8.xml"!

Maybe that's the only solution for the time being?
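The command-line step can also be done from Ruby. A minimal sketch, assuming Ruby 1.9 or later: opening a file with the mode string "r:macRoman:UTF-8" tells Ruby the bytes on disk are Mac Roman and asks it to transcode them to UTF-8 while reading. The method name `convert_file` is hypothetical.

```ruby
# Ruby counterpart of:
#   iconv -f MACROMAN -t UTF-8 AddressBook.xml > AddressBook-UTF-8.xml
def convert_file(src, dst)
  # "r:macRoman:UTF-8": external encoding Mac Roman, internal UTF-8,
  # so read() returns a UTF-8 string.
  text = File.open(src, "r:macRoman:UTF-8") { |f| f.read }
  # The string is already UTF-8, so writing with a UTF-8 external
  # encoding stores its bytes unchanged.
  File.open(dst, "w:UTF-8") { |f| f.write(text) }
end
```

Usage: `convert_file("AddressBook.xml", "AddressBook-UTF-8.xml")`.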


--
une bévue

> Because I'll use the output in an XML file, I need to convert it to UTF-8.

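Not necessarily. Just make sure that the encoding the file is actually
written in is declared in your XML prolog. For Mac OS Roman the
registered (IANA) name is "macintosh":

<?xml version="1.0" encoding="macintosh" ?>

XML processors are only required to support UTF-8 and UTF-16, though, so
converting to UTF-8 is the more portable choice.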

> Also, if I use Iconv.new('MACROMAN', 'UTF-8').iconv(str)
>
> I get an error message:
> AddressBook2vCardXml.rb:32:in `iconv': "\216" (Iconv::IllegalSequence)
>
> for the first accented string (the "é" of René).

That's because the parameters are in the wrong order: they should be
given as (to, from). Your example is therefore trying to convert
*from* UTF-8 *to* Mac Roman, and the byte "\216" (the é in Mac Roman)
is not a legal UTF-8 sequence.
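The diagnosis is easy to check on Ruby 1.9 or later (this uses String#valid_encoding?, not Iconv). A minimal sketch: the same bytes are invalid when read as UTF-8, because 0x8E (octal \216) is a UTF-8 continuation byte and cannot start a character, but valid when read as Mac Roman, a single-byte encoding.

```ruby
# The bytes of "René" as the script writes them: é is 0x8E (octal \216).
raw = "Ren\216".b

as_utf8      = raw.dup.force_encoding("UTF-8")
as_mac_roman = raw.dup.force_encoding("macRoman")

puts as_utf8.valid_encoding?       # false: 0x8E cannot start a UTF-8 character
puts as_mac_roman.valid_encoding?  # true: every byte is a Mac Roman character
```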

Try this instead:

utf8_str = Iconv.new('UTF-8', 'MacRoman').iconv(mac_str)

e.g.
$KCODE = 'u'
require 'iconv'
Iconv.new('UTF-8', 'MacRoman').iconv("Ren\216") # => "René" [in UTF-8]
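On Ruby 1.9 and later, String#encode can do the same job without Iconv (which was removed from the standard library in Ruby 2.0), and it takes its arguments in the same (to, from) order. A minimal sketch:

```ruby
mac_str = "Ren\216".b  # "René" as raw Mac Roman bytes

# encode(to, from): same argument order as Iconv.new(to, from)
utf8_str = mac_str.encode("UTF-8", "macRoman")

puts utf8_str           # René
puts utf8_str.encoding  # UTF-8
```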

Paul.

pere.noel@laponie.com (Une bévue) writes:

> also if i make use of Iconv.new('MACROMAN', 'UTF-8').iconv(str)

You got the encodings in the wrong order here. It's TO, FROM.


--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

OK, thanks very much, that's working right now!!!


Paul Battley <pbattley@gmail.com> wrote:

> That's because the parameters are in the wrong order.

--
une bévue

You're right, thanks )))


Christian Neukirchen <chneukirchen@gmail.com> wrote:

> You got the encodings in the wrong order here. It's TO, FROM.

--
une bévue