Hello
I have a lot of XML and Java files which have German umlauts and other
non-ASCII characters in them.
I want to read the files and convert them to UTF-8 using a Ruby script.
I convert the strings with this code:
  def to_utf8(str)
    str.unpack('U*').map do |c|
      if c < 0x80
        c.chr
      else
        '( u%04X )' % c
      end
    end.join
  end
(taken from "The Ruby Way" by Hal Fulton).
Sometimes it works, sometimes I get this error:
"malformed UTF-8 character"
I thought this might happen because the file is encoded in ISO-8859-1
(it was written with Eclipse set to ISO-8859-1 for text encoding).
How can I read a file with Ruby and specify that it is read with
ISO-8859-1 encoding (similar to Java's BufferedReader, where I can
specify the encoding)?
Any help welcome. Best wishes
Claus
--
Posted via http://www.ruby-forum.com/.
Claus Hausberger wrote:
Hello
I have a lot of XML and Java files which have German umlauts and other
non-ASCII characters in them.
I want to read the files and convert them to UTF-8 using a Ruby script.
I convert the strings with this code:
  def to_utf8(str)
    str.unpack('U*').map do |c|
I'd be surprised if this was right - you're telling it that you're expecting the string to be UTF-8 already with that unpack format.
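To illustrate the point about the 'U*' format, here is a minimal sketch. The sample word and the helper name latin1_to_utf8 are made up for illustration. It relies on the fact that ISO-8859-1 byte values are identical to the corresponding Unicode code points, so the bytes can be read with 'C*' and re-packed with 'U*'.

```ruby
# "für" as ISO-8859-1 bytes: the ü is the single byte 0xFC.
latin1 = [0x66, 0xFC, 0x72].pack('C*')

# 'U*' decodes UTF-8, so the lone 0xFC byte is rejected with an ArgumentError.
unpack_error = nil
begin
  latin1.unpack('U*')
rescue ArgumentError => e
  unpack_error = e.message
end

# Hypothetical helper: read the raw bytes with 'C*' and re-pack them as
# UTF-8 with 'U*'. This is only valid when the input really is ISO-8859-1.
def latin1_to_utf8(bytes)
  bytes.unpack('C*').pack('U*')
end

utf8 = latin1_to_utf8(latin1)
```

This only covers ISO-8859-1 input; for any other source encoding a real converter is needed.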
<snip>
How can I read a file with Ruby and specify that it is read with
ISO-8859-1 encoding (similar to Java's BufferedReader, where I can
specify the encoding)?
Investigate Iconv in the standard library. It does what you need.
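A rough sketch of what that could look like, not a definitive implementation. The method name read_as_utf8 and its default source encoding are assumptions for illustration. Iconv is not bundled with later Ruby versions, so the sketch falls back to String#encode (Ruby 1.9 and later) when the library cannot be loaded.

```ruby
# Iconv may be missing on newer Rubies; tolerate that and fall back below.
begin
  require 'iconv'
rescue LoadError
  # No Iconv available; the String#encode branch is used instead.
end

# Hypothetical helper: read a file as raw bytes and return its contents
# as UTF-8. `source_encoding` is an assumption about how the file was saved.
def read_as_utf8(path, source_encoding = 'ISO-8859-1')
  raw = File.open(path, 'rb') { |f| f.read }
  if defined?(Iconv)
    # Iconv.conv takes the target encoding first, then the source encoding.
    Iconv.conv('UTF-8', source_encoding, raw)
  else
    raw.force_encoding(source_encoding).encode('UTF-8')
  end
end
```

Something like read_as_utf8('Example.java') would then return the file contents as UTF-8, where Example.java stands in for one of your files.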
--
Alex
Hallo Claus,
you could use jcode...
$KCODE = 'UTF8'
require 'jcode'
Cheers,
Enrique Comba Riepenhausen
On 14 May 2007, at 16:39, Claus Hausberger wrote:
<snip>