I'm still quite new to ruby, but have written a simple code generator. The generator opens some files and combines them into a new one. The resulting file is encoded as iso-8859-1, but it looks like ruby writes a UTF-8 marker to the beginning of the file. Is that possible?
How can I tell ruby which encoding to use when I write to text files?
Any pointers to documentation are welcome; I didn't find anything useful using google.
> I'm still quite new to ruby, but have written a simple code generator.
> The generator opens some files and combines them into a new one. The
> resulting file is encoded as iso-8859-1, but it looks like ruby writes
> a UTF-8 marker to the beginning of the file. Is that possible?
What's a UTF-8 marker? I know only the two-byte UTF-16 marker, but AFAIK
there is no marker for UTF-8. Did I miss something?
> How can I tell ruby which encoding to use when I write to text files?
> Any pointers to documentation are welcome; I didn't find
> anything useful using google.
Encoding is not an easy issue with ruby. I guess by default it uses the
default encoding of your environment. But you can specify certain
(Japanese) encodings with the command line option -K. HTH
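For readers on Ruby 1.9 or later (releases that came after this thread), the encoding can be given in the mode string when the file is opened. The following is only a sketch under that assumption; the helper name write_latin1 and the file name are made up for illustration.

```ruby
require "tmpdir"

# Sketch for Ruby 1.9 or later. write_latin1 is a hypothetical helper.
def write_latin1(path, text)
  # "w:ISO-8859-1" sets the file's external encoding, so strings are
  # transcoded to ISO-8859-1 as they are written.
  File.open(path, "w:ISO-8859-1") do |file|
    file.write(text)
  end
end

# Hypothetical target file in the system temp directory.
path = File.join(Dir.tmpdir, "latin1_example.txt")
write_latin1(path, "Gr\u00FC\u00DFe")
```

With this mode string, characters that have no ISO-8859-1 equivalent raise an error on write instead of being silently dropped.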
At Wed, 30 Nov 2005 00:17:29 +0900,
Robert Klemme wrote in [ruby-talk:167988]:
> I'm still quite new to ruby, but have written a simple code generator.
> The generator opens some files and combines them to a new one. The
> resulting file is encoded as iso-8859-1, but it looks like ruby writes
> a UTF-8 marker to the beginning of the file. Is that possible?
> What's a UTF-8 marker? I know only the two-byte UTF-16 marker, but AFAIK
> there is no marker for UTF-8. Did I miss something?
That would be a UTF-8 encoded BOM, but ruby itself never writes it
automatically.
> How can I tell ruby which encoding to use, if I write to textfiles?
[...]
Can't you show the code?
While trying to reproduce the problem in a smaller example, I figured out that I'm reading the BOM from one of my source files. Sorry for the confusion. I'm doing something like:
  File.open("target","w") do |target|
    File.open("source","r") do |source|
      source.each_line do |line|
        # ... some processing ...
        target.write(line)
      end
    end
  end
The source file seems to contain the BOM, and it is written to the target. Any hint on how to strip the BOM?
Have you looked at 'iconv' in the standard library?
Assuming all your input files were ISO-8859-1, and you wanted your output file in UTF-8, your example might look something like this (untested):
  require 'iconv'

  File.open("target","w") do |target|
    # Iconv.open takes the target encoding first, then the source encoding.
    Iconv.open('UTF-8', 'ISO-8859-1') do |converter|
      File.open("source","r") do |source|
        source.each_line do |line|
          # ... some processing ...
          target.write( converter.iconv(line) )
        end
      end
      # Passing nil flushes any state the converter still holds.
      target << converter.iconv(nil)
    end
  end
Iconv should deal with BOMs, stripping them out or adding them in where necessary. I'm not sure if it will complain if it finds a BOM mid-stream (as you open your second and subsequent input files); if so, you could just instantiate a new Iconv to deal with each input.
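If the only goal is to drop a leading UTF-8 BOM while copying, it can also be removed by hand, since it is just the three bytes EF BB BF at the start of the file. The following is a rough sketch, assuming Ruby 2.0 or later (for String#b); strip_bom and copy_without_bom are hypothetical helpers, not part of Ruby.

```ruby
# The three bytes of a UTF-8 encoded BOM, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: returns a binary copy of the line without a leading BOM.
def strip_bom(line)
  bytes = line.b # binary copy, so the comparison ignores the string's encoding
  bytes.start_with?(UTF8_BOM) ? bytes[UTF8_BOM.bytesize..-1] : bytes
end

# Hypothetical helper: copies source to target, dropping a BOM on the first line.
def copy_without_bom(source_path, target_path)
  File.open(target_path, "wb") do |target|
    File.open(source_path, "rb") do |source|
      source.each_line.with_index do |line, index|
        line = strip_bom(line) if index.zero?
        # ... some processing ...
        target.write(line)
      end
    end
  end
end
```

Both files are opened in binary mode here so the bytes pass through unchanged. On Ruby 1.9 or later, opening the source with the mode "r:BOM|UTF-8" is another way to have a leading BOM skipped when the input really is UTF-8.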