File.new and encoding

Achim_Domma_SyynX_So · 29 November 2005 15:07

Hi,

I'm still quite new to ruby, but have written a simple code generator. The generator opens some files and combines them into a new one. The resulting file is encoded as iso-8859-1, but it looks like ruby writes a UTF-8 marker to the beginning of the file. Is that possible?

How can I tell ruby which encoding to use when I write to text files?

Any pointers to documentation are welcome, but I didn't find anything useful using google.

regards,
Achim

Robert · 29 November 2005 15:17

Achim Domma (SyynX Solutions GmbH) wrote:

> Hi,
>
> I'm still quite new to ruby, but have written a simple code generator.
> The generator opens some files and combines them into a new one. The
> resulting file is encoded as iso-8859-1, but it looks like ruby writes
> a UTF-8 marker to the beginning of the file. Is that possible?

What's a UTF-8 marker? I only know of the two-byte UTF-16 marker, but AFAIK
there is no marker for UTF-8. Did I miss something?

> How can I tell ruby which encoding to use when I write to text files?
>
> Any pointers to documentation are welcome, but I didn't find
> anything useful using google.

Encoding is not an easy issue with ruby - I guess by default it uses the
default encoding of your environment. But you can specify certain
encodings (mainly Japanese ones, plus UTF-8 via -Ku) with the command
line option -K. HTH

Kind regards

robert

Nobuyoshi_Nakada1 · 29 November 2005 15:35

Hi,

At Wed, 30 Nov 2005 00:17:29 +0900,
Robert Klemme wrote in [ruby-talk:167988]:

>> I'm still quite new to ruby, but have written a simple code generator.
>> The generator opens some files and combines them into a new one. The
>> resulting file is encoded as iso-8859-1, but it looks like ruby writes
>> a UTF-8 marker to the beginning of the file. Is that possible?
>
> What's a UTF-8 marker? I only know of the two-byte UTF-16 marker, but AFAIK
> there is no marker for UTF-8. Did I miss something?

That would be a UTF-8 encoded BOM (byte order mark), but ruby itself
never writes it automatically.
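
For reference, the UTF-8 encoded BOM is the character U+FEFF, which comes out as the three bytes EF BB BF. A minimal sketch of checking a string for it follows; the constant and method names are made up for illustration, and `String#b` assumes Ruby 2.0 or later.

```ruby
# The UTF-8 encoding of U+FEFF (the byte order mark) is the bytes EF BB BF.
# UTF8_BOM and starts_with_utf8_bom? are illustrative names, not library API.
UTF8_BOM = "\xEF\xBB\xBF".b

# Returns true if the given string starts with the UTF-8 BOM bytes.
# Both sides are compared as raw bytes, so the string's encoding tag
# does not matter.
def starts_with_utf8_bom?(data)
  data.b.start_with?(UTF8_BOM)
end

puts starts_with_utf8_bom?("\xEF\xBB\xBFhello")
puts starts_with_utf8_bom?("hello")
```

Running the same check on the first few bytes of each input file (for example via `File.binread(path, 3)`) would show which source carries the marker.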

>> How can I tell ruby which encoding to use, if I write to textfiles?

Can't you show the code?

--
Nobu Nakada

Achim_Domma_SyynX_So · 29 November 2005 18:52

nobu@ruby-lang.org wrote:

> That would be a UTF-8 encoded BOM (byte order mark), but ruby itself
> never writes it automatically.

[...]

> Can't you show the code?

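While trying to reproduce the problem in a smaller example, I figured out that I'm reading the BOM from one of my source files. Sorry for the confusion. I'm doing something like:

File.open("target", "w") do |target|
  File.open("source", "r") do |source|
    source.each_line do |line|
      # ... some processing ...
      target.write(line)
    end
  end
end

source seems to contain the BOM, and it is written to target. Any hint on how to strip the BOM?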

regards,
Achim
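
One way to strip a leading BOM while copying is to compare the first three bytes of the first line against EF BB BF and drop them on a match. The following is only a sketch under that assumption: `BOM_BYTES` and `copy_without_bom` are hypothetical names, the files are opened in binary mode so that Ruby does not reinterpret the bytes, and `String#b` requires Ruby 2.0 or later.

```ruby
require "tmpdir"

# UTF-8 encoded byte order mark, as raw bytes. Illustrative constant name.
BOM_BYTES = "\xEF\xBB\xBF".b

# Copies source to target line by line, dropping a leading UTF-8 BOM
# if the first line starts with one. All other bytes pass through unchanged.
def copy_without_bom(source_path, target_path)
  File.open(target_path, "wb") do |target|
    File.open(source_path, "rb") do |source|
      source.each_line.with_index do |line, index|
        # Only the very first line of the file can carry the BOM.
        line = line.byteslice(3..-1) if index.zero? && line.start_with?(BOM_BYTES)
        target.write(line)
      end
    end
  end
end

# Small demonstration using throwaway files in a temporary directory.
Dir.mktmpdir do |dir|
  source_path = File.join(dir, "source")
  target_path = File.join(dir, "target")
  File.binwrite(source_path, BOM_BYTES + "first line\nsecond line\n")
  copy_without_bom(source_path, target_path)
  p File.binread(target_path)
end
```

When several sources are combined into one target, the same check would apply to the first line of each source file, not only to the first file.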

Alex_Fenton2 · 29 November 2005 19:32

> I'm doing something like:
>
> File.open("target", "w") do |target|
>   File.open("source", "r") do |source|
>     source.each_line do |line|
>       # ... some processing ...
>       target.write(line)
>     end
>   end
> end

Have you looked at 'iconv' in the standard library?

http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/classes/Iconv.html

Assuming all your input files were ISO-8859-1, and you wanted your output file in UTF-8, your example might look something like (untested):

Iconv should deal with BOMs, stripping them out or adding them in where necessary. I'm not sure if it will complain if it finds a BOM mid-stream (as you open your second and subsequent input files) - if so, you could just instantiate a new Iconv to deal with each input.
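
Iconv was removed from the standard library in Ruby 2.0, so on current versions the same idea is usually expressed with the built-in `String#encode` and the `BOM|UTF-8` read mode. The sketch below is not the Iconv approach described above but a swapped-in equivalent; it assumes a UTF-8 source that may carry a BOM and an ISO-8859-1 target, and `append_as_latin1` is a hypothetical helper name.

```ruby
require "tmpdir"

# Hypothetical helper: appends the contents of a UTF-8 source file to an
# already open target, transcoding every line to ISO-8859-1.
def append_as_latin1(source_path, target)
  # "BOM|UTF-8" makes Ruby skip a leading UTF-8 BOM if the file has one.
  File.open(source_path, "r:BOM|UTF-8") do |source|
    source.each_line do |line|
      # Raises Encoding::UndefinedConversionError for characters that
      # have no ISO-8859-1 representation.
      target.write(line.encode("ISO-8859-1"))
    end
  end
end

# Small demonstration using throwaway files in a temporary directory.
Dir.mktmpdir do |dir|
  source_path = File.join(dir, "source")
  target_path = File.join(dir, "target")
  # U+FEFF is the BOM; U+00FC and U+00DF are the letters u-umlaut and sharp s.
  File.binwrite(source_path, "\uFEFFgr\u00FC\u00DFe\n")
  # Binary mode on the target writes the already transcoded bytes as they are.
  File.open(target_path, "wb") do |target|
    append_as_latin1(source_path, target)
  end
  p File.binread(target_path).bytes
end
```

In the resulting file the BOM is gone and the two non-ASCII letters are the single bytes FC and DF. Because each source is opened separately, a BOM at the start of a second or later input is handled the same way as one in the first.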

HTH
alex
