Saving a UTF-8 file

Hi All

I have a problem (newbie problem).

I don't know how to write a file using UTF-8 encoding. Can you help
me?

Thanks in advance

Kind regards

···

--

Miquel (a.k.a. Ton)
Linux User #286784
GPG Key : 4D91EF7F
Debian GNU/Linux (Linux Wolverine 2.6.14)

Welcome to the jungle, we got fun and games
Guns n' Roses

Miquel Oliete wrote:

Hi All

I have a problem (newbie problem).

I don't know how to write a file using utf-8 encoding. Can you help
me.

A UTF-8 file is simply a sequence of 8-bit bytes. If your data is already
UTF-8 encoded, save it like this:

# "data" contains the text data

File.open(file_path, "w") { |f| f.write(data) }

UTF-8 refers to a convention regarding the content of the bytes and how they
are interpreted when read. It isn't something you can specify in a
plain-text file. It can be inferred from the byte patterns, but that
inference is only a guess.
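
Editor's sketch, not part of the original reply: in Ruby 1.9 and later (newer than this thread), strings carry an encoding and the mode string passed to File.open can name one. The helper names write_utf8 and looks_like_utf8? and the temporary file path are made up for illustration. The validity check shows what "inferred from the bytes" can mean in practice; a positive result only says the bytes could be UTF-8.

```ruby
require "tmpdir"

# Hypothetical helper: write text to path as UTF-8 (Ruby 1.9+).
def write_utf8(path, text)
  # "w:UTF-8" sets the file's external encoding; text tagged with
  # another encoding is transcoded to UTF-8 on write.
  File.open(path, "w:UTF-8") { |f| f.write(text) }
end

# Hypothetical helper: true if the raw bytes form a valid UTF-8
# sequence. This is an inference about the bytes, not a proof.
def looks_like_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

path = File.join(Dir.tmpdir, "utf8_example.txt")
write_utf8(path, "cami\u00F3n")   # six characters, one of them non-ASCII

raw = File.binread(path)          # read back the raw bytes
puts raw.bytesize                 # 7: the accented letter takes two bytes
puts looks_like_utf8?(raw)
```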

http://dict.die.net/utf-8/

···

--
Paul Lutus
http://www.arachnoid.com

Well, how are you storing the Unicode characters you are using
internally? If your Unicode string within Ruby is stored as an array
of ints, then

File.open("output_file.utf8", "w") do |fp|
  fp.puts(data.pack("U*"))
end

should be sufficient. If you have a Ruby string that uses some other
encoding (e.g. ISO-8859-1), then you must use the iconv library to
convert the string to UTF-8:

require 'iconv'

cd = Iconv.new('utf-8', 'iso-8859-1')
File.open("output_file.utf8", "w") do |fp|
  fp.puts(cd.iconv(data))
end

When you do i18n, l10n, and m17n, strings become meaningless unless
they have an attached encoding.
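
Editor's sketch, not part of the original reply: Iconv was dropped from the standard library in Ruby 2.0, and since Ruby 1.9 every string has an attached encoding, so the same two cases can be written with pack("U*") and String#encode. The sample text and variable names are illustrative only.

```ruby
# Case 1: the text is held as an array of integer code points.
codepoints = [0x61, 0xF1, 0x6F]             # a, n-with-tilde, o
from_ints = codepoints.pack("U*")           # packs each code point as UTF-8
puts from_ints.encoding                     # UTF-8

# Case 2: the text is held as ISO-8859-1 bytes (one byte per character).
latin1 = [0x61, 0xF1, 0x6F].pack("C*").force_encoding(Encoding::ISO_8859_1)
utf8 = latin1.encode(Encoding::UTF_8)       # transcode the bytes to UTF-8

puts latin1.bytesize                        # 3
puts utf8.bytesize                          # 4: U+00F1 needs two bytes
puts utf8 == from_ints                      # both routes give the same string
```

Either result can then be written with File.open(path, "w") as in the examples above, since the bytes are already UTF-8.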

···

On 11/12/06, Miquel Oliete <ktalanet@yahoo.es> wrote:

Hi All

I have a problem (newbie problem).

I don't know how to write a file using utf-8 encoding. Can you help
me.

Paul Lutus wrote:

It isn't something you can specify in a
plain-text file.

Byte order mark?

A specification it is not, but generally a good hint. There are gotchas
though if you process it with software that's not Unicode-aware.

David Vallner

Not meaningful in UTF-8, since it's all a defined series of bytes
(it's always the same order on all platforms).

-austin
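
Editor's sketch, not part of the original reply: the byte order mark is the code point U+FEFF. Its UTF-16 form depends on byte order, while its UTF-8 form is always the same three bytes, which is the point being made here. Variable names are illustrative.

```ruby
bom = "\uFEFF"   # the byte order mark as a UTF-8 string

# UTF-16 serializes the same code point differently per byte order.
be = bom.encode(Encoding::UTF_16BE).bytes.map { |b| format("%02X", b) }
le = bom.encode(Encoding::UTF_16LE).bytes.map { |b| format("%02X", b) }

# UTF-8 has a single fixed byte sequence on every platform.
u8 = bom.bytes.map { |b| format("%02X", b) }

puts "UTF-16BE: #{be.join(' ')}"   # FE FF
puts "UTF-16LE: #{le.join(' ')}"   # FF FE
puts "UTF-8:    #{u8.join(' ')}"   # EF BB BF
```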

--
Austin Ziegler * halostatue@gmail.com * http://www.halostatue.ca/
               * austin@halostatue.ca * You are in a maze of twisty little passages, all alike. // halo • statue
               * austin@zieglers.ca

Austin Ziegler wrote:

···

On 11/12/06, David Vallner <david@vallner.net> wrote:

Paul Lutus wrote:
> It isn't something you can specify in a
> plain-text file.
Byte order mark?

Not meaningful in UTF-8, since it's all a defined series of bytes
(it's always the same order on all platforms).

-austin

Yes, but it can be used as a "this file is UTF-8" marker by convention.
And cause problems in software that doesn't recognize the convention,
for added hilarity.

David Vallner

It's a bad convention, because it adds meaningless bytes to the
beginning of a file. I'm not saying that an unadorned document is
better, but better to do something that has actual meaning than doing
a pointless BOM.

-austin
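
Editor's sketch, not part of the original reply: the "problems in software that doesn't recognize the convention" show up as a stray U+FEFF character at the start of the text. Ruby 1.9.2 and later can strip a leading BOM on read with the "BOM|UTF-8" mode. The file name is a throwaway temporary path chosen for illustration.

```ruby
require "tmpdir"

path = File.join(Dir.tmpdir, "bom_example.txt")

# Write a file that starts with the UTF-8 BOM (EF BB BF), then "hi\n".
File.binwrite(path, "\uFEFFhi\n")

# A plain UTF-8 read keeps the BOM as the first character of the text.
plain = File.read(path, mode: "r:UTF-8")
puts plain.start_with?("\uFEFF")   # true
puts plain.length                  # 4: BOM, h, i, newline

# "BOM|UTF-8" skips a leading BOM if one is present.
stripped = File.read(path, mode: "r:BOM|UTF-8")
puts stripped == "hi\n"            # true
```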
