Can't require UTF-8 files

When I require a file with UTF-8 encoding, I get an error:

irb(main):001:0> require '/tmp/share/mudserver/game.rb'
SyntaxError: /tmp/share/mudserver/game.rb:2: invalid multibyte char
(US-ASCII)
/tmp/share/mudserver/game.rb:2: invalid multibyte char (US-ASCII)
/tmp/share/mudserver/game.rb:2: syntax error, unexpected $end, expecting
keyword_end

When I simply assign a Unicode string to a variable, I don't get any error.
In the C API I have the same problem with rb_require and rb_eval_string.
I think I have to set the encoding for required files, but I can't find
out how.
P.S. I tried using $KCODE, but it no longer works:

irb(main):006:0> $KCODE = 'u'
(irb):6: warning: variable $KCODE is no longer effective; ignored

I tried requiring 'jcode', as recommended on the Internet, but it doesn't
exist. I also tried adding a u prefix to the string, but that causes an
error even in evaluation:

irb(main):005:0> intro = u"привет"
NoMethodError: undefined method `u' for main:Object

···

--
Posted via http://www.ruby-forum.com/.

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:
# encoding: utf-8
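
A minimal sketch of what the magic comment changes, assuming a Ruby recent enough to have Tempfile.create (2.1 or later). The helper name source_encoding_with and the global $demo_encoding are made up for illustration and do not come from the thread. It writes a tiny source file under two different magic comments, loads each one, and reports the source encoding Ruby assigned to that file via __ENCODING__:

```ruby
require "tempfile"

# Hypothetical helper (not from the thread): writes a tiny source file that
# starts with the given magic comment, loads it, and returns the source
# encoding Ruby chose for that file.
def source_encoding_with(magic_comment)
  Tempfile.create(["encoding_demo", ".rb"]) do |f|
    f.write("#{magic_comment}\n$demo_encoding = __ENCODING__\n")
    f.flush
    load f.path
  end
  $demo_encoding
end

puts source_encoding_with("# encoding: utf-8")
puts source_encoding_with("# encoding: iso-8859-1")
```

The two calls should report different encodings even though the rest of the file is identical, which is the per-file behaviour the magic comment is meant to give.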

···

On 4/30/10, O01eg Oleg <o01eg@yandex.ru> wrote:

When I require a file with UTF-8 encoding, I get an error:

irb(main):001:0> require '/tmp/share/mudserver/game.rb'
SyntaxError: /tmp/share/mudserver/game.rb:2: invalid multibyte char
(US-ASCII)
/tmp/share/mudserver/game.rb:2: invalid multibyte char (US-ASCII)
/tmp/share/mudserver/game.rb:2: syntax error, unexpected $end, expecting
keyword_end

When I simply assign a Unicode string to a variable, I don't get any error.
In the C API I have the same problem with rb_require and rb_eval_string.
I think I have to set the encoding for required files, but I can't find
out how.
P.S. I tried using $KCODE, but it no longer works:

irb(main):006:0> $KCODE = 'u'
(irb):6: warning: variable $KCODE is no longer effective; ignored

I tried requiring 'jcode', as recommended on the Internet, but it doesn't
exist. I also tried adding a u prefix to the string, but that causes an
error even in evaluation:

irb(main):005:0> intro = u"привет"
NoMethodError: undefined method `u' for main:Object

Caleb Clausen wrote:

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:
# encoding: utf-8

Thanks, it works.

···


Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:
# encoding: utf-8

Is there a way to avoid adding this magic encoding line in each file?

That's really metadata and does not belong in the source code.

···


Run with -Ku flag.

···

On Mon, Feb 14, 2011 at 2:30 AM, Fernando Perez <pedrolito@lavache.com> wrote:

> Are you using ruby 1.9? If so, then you need to add a magic encoding
> line as the first line (or second if the first is a shebang line) of
> your source file, like this:
> # encoding: utf-8

Is there a way to avoid adding this magic encoding line in each file?

That's really metadata and does not belong in the source code.


If the encoding declaration isn't in the file itself then where exactly would you store it? If it isn't in the file then it has to be in some OS or filesystem specific meta-data store or in yet another file. All of which increases the likelihood that the file and its meta-data will get out of synch or won't stay together when the file is copied or transferred somewhere else.

Placing the encoding information in the file itself seems like the most practical solution. The encoding declaration could of course be incorrect, but that is always a possibility no matter where you store the info.

Gary Wright

···

On Feb 14, 2011, at 3:30 AM, Fernando Perez wrote:

Is there a way to avoid adding this magic encoding line in each file?

That's really metadata and does not belong in the source code.

If it's metadata, why are you using "require 'file'" instead of
"File.read('file.rb')"?

···

2011/2/14 Fernando Perez <pedrolito@lavache.com>:

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:
# encoding: utf-8

Is there a way to avoid adding this magic encoding line in each file?

That's really metadata and does not belong in the source code.

--
Iñaki Baz Castillo
<ibc@aliax.net>

This is not a good solution for library code.

···

On Feb 14, 2011, at 12:48 AM, Josh Cheek wrote:

On Mon, Feb 14, 2011 at 2:30 AM, Fernando Perez <pedrolito@lavache.com> wrote:

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:
# encoding: utf-8

Is there a way to avoid adding this magic encoding line in each file?

That's really metadata and does not belong in the source code.

Run with -Ku flag.

Right. Is there a good reason why Ruby can't just detect a UTF-8 BOM?
It's still "metadata" but a lot of tools deal with it.

···

On 02/15/11 10:08, Eric Hodel wrote:

On Feb 14, 2011, at 12:48 AM, Josh Cheek wrote:

On Mon, Feb 14, 2011 at 2:30 AM, Fernando Perez <pedrolito@lavache.com> wrote:

# encoding: utf-8

Is there a way to avoid adding this magic encoding line in each file?
That's really metadata and does not belong in the source code.

Run with -Ku flag.

This is not a good solution for library code.

The use of a byte order mark is optional. Bit hard to detect what
isn't there, is it?

Here's a (short) discussion on auto-detecting Unicode:
http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx

···

On Tue, Feb 15, 2011 at 3:05 AM, Clifford Heath <no@spam.please.net> wrote:

Right. Is there a good reason why Ruby can't just detect a UTF-8 BOM?

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

Using a BOM would break shebang processing. It's not a problem for
Windows users of Ruby since the shebang line is ignored there, but it
would break things for all Unix-like platforms (including Cygwin) where
a script can be run directly as a program:
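
As a hedged illustration of the point above (this sketch is not from the original post): a Unix-like kernel decides whether a file is a script by checking that its very first two bytes are "#!". The helper name shebang_first? is made up for this example; it only inspects bytes and does not execute anything.

```ruby
# The UTF-8 byte order mark is the three bytes EF BB BF.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true if the raw file contents begin with "#!", the
# two bytes a Unix-like kernel looks for when executing a script directly.
def shebang_first?(bytes)
  bytes.b.start_with?("#!".b)
end

script = "#!/usr/bin/env ruby\nputs 'hello'\n"

puts shebang_first?(script)             # => true
puts shebang_first?(UTF8_BOM + script)  # => false
```

With the BOM in front, the shebang is pushed to byte offset 3, so the check on the first two bytes no longer matches.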

My personal preference would be for a single multi-byte encoding to be
selected for all Ruby files. This would make it easier to configure an
editor or source visualizer to handle a file appropriately without the
need to replicate Ruby's encoding detection. One downside though is
that existing scripts encoded differently may be broken for this
hypothetical Ruby's consumption.

Using the magic comment to mark the encoding is probably the least
disruptive solution overall.

-Jeremy

···

On 2/14/2011 8:05 PM, Clifford Heath wrote:

Is there a good reason why Ruby can't just detect a UTF-8 BOM?
It's still "metadata" but a lot of tools deal with it.

I usually recommend not using UTF-8 in source at all and
pushing all UTF-8 strings into localization files (either using a heavyweight
solution like i18n or just a plain YAML file, if you don't want a dependency).
This also circumvents the problem of headers and is good practice.
For scripts of smaller scope, I usually skip that rule ;).[2]

Ruby still assumes source code to be US-ASCII by default, which I think is a good
choice for compatibility reasons.[1]

Regards,
Florian

[1] Which is also the assumption that Ruby 1.8 made, though not as explicitly.
[2] A neat trick is the following:

  require "yaml"
  puts YAML.load(DATA).inspect

  __END__
  ---
  :test: Some ünicode Data.

···

On Feb 15, 2011, at 4:42 PM, Jeremy Bopp wrote:

Using the magic comment to mark the encoding is probably the least
disruptive solution overall.

-Jeremy

I usually recommend not using UTF-8 in source at all and
pushing all UTF-8 strings into localization files (either using a
heavyweight solution like i18n or just a plain YAML file, if you don't
want a dependency).

This makes the views (in RoR) unreadable; we also lose the text editor's
HTML autocompletion inside the YAML file.

···


I think at least the "unreadable" part is debatable. Autocompletion might
be handy, but the features of your editor should not factor into the organization
of your code.

Also, ERB templates are loaded with #read, which takes the external-encoding
setting into account, and then evaluated using #eval, which takes the encoding
of the string into account. Other templating libraries like Haml have a
setting for the default template encoding. So templates are not
really the problem, as you can already use UTF-8 pretty freely without marking
it.
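
A rough sketch of that read-then-evaluate flow, assuming a plain ERB template on disk. The file, the greeting variable, and the explicit encoding: "UTF-8" argument are choices made for this illustration, not details from the thread, and Rails does its own template handling that is not shown here.

```ruby
# encoding: utf-8
require "erb"
require "tempfile"

greeting = "Привет"

html = Tempfile.create(["view", ".erb"]) do |f|
  f.write("<p><%= greeting %>, ünicode</p>\n")
  f.flush

  # Read the template with an explicit external encoding. The resulting
  # string is tagged UTF-8, and ERB evaluates the compiled code accordingly.
  template = File.read(f.path, encoding: "UTF-8")
  ERB.new(template).result(binding)
end

puts html
puts html.encoding
```

The template file itself carries no encoding marker. The encoding comes from how the file is read, which is why templates are less affected than required source files.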

Regards,
Florian

···

On Feb 17, 2011, at 9:31 AM, Fernando Perez wrote:

I usually recommend not using UTF-8 in source at all and
pushing all UTF-8 strings into localization files (either using a
heavyweight solution like i18n or just a plain YAML file, if you don't
want a dependency).

This makes the views (in RoR) unreadable; we also lose the text editor's
HTML autocompletion inside the YAML file.