Can't require UTF-8 files

O01eg_Oleg · 30 April 2010 17:26

When I require a file with UTF-8 encoding I get this error:

irb(main):001:0> require '/tmp/share/mudserver/game.rb'
SyntaxError: /tmp/share/mudserver/game.rb:2: invalid multibyte char (US-ASCII)
/tmp/share/mudserver/game.rb:2: invalid multibyte char (US-ASCII)
/tmp/share/mudserver/game.rb:2: syntax error, unexpected $end, expecting keyword_end

When I simply assign a Unicode string to a variable I don't get any error.
With the C API I have the same problem with rb_require and rb_eval_string.
I think I have to set the encoding for required files, but I can't find out
how.
P.S. I tried using $KCODE but it no longer works:

irb(main):006:0> $KCODE = 'u'
(irb):6: warning: variable $KCODE is no longer effective; ignored

I tried requiring 'jcode', as recommended on the Internet, but it doesn't
exist. I also tried adding a u prefix to the string, but that causes an error
even during evaluation:

irb(main):005:0> intro = u"привет"
NoMethodError: undefined method `u' for main:Object

···

--
Posted via http://www.ruby-forum.com/.

Caleb_Clausen1 · 30 April 2010 18:16

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:
# encoding: utf-8
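
A minimal sketch of what the magic comment does, assuming a throwaway file written to a temp directory (the file contents and constant names are made up for illustration). Note that Ruby 2.0 and later already default to UTF-8 for source files, so on those versions the comment only confirms the default:

```ruby
require "tempfile"

# Hypothetical file contents; the first line is the magic comment.
source = <<~RUBY
  # encoding: utf-8
  INTRO = "привет"
  SOURCE_ENCODING = __ENCODING__
RUBY

Tempfile.create(["game", ".rb"]) do |f|
  f.write(source)
  f.flush
  require f.path   # the parser reads the magic comment before anything else
end

puts SOURCE_ENCODING   # encoding the parser used for the required file
puts INTRO.encoding    # string literals inherit the source encoding
puts INTRO.length      # counted in characters, not bytes
```

`__ENCODING__` reports the source encoding of the file it appears in, which makes it a quick way to check whether the comment was picked up.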

O01eg_Oleg · 30 April 2010 18:19

Caleb Clausen wrote:

> Are you using ruby 1.9? If so, then you need to add a magic encoding
> line as the first line (or second if the first is a shebang line) of
> your source file, like this:
> # encoding: utf-8

Thanks, it works.

Fernando_Perez · 14 February 2011 08:30

> Are you using ruby 1.9? If so, then you need to add a magic encoding
> line as the first line (or second if the first is a shebang line) of
> your source file, like this:
> # encoding: utf-8

Is there a way to avoid adding this magic encoding line in each file?

That's really metadata and does not belong in the source code.

Josh_Cheek · 14 February 2011 08:48

Run with the -Ku flag.

https://gist.github.com/JoshCheek/825626

main.rb

#!/usr/bin/env ruby -Ku

require File.dirname(__FILE__) + "/other"

other.rb

puts "1 ≤ 3"

Gary_Wright · 14 February 2011 22:04

If the encoding declaration isn't in the file itself then where exactly would you store it? If it isn't in the file then it has to be in some OS or filesystem specific meta-data store or in yet another file. All of which increases the likelihood that the file and its meta-data will get out of synch or won't stay together when the file is copied or transferred somewhere else.

Placing the encoding information in the file itself seems like the most practical solution. The encoding declaration could of course be incorrect, but that is always a possibility no matter where you store the info.

Gary Wright

Inaki_Baz_Castillo · 15 February 2011 08:47

If it's metadata, why are you using "require 'file'" instead of
"File.read('file.rb')"?

--
Iñaki Baz Castillo
<ibc@aliax.net>

Eric_Hodel1 · 14 February 2011 23:08

This is not a good solution for library code.

On Feb 14, 2011, at 12:48 AM, Josh Cheek wrote:

> Run with -Ku flag.

Clifford_Heath5 · 15 February 2011 02:05

Right. Is there a good reason why Ruby can't just detect a UTF-8 BOM?
It's still "metadata" but a lot of tools deal with it.

On 02/15/11 10:08, Eric Hodel wrote:

> This is not a good solution for library code.

Phil · 15 February 2011 05:16

The use of a byte order mark is optional. Bit hard to detect what
isn't there, is it?

Here's a (short) discussion on auto-detecting Unicode:
http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx
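
A minimal sketch of the detection problem, assuming a hypothetical helper named utf8_bom? that only looks at the first three bytes. It can confirm a BOM that is present, but a missing BOM tells you nothing about the encoding:

```ruby
# The three bytes of a UTF-8 byte order mark, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true only when the data starts with the BOM bytes.
def utf8_bom?(data)
  data.b.start_with?(UTF8_BOM)
end

WITH_BOM    = UTF8_BOM + "puts 'hi'\n".b
WITHOUT_BOM = "puts 'hi'\n".b

puts utf8_bom?(WITH_BOM)      # => true
puts utf8_bom?(WITHOUT_BOM)   # => false, yet the file may still be UTF-8
```

For reading data files (as opposed to source files), Ruby's File.open also accepts a mode such as "r:BOM|UTF-8", which skips the BOM when one is present.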

···

On Tue, Feb 15, 2011 at 3:05 AM, Clifford Heath <no@spam.please.net> wrote:

Right. Is there a good reason why Ruby can't just detect a UTF-8 BOM?

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

Jeremy_Bopp · 15 February 2011 15:42

Using a BOM would break shebang processing. It's not a problem for
Windows users of Ruby since the shebang line is ignored there, but it
would break things for all Unix-like platforms (including Cygwin) where
a script can be run directly as a program:
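
A minimal sketch of the problem, assuming the usual Unix behaviour where a file is treated as a script only if its first two bytes are "#!". The helper name is made up for illustration:

```ruby
BOM_BYTES = "\xEF\xBB\xBF".b
SCRIPT    = "#!/usr/bin/env ruby\nputs 'hi'\n".b

# Hypothetical helper that mimics the check: the file must begin with "#!".
def starts_with_shebang?(data)
  data.byteslice(0, 2) == "#!"
end

puts starts_with_shebang?(SCRIPT)               # => true
puts starts_with_shebang?(BOM_BYTES + SCRIPT)   # => false, the BOM comes first
```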

My personal preference would be for a single multi-byte encoding to be
selected for all Ruby files. This would make it easier to configure an
editor or source visualizer to handle a file appropriately without the
need to replicate Ruby's encoding detection. One downside though is
that existing scripts encoded differently may be broken for this
hypothetical Ruby's consumption.

Using the magic comment to mark the encoding is probably the least
disruptive solution overall.

-Jeremy

···

On 2/14/2011 8:05 PM, Clifford Heath wrote:

Is there a good reason why Ruby can't just detect a UTF-8 BOM?
It's still "metadata" but a lot of tools deal with it.

Florian_Gilcher · 15 February 2011 18:06

I usually recommend not using UTF-8 in source at all and
pushing all UTF-8 strings into localization files (either using a heavyweight
solution like i18n or just a plain YAML file, if you don't want a dependency).
This also circumvents the problem of headers and is good practice.
For scripts of smaller scope, I usually skip that rule ;).[2]
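
A minimal sketch of the plain-YAML variant, assuming a hypothetical strings file with a single "greeting" key. The temp file stands in for a real localization file, and the \u escapes are used only so that this snippet itself stays ASCII:

```ruby
require "yaml"
require "tempfile"

# Stand-in for a real strings.yml; the file on disk contains UTF-8 text.
yaml_text = "greeting: \"\u043F\u0440\u0438\u0432\u0435\u0442\"\n"

STRINGS = Tempfile.create(["strings", ".yml"]) do |f|
  f.write(yaml_text)
  f.flush
  # Read with an explicit encoding so the result does not depend on defaults.
  YAML.load(File.read(f.path, encoding: "UTF-8"))
end

puts STRINGS["greeting"].encoding   # => UTF-8
puts STRINGS["greeting"].length     # => 6
```

The Ruby source contains no multibyte characters, so it needs no encoding header; only the YAML file carries the UTF-8 text.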

Ruby still assumes source code to be US-ASCII by default, which I think is a good
choice for compatibility reasons.[1]

Regards,
Florian

[1] Which is also the assumption that Ruby 1.8 made, but not as explicitly.
[2] A neat trick is the following:

require "yaml"
puts YAML.load(DATA).inspect

__END__
---
:test: Some ünicode Data.

On Feb 15, 2011, at 4:42 PM, Jeremy Bopp wrote:

> Using the magic comment to mark the encoding is probably the least
> disruptive solution overall.
>
> -Jeremy

Fernando_Perez · 17 February 2011 08:31

> I usually recommend not using UTF-8 in source at all and
> push all UTF-8 strings into localization files (Either using a
> heavyweight solution like i18n or just a plain YAML file, if you
> don't want a dependency).

This makes the views (in RoR) unreadable, and we also somehow lose the
text editor's HTML autocompletion in the YAML file.

Florian_Gilcher · 17 February 2011 12:47

I think at least the "unreadable" part is debatable. Autocompletion might
be handy, but the features of your editor should not factor into the organization
of your code.

Also, ERB templates are read with #read, which takes the external-encoding
setting into account, and then evaluated using #eval, which does take the
encoding of the string into account. Other templating libraries like Haml have
a setting for the default template encoding. So templates are not really the
problem, as you can already use UTF-8 pretty freely without marking it.
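
A minimal sketch of that read-then-evaluate path, assuming a hypothetical one-line template written to a temp file. The \u2264 escape stands for the "≤" character, and the variable names are made up for illustration:

```ruby
require "erb"
require "tempfile"

# Stand-in for a view file containing a non-ASCII character.
template_text = "<p>1 \u2264 <%= limit %></p>\n"

RENDERED = Tempfile.create(["view", ".erb"]) do |f|
  f.write(template_text)
  f.flush
  raw = File.read(f.path, encoding: "UTF-8")   # encoding is attached on read
  limit = 3
  ERB.new(raw).result(binding)                 # evaluated as a UTF-8 string
end

puts RENDERED.encoding
puts RENDERED
```

The template file itself carries no magic comment; the encoding comes from how the string was read.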

Regards,
Florian

