Multibyte and Gems

On Jun 22, 2009, at 22:26, Martin Hess wrote:

I've tracked down a problem with a Gem I am trying to use. It turns out that it contains some non-ASCII characters; for example, the second quote in the regular expression below is not an ASCII character (it is U+201C, a left double quotation mark):

     parts = self.split( %r/( [:.;?!][ ] | (?:[ ]|^)["“] )/x )

It produces errors like this:

  :in `require': /opt/local/lib/ruby1.9/gems/1.9.1/gems/webby-0.9.4/lib/webby/core_ext/string.rb:14: invalid multibyte char (US-ASCII) (SyntaxError)

I fixed it by adding the following to the top of the offending file:

  # encoding: utf-8
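
To try the fix in isolation, here is a minimal standalone sketch: a file that begins with the magic comment and uses the same pattern as the webby line. The constant name SENTENCE_BREAK and the sample sentence are made up for illustration. Ruby 2.0 and later read source files as UTF-8 by default, so this particular error is specific to 1.9; on newer versions the comment is still accepted.

```ruby
# encoding: utf-8
# The magic comment must be on line 1 (or line 2, after a shebang).
# Under Ruby 1.9 it switches this file's source encoding from the
# default US-ASCII to UTF-8, so the literal “ below parses.

# Same pattern as the webby line; the second quote is U+201C.
SENTENCE_BREAK = %r/( [:.;?!][ ] | (?:[ ]|^)["“] )/x

# The capture group makes String#split keep the separators as elements.
parts = "He said “hi. Then left.".split(SENTENCE_BREAK)
# parts is ["He said", " “", "hi", ". ", "Then left."]

# __ENCODING__ reports the source encoding in effect for this file.
puts __ENCODING__
```
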

My questions:

* Is this the preferred fix?

* Is there a way to work around this problem without modifying the Gem?

* Is there an easy way to see if gems have non-ASCII source files but haven't included an encoding comment? Some kind of Ruby warning, for instance.

On Jun 23, 2009, at 12:25 AM, Eric Hodel wrote:

* Is this the preferred fix?

Yes.

* Is there a way to work around this problem without modifying the Gem?

File a bug with the author and have them release a new version, otherwise no.

* Is there an easy way to see if gems have non-ascii source files but haven't included an encoding comment? Some kind of Ruby warning for instance.

ruby -c will do this for you.
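
ruby -c reports the problem one file at a time, as it parses. To survey a whole installed gem up front, a rough sketch like the following lists .rb files that contain non-ASCII bytes but have no coding comment on their first two lines. The method name files_missing_encoding_comment and the MAGIC_COMMENT pattern are assumptions for illustration; the pattern only approximates the rules Ruby's parser applies.

```ruby
# Approximate pattern for a magic comment such as "# encoding: utf-8"
# or "# -*- coding: utf-8 -*-" (an assumption, not the parser's exact rule).
MAGIC_COMMENT = /\A#.*coding[:=]\s*[\w.-]+/

# Hypothetical helper: return .rb files under +dir+ that contain non-ASCII
# bytes but carry no magic comment on line 1 or line 2.
def files_missing_encoding_comment(dir)
  Dir.glob(File.join(dir, "**", "*.rb")).sort.select do |path|
    source = File.binread(path)        # raw bytes, tagged ASCII-8BIT
    next false if source.ascii_only?   # pure ASCII needs no comment
    head = source.lines.first(2)       # the comment may follow a shebang
    head.none? { |line| line =~ MAGIC_COMMENT }
  end
end
```

Run against the webby lib directory from the error above (/opt/local/lib/ruby1.9/gems/1.9.1/gems/webby-0.9.4/lib), it would be expected to flag core_ext/string.rb. Reading the files as binary keeps the scan itself from tripping over the very encoding problems it is looking for.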

···

On Wed, Jun 24, 2009 at 1:43 AM, Martin Hess wrote:

So is it considered best practice to put an encoding comment at the beginning of all your files nowadays? Such as:

   # encoding: utf-8

or whatever encoding you like. Is this what people are doing, or are they adding it one-off, only to the files that have non-ASCII characters?

It seems to me that with a modern editor it isn't hard to accidentally slip in some non-ASCII characters, resulting in some pain down the road.
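
If you do decide to add the comment everywhere, the change can be scripted. A minimal sketch, assuming UTF-8 is the encoding you want and that a shebang line, if present, must stay first (Ruby reads the magic comment from line 1, or from line 2 when line 1 is a shebang). add_encoding_comment and CODING_COMMENT are hypothetical names, and the pattern is only an approximation of what the parser accepts.

```ruby
# Approximate pattern for an existing magic comment such as
# "# encoding: utf-8" or "# -*- coding: utf-8 -*-".
CODING_COMMENT = /\A#.*coding[:=]\s*[\w.-]+/

# Hypothetical helper: return +source+ with an encoding comment added,
# unless line 1 or 2 already has one. A shebang line is kept first.
def add_encoding_comment(source, comment = "# encoding: utf-8\n")
  lines = source.lines.to_a
  return source if lines.first(2).any? { |line| line =~ CODING_COMMENT }

  if lines.first && lines.first.start_with?("#!")
    # Make sure the shebang ends its own line before inserting after it.
    lines[0] += "\n" unless lines[0].end_with?("\n")
    lines.insert(1, comment)
  else
    lines.unshift(comment)
  end
  lines.join
end

# Illustrative in-place rewrite of one file (the path is made up):
# path = "lib/example.rb"
# File.binwrite(path, add_encoding_comment(File.binread(path)))
```

Leaving files that already carry a coding comment untouched means the helper is safe to run repeatedly over a whole tree.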

···

On Jun 24, 2009, at 19:08, Michael Fellinger wrote:

It isn't hard to mess up code in lots of ways, so as usual, run and test
it before you release or deploy :)
That also means that using Ruby 1.9.1 for your daily coding might be a
better choice; otherwise you'll have to use multiruby.

--
Michael Fellinger
CTO, The Rubyists, LLC
972-996-5199

···

With hoe, it's as easy as:

    multiruby_setup the_usual # only once
    rake multi
