I posted a similar question in the rails group but this is more specific
to ruby 1.8.2.
I read that ruby has problems with multibyte charsets. And I read that
there might be some problems with ISO-8859-15 related to REXML. And I
read that regex might have problems with ISO-8859-1.
Given the above problems (or rumors), which encoding is recommended for
use with ruby 1.8.2?
UTF-8
ISO-8859-1
ISO-8859-15
I'm certain both UTF-8 and ISO-8859-15 will support all the characters
I'll ever use. And ISO-8859-1 only lacks a couple characters I might
use on very rare occasions, so I'm just looking for a charset that will
cause fewest problems with Ruby.
Thanks in advance for any suggestions.
···
--
Posted via http://www.ruby-forum.com/.
> I posted a similar question in the rails group but this is more specific
> to ruby 1.8.2.
> I read that ruby has problems with multibyte charsets. And I read that
> there might be some problems with ISO-8859-15 related to REXML. And I
> read that regex might have problems with ISO-8859-1.
> Given the above problems (or rumors), which encoding is recommended for
> use with ruby 1.8.2?
None. They all cause problems. With utf-8 most string functions won't
work correctly (probably including regexps). There are special
extensions to work around this to some extent.
ISO-8859-1 and ISO-8859-15 should be pretty much the same. They are
simple 8-bit encodings, so the string functions that expect 1-byte
characters work. They won't allow you to use slightly more exotic
characters (like Greek letters for maths, ...).
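To make the 1-byte versus multibyte point concrete, here is a minimal sketch. It builds the same character (code point 0xE9) once as ISO-8859-1 and once as UTF-8 and counts bytes and code points with pack/unpack. The helper names byte_count and utf8_codepoint_count are made up for this example and are not part of String.

```ruby
# Build the character with code point 0xE9 (e with acute accent) in both
# encodings from ASCII-only source, so the file encoding does not matter.
latin1 = [0xe9].pack('C')   # ISO-8859-1: the single byte 0xE9
utf8   = [0xe9].pack('U')   # UTF-8: the two bytes 0xC3 0xA9

# Hypothetical helper: number of bytes in the string.
def byte_count(str)
  str.unpack('C*').length
end

# Hypothetical helper: number of code points, assuming str is valid UTF-8.
def utf8_codepoint_count(str)
  str.unpack('U*').length
end

puts byte_count(latin1)           # one byte per character in ISO-8859-1
puts byte_count(utf8)             # two bytes for the same character in UTF-8
puts utf8_codepoint_count(utf8)   # still a single code point
```

A byte-oriented String method sees one unit in the first string and two in the second, which is why the 8-bit charsets are less surprising here.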
···
On 6/11/06, Jim Smith <nospam@nospam.lan> wrote:
UTF-8
ISO-8859-1
ISO-8859-15
I'm certain both UTF-8 and ISO-8859-15 will support all the characters
I'll ever use. And ISO-8859-1 only lacks a couple characters I might
use on very rare occasions, so I'm just looking for a charset that will
cause fewest problems with Ruby.
Thanks in advance for any suggestions.
Length and indexing do not work very well with utf-8.
~ $ irb -Ku
irb(main):001:0> $KCODE
=> "UTF8"
irb(main):002:0> a='α-ω'
=> "α-ω"
irb(main):003:0> r=/[β-ω]/
=> /[β-ω]/
irb(main):004:0> a.length
=> 5
irb(main):005:0> a[0..0]
=> "\316"
irb(main):006:0> a[0..1]
=> "α"
Fortunately, the regexps work.
irb(main):007:0> a =~ r
=> 3
So you could use a.scan /./ to calculate length or index characters in a string.
irb(main):008:0> a.scan /./
=> ["α", "-", "ω"]
irb(main):009:0> (a.scan /./).length
=> 3
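The scan trick can be wrapped in a few small helpers. This is only a sketch: utf8_chars, utf8_length and utf8_slice are names invented here, and it assumes the input is valid UTF-8. The /u flag makes the regexp treat the string as UTF-8 without relying on $KCODE, and /m lets "." match newlines too.

```ruby
# Hypothetical helpers built on the scan(/./) idea; not part of String.

# Split a UTF-8 string into an array of characters.
def utf8_chars(str)
  str.scan(/./mu)
end

# Character count (not byte count).
def utf8_length(str)
  utf8_chars(str).length
end

# Substring of `count` characters starting at character index `start`.
# Returns an empty string when `start` is past the end.
def utf8_slice(str, start, count)
  (utf8_chars(str)[start, count] || []).join
end

# alpha, hyphen, omega, built from code points to keep the source ASCII-only
s = [0x3b1, 0x2d, 0x3c9].pack('U*')
puts utf8_length(s)
puts utf8_slice(s, 2, 1) == [0x3c9].pack('U*')
```

Each call rescans the whole string, so for long strings it is cheaper to call utf8_chars once and work on the array.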
Michal
···
On 6/11/06, Yukihiro Matsumoto <matz@ruby-lang.org> wrote:
Hi,
In message "Re: Which encoding causes fewest problems in Ruby 1.8.2?" on Sun, 11 Jun 2006 10:12:45 +0900, Jim Smith <nospam@nospam.lan> writes:
>Given the above problems (or rumors), which encoding is recommended for
>use with ruby 1.8.2?
>
>UTF-8
>ISO-8859-1
>ISO-8859-15
String and Regexp handle all of them in most cases. But
upper/lower case handling for non-ASCII alphabets is not supported.
Use -Ku for UTF-8 and -Kn for ISO-8859-*.
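Since case handling for non-ASCII letters is not supported, one workaround is to map code points by hand. The following is only a sketch under stated assumptions: the input is valid UTF-8, only ASCII and the Latin-1 range are handled, and the name latin1_range_upcase is invented for this example.

```ruby
# Hypothetical helper: upcase ASCII letters and the Latin-1 lowercase
# letters U+00E0..U+00FE in a UTF-8 string. In that range the uppercase
# form is 0x20 below the lowercase one, except U+00F7 (division sign),
# which is not a letter. Everything else (including U+00DF sharp s and
# U+00FF y with diaeresis) is passed through unchanged.
def latin1_range_upcase(str)
  str.unpack('U*').map { |cp|
    if (0x61..0x7a).include?(cp) ||
       ((0xe0..0xfe).include?(cp) && cp != 0xf7)
      cp - 0x20
    else
      cp
    end
  }.pack('U*')
end

# a-umlaut followed by "b", built from code points (ASCII-only source)
word = [0xe4, 0x62].pack('U*')
puts latin1_range_upcase(word) == [0xc4, 0x42].pack('U*')
```

Anything outside those two ranges would need its own table.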
I am sure you do know.
But that is not what I would call 'String handles them all in most cases'.
Thanks
Michal
···
On 6/12/06, Yukihiro Matsumoto <matz@ruby-lang.org> wrote:
Hi,
In message "Re: Which encoding causes fewest problems in Ruby 1.8.2?" on Mon, 12 Jun 2006 12:14:48 +0900, "Michal Suchanek" <hramrach@centrum.cz> writes:
>Length and indexing do not work very well with utf-8.
I know. Operations on characters should be based on Regexp.
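As one possible reading of "based on Regexp", character-wise prefix operations can be written as anchored patterns instead of byte indexes. This is a sketch, not an official idiom: utf8_first and utf8_drop are hypothetical names, n is assumed to be a non-negative Integer, and the input is assumed to be valid UTF-8.

```ruby
# Hypothetical helpers that work in characters rather than bytes.

# First n characters of a UTF-8 string.
def utf8_first(str, n)
  str[/\A.{0,#{n}}/mu]
end

# The string with its first n characters removed.
def utf8_drop(str, n)
  str.sub(/\A.{0,#{n}}/mu, '')
end

# alpha, hyphen, omega, built from code points (ASCII-only source)
s = [0x3b1, 0x2d, 0x3c9].pack('U*')
puts utf8_first(s, 1) == [0x3b1].pack('U*')
puts utf8_drop(s, 2) == [0x3c9].pack('U*')
```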