Unicode in Ruby and a Ruby Reference

Hi everyone.

I'm making my second big attempt in getting into Ruby, and I have a
couple of questions. I hope they don't sound too trivial.

1. I was wondering what the state is of Ruby and support for Unicode?
For instance, I'm coming mostly from Python which has a special
Unicode type that can be translated to various encodings on request.
I can't seem to find anything similar in Ruby. Does it exist
anywhere, or is it standard to deal with Unicode in a completely
different way, or is it something that hasn't been developed at this
point?

2. What are the most definitive references for Ruby and the standard
libraries that are available? I've found the reference at RubyCentral
to be very helpful (http://www.rubycentral.com/ref/), but it also
seems to be missing things here and there. On the other hand, it's
possible that I'm completely mis-reading it.

For instance, I found out about the Singleton module completely by
chance while reading this group. It certainly appears to work in my
Ruby 1.8.1 interpreter, but I can't seem to find it formally described
anywhere. I know what I've seen of it, but I don't know what else it
might have to it. I also wonder about all of the other things I could
be missing out on.
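For readers who arrive here with the same question: Singleton lives in Ruby's standard library and is loaded with `require 'singleton'`. Below is a minimal sketch of its core behavior. It assumes a current Ruby, and the class `AppConfig` and its `settings` attribute are hypothetical names used only for illustration.

```ruby
require 'singleton'

# Hypothetical class, used only to show what including Singleton does.
class AppConfig
  include Singleton        # makes AppConfig.new private and adds AppConfig.instance

  attr_reader :settings

  def initialize
    @settings = { :verbose => false }   # runs once, on the first call to .instance
  end
end

a = AppConfig.instance
b = AppConfig.instance
puts a.equal?(b)           # true: every call hands back the same object

begin
  AppConfig.new            # direct construction is refused
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```

The module also makes `clone` and `dup` raise TypeError on the instance, so the object cannot be copied behind your back.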

Thanks in advance for the help. I really like Ruby as a language and
I hope I'll be able to use it for some things later on. I'm just
interested to find out if these things are still in early stages of
development, or if I'm simply missing things.

Thanks.
Mike.

Hi,

1. I was wondering what the state is of Ruby and support for Unicode?
For instance, I'm coming mostly from Python which has a special
Unicode type that can be translated to various encodings on request.
I can't seem to find anything similar in Ruby. Does it exist
anywhere, or is it standard to deal with Unicode in a completely
different way, or is it something that hasn't been developed at this
point?

Handling Unicode (UTF-8) is OK. Ruby's strings can contain any
sequence of bytes. The regex engine is aware of UTF-8, so you can
use pattern matching against Unicode characters. For encoding
conversion, the iconv library is your friend.

This is weaker than Python, but it does most of the job. We are working
on M17N Ruby (M17N stands for multilingualization), in which you can
handle many encodings (e.g. UTF-8, UTF-16, Big5, GBK, and much more)
without conversion.

              matz.
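To make the points above concrete, here is a sketch that assumes Ruby 1.9 or later, where the M17N work described above eventually shipped. In those versions each String carries an Encoding, and String#encode took over the conversions that iconv handled in 1.8 (iconv was later dropped from the standard library). The sample string is arbitrary, and the code will not run on the 1.8.1 interpreter mentioned in this thread.

```ruby
# encoding: UTF-8
# Sketch assuming Ruby 1.9 or later (M17N); not valid for Ruby 1.8.

s = "naïve café"

# A String knows its encoding; length counts characters, bytesize counts bytes.
puts s.encoding   # UTF-8
puts s.length     # 10 characters
puts s.bytesize   # 12 bytes: "ï" and "é" take two bytes each in UTF-8

# The regex engine works on characters, so "." consumes all of "é".
puts s[/caf./]    # café

# String#encode covers the conversions that iconv handled under 1.8.
utf16  = s.encode("UTF-16BE")
latin1 = s.encode("ISO-8859-1")
puts utf16.bytesize                # 20: two bytes per character here
puts latin1.bytesize               # 10: one byte per character
puts latin1.encode("UTF-8") == s   # true: the round trip is lossless
```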

···

In message "Re: Unicode in Ruby and a Ruby Reference" on Tue, 14 Dec 2004 16:33:17 +0900, Mike McGavin <iizogii@gmail.com> writes:

What you are reading online is "Programming Ruby, 1ed", a book by Dave
Thomas and Andy Hunt. The second edition hit the shelves recently but
there's no online version. It's a purchase you won't regret, and it
describes all the standard libraries by example, and all the builtin
classes in detail (up to date with the latest Ruby).

Information about the standard library is also housed at

  http://ruby-doc.org/stdlib

Cheers,
Gavin

···

On Tuesday, December 14, 2004, 6:33:17 PM, Mike wrote:

2. What are the most definitive references for Ruby and the standard
libraries that are available? I've found the reference at RubyCentral
to be very helpful (http://www.rubycentral.com/ref/), but it also
seems to be missing things here and there. On the other hand, it's
possible that I'm completely mis-reading it.

Hi again.

···

On Tue, 14 Dec 2004 20:33:14 +1300, Mike McGavin <iizogii@gmail.com> wrote:

I'm making my second big attempt in getting into Ruby, and I have a
couple of questions.
[--snip--]

I just wanted to say thanks for all of the feedback from everyone
following my questions about the Ruby reference documentation and the
unicode questions. It's been very helpful, and I'll continue to
monitor the thread.

Thanks.
Mike.

Yukihiro Matsumoto wrote:

···

In message "Re: Unicode in Ruby and a Ruby Reference" on Tue, 14 Dec 2004 16:33:17 +0900, Mike McGavin <iizogii@gmail.com> writes:

>1. I was wondering what the state is of Ruby and support for Unicode?
>For instance, I'm coming mostly from Python which has a special
>Unicode type that can be translated to various encodings on request.
>I can't seem to find anything similar in Ruby. Does it exist
>anywhere, or is it standard to deal with Unicode in a completely
>different way, or is it something that hasn't been developed at this
>point?

Handling Unicode (UTF-8) is OK. Ruby's strings can contain any
sequence of bytes. The regex engine is aware of UTF-8, so you can
use pattern matching against Unicode characters. For encoding
conversion, the iconv library is your friend.

However, I think that this awareness only covers where a code point begins and ends. This might have changed with Oniguruma, but "Ä"[/ä/i] used to return nil.

How can a literal Unicode character be inserted into a Ruby String? I
recall Java having \uNNNN escapes, for example, but I wasn't able
to find a similar mechanism for Ruby. (On the other hand, I'm aware of
escaping for octal and hex character codes, e.g. \NNN and \xNN.)

···

--
G.P.

You can also buy the PDF version of this book from:
http://pragmaticprogrammer.com/shopsite_sc/store/html/index.html
which will cost $25.00, I think.

Thanks,
Mohammad

···

On Tue, 2004-12-14 at 03:18, Gavin Sinclair wrote:

On Tuesday, December 14, 2004, 6:33:17 PM, Mike wrote:

> 2. What are the most definitive references for Ruby and the standard
> libraries that are available? I've found the reference at RubyCentral
> to be very helpful (http://www.rubycentral.com/ref/), but it also
> seems to be missing things here and there. On the other hand, it's
> possible that I'm completely mis-reading it.

What you are reading online is "Programming Ruby, 1ed", a book by Dave
Thomas and Andy Hunt. The second edition hit the shelves recently but
there's no online version. It's a purchase you won't regret, and it
describes all the standard libraries by example, and all the builtin
classes in detail (up to date with the latest Ruby).

Information about the standard library is also housed at

  http://ruby-doc.org/stdlib

Cheers,
Gavin

--

[mkhan@localhost local]$ make love
make: *** No rule to make target `love'. Stop.

just fyi:
  http://www.rubycentral.com/book/lib_patterns.html
The patterns and standard library sections contain some pretty nifty stuff.
Also, "ri" totally rocks :)
Alex

···

On Dec 15, 2004, at 12:56 AM, Mike McGavin wrote:

Hi again.

On Tue, 14 Dec 2004 20:33:14 +1300, Mike McGavin <iizogii@gmail.com> wrote:

I'm making my second big attempt in getting into Ruby, and I have a
couple of questions.
[--snip--]

I just wanted to say thanks for all of the feedback from everyone
following my questions about the Ruby reference documentation and the
unicode questions. It's been very helpful, and I'll continue to
monitor the thread.

Thanks.
Mike.

Hi,

···

In message "Re: Unicode in Ruby and a Ruby Reference" on Tue, 14 Dec 2004 18:37:21 +0900, Florian Gross <flgr@ccan.de> writes:

However, I think that this awareness only covers where a code point begins
and ends. This might have changed with Oniguruma, but "Ä"[/ä/i] used to
return nil.

Oniguruma should be aware of it, although I found a bug there.
I will fix it soon. Thank you.

              matz.
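For comparison, here is a sketch of the same check on a modern Ruby (1.9 or later, whose regex engine descends from Oniguruma). There the `/i` flag applies Unicode case folding, so Florian's example matches. The `String#downcase` line additionally assumes Ruby 2.4 or later, the first release where downcase handles non-ASCII letters.

```ruby
# encoding: UTF-8
# Sketch assuming Ruby 1.9 or later; the downcase line assumes Ruby 2.4+.

folded = "Ä"[/ä/i]   # /i applies Unicode case folding, so "Ä" matches
strict = "Ä"[/ä/]    # without /i the match is case-sensitive

puts folded               # Ä
p strict                  # nil
p("ÄRGER" =~ /ärger/i)    # 0: the match starts at the first character

# Case mapping outside the regex engine (non-ASCII since Ruby 2.4).
puts "Ä".downcase         # ä
```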

Java's \u4321 denotes a UTF-16 code unit, so you would need to know the
equivalent UTF-8 encoding, e.g., \xe4\x8c\xa1.

-austin
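Working out those bytes by hand is easy to get wrong. The sketch below assumes a modern Ruby. `utf8_bytes` is a hypothetical helper that spells out the UTF-8 bit layout. The remaining lines reach the same bytes through `Array#pack("U")` (which packs code points as UTF-8 and already existed in 1.8) and the `\uNNNN` string escape that Ruby added in 1.9.

```ruby
# encoding: UTF-8
# Sketch assuming Ruby 1.9 or later (for the "\u" escape and String#bytes).

# Hypothetical helper: encode a single code point as UTF-8 bytes by hand.
def utf8_bytes(cp)
  if cp < 0x80                      # 1 byte:  0xxxxxxx
    [cp]
  elsif cp < 0x800                  # 2 bytes: 110xxxxx 10xxxxxx
    [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
  elsif cp < 0x10000                # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
  else                              # 4 bytes: 11110xxx then three 10xxxxxx
    [0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
     0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
  end
end

# Print the bytes in the same \xNN notation used above.
puts utf8_bytes(0x4321).map { |b| format("\\x%02x", b) }.join   # \xe4\x8c\xa1

# Built-in routes to the same bytes.
packed  = [0x4321].pack("U")    # "U" packs code points as UTF-8
escaped = "\u4321"              # literal escape, Ruby 1.9 and later
raw     = [0xe4, 0x8c, 0xa1].pack("C*").force_encoding("UTF-8")

p packed.bytes.map { |b| b.to_s(16) }   # ["e4", "8c", "a1"]
p escaped == packed                     # true
p raw == escaped                        # true
```

The helper does no validation of its input (for example, surrogate code points or values above 0x10FFFF), so it is for illustration only; in real code the built-in routes are the ones to reach for.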

···

On Tue, 14 Dec 2004 19:12:18 +0900, Giulio Piancastelli <giulio.piancastelli@gmail.com> wrote:

How can a literal Unicode character be inserted into a Ruby String? I
recall Java having \uNNNN escapes, for example, but I wasn't able
to find a similar mechanism for Ruby. (On the other hand, I'm aware of
escaping for octal and hex character codes, e.g. \NNN and \xNN.)

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca