UTF-8 strings?

Hello gentlemen!

A complete newbie question.
I come from a PHP background and wanted to try Ruby (having watched all the beautiful presentations about Rails). I got started with the very basics, but have already hit a problem that might lead me to abandon my research altogether and go back to PHP.

What is the situation in Ruby when it comes to UTF-8? I have to process (fetch, display, whatever) lots of Russian strings: I store them in a database, I want to write and read them to and from XML, etc.

Some very simple one-liners with my name written in Russian yield exceptionally faulty results (ranging from nils to non-displayable characters). I get wrong results from downcase and index lookup, among others. If Ruby cannot do these things with strings, I cannot use it for any real-life projects. PHP was ugly, but the mbstring extension did the whole job for me there.

I am running the binary 1.9 build for Mac OS X, my terminal is set to UTF-8, and input escaping is disabled in bash. All other scripts process my strings correctly, so it is not an input problem.

I tried to find information on the subject, but the few pages mentioning it were in Japanese. Maybe someone can enlighten me? Maybe I just have to declare/import some module that will overload the string functions for me?

Hi,

At Mon, 25 Oct 2004 10:19:08 +0900,
Julik Tarkhanov wrote in [ruby-talk:117548]:

Some very simple one-liners with my name written in russian yield
exceptionally faulty results (ranging from nils to non-displayable
characters). I get wrong results from downcase and index lookup, among
others. If Ruby cannot do these things with strings I cannot use it for any
real-life projects. PHP was ugly but the mbstring extension was doing all
the job for me there.

Run ruby with the -Ku option if your scripts are written in UTF-8,
or set $KCODE to 'u' inside the script.
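
To make that concrete, here is a minimal sketch, assuming a UTF-8 source file and an interpreter of that generation (invoked as `ruby -Ku script.rb`, or with $KCODE set inside the script). The helper name utf8_chars and the sample string are made up for illustration. The guard on String#encoding is an assumption added so the same script also runs on later interpreters, where $KCODE is no longer effective. It only covers splitting into characters and index lookup, not downcase.

```ruby
# Assumption: on interpreters without String#encoding (the $KCODE era),
# tell the regexp engine to treat strings as UTF-8. Later interpreters
# track the encoding per string, so the assignment is skipped there.
$KCODE = 'u' unless String.method_defined?(:encoding)

# Hypothetical helper: split a UTF-8 string into characters, not bytes.
# The /u flag marks this one regexp as UTF-8; /m lets "." match newlines.
def utf8_chars(str)
  str.scan(/./mu)
end

name  = "Юлик"              # sample string, illustration only
chars = utf8_chars(name)

puts chars.length           # 4 characters (the string itself is 8 bytes)
puts chars.index("и")       # character position 2, counting from 0
```

Working on the array of characters rather than on the raw string keeps index lookups in character units, whichever way the interpreter counts String#length.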


--
Nobu Nakada