Newbie locale/ascii sort question

This is a newbie question, since I am not so skilled in Ruby.

Working with texts in languages other than English can be
problematic because of differing national character sort orders.

Besides ordinary “a-z” there are three more letters in Swedish:
“åäö”, in extended ASCII (octal) \206\204\224. Luckily, they come after "z",
but the order of the first two is reversed when sorted by character code.

Is it possible to handle this effectively in the optional code block for sort?
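
For what it is worth, here is a rough sketch of how the sort block could do this in plain Ruby. It assumes UTF-8 strings and a recent Ruby (not the single-byte encoding above), and the names SWEDISH_ALPHABET, SWEDISH_RANK and swedish_key are made up for the illustration. It only handles the a-z plus åäö case described here, not full Swedish collation rules.

```ruby
# Hypothetical names, for illustration only; assumes UTF-8 strings.
SWEDISH_ALPHABET = ("a".."z").to_a + ["å", "ä", "ö"]
SWEDISH_RANK = SWEDISH_ALPHABET.each_with_index.to_h

# Turn a word into an array of alphabet positions.
# Characters outside the table sort after the alphabet, by code point.
def swedish_key(word)
  word.downcase.chars.map do |ch|
    SWEDISH_RANK.fetch(ch) { SWEDISH_ALPHABET.size + ch.ord }
  end
end

words = ["öl", "zebra", "äpple", "ål", "apa"]

# Arrays compare element by element, so <=> on the keys gives the order.
sorted = words.sort { |a, b| swedish_key(a) <=> swedish_key(b) }
# sorted     => ["apa", "zebra", "ål", "äpple", "öl"]
# words.sort => ["apa", "zebra", "äpple", "ål", "öl"]  (å and ä swapped)
```

For longer lists, words.sort_by { |w| swedish_key(w) } would compute each key only once.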

Generally, one would like to have access to locale sort order.
In Java it is quite easy to define sort order by specifying a locale:
Locale myLocale;
// "sv" is the ISO 639 language code for Swedish; "SE" is the country code
myLocale = new Locale("sv", "SE");
Collator sv_SECollator = Collator.getInstance(myLocale);
Collections.sort(myArray, sv_SECollator);

Do such utilities exist in the Ruby context?

A POSIX locales collection is to be found at:
http://std.dkuug.dk/i18n/WG15-collection/locales/

Bengt Dahlqvist

Hi,

Do such utilities exist in the Ruby context?

Not yet. But a small extension like this seems to work.

require "locale"

Locale::setlocale(Locale::LC_ALL, "ja_JP.eucJP")
my_array.sort{|a,b| Locale::strcoll(a,b)}

Just for information. Does anybody want to work further?

						matz.

---- locale.c
#include "ruby.h"
#include <locale.h>
#include <string.h>   /* strcoll() */

/* Locale.setlocale(category, locale): returns the resulting locale name,
   or nil on failure. A nil locale only queries the current setting. */
static VALUE
locale_setlocale(VALUE self, VALUE category, VALUE locale)
{
    char *l;

    if (NIL_P(locale)) l = NULL;
    else l = StringValuePtr(locale);
    l = setlocale(NUM2INT(category), l);
    if (!l) return Qnil;
    return rb_str_new2(l);
}

/* Locale.strcoll(s1, s2): compares two strings using the current
   LC_COLLATE order; negative, zero or positive like <=>. */
static VALUE
locale_strcoll(VALUE self, VALUE s1, VALUE s2)
{
    int n = strcoll(StringValuePtr(s1), StringValuePtr(s2));
    return INT2NUM(n);
}

void
Init_locale(void)
{
    VALUE mLocale = rb_define_module("Locale");
    rb_define_module_function(mLocale, "setlocale", locale_setlocale, 2);
    rb_define_module_function(mLocale, "strcoll", locale_strcoll, 2);

    rb_define_const(mLocale, "LC_ALL", INT2NUM(LC_ALL));
    rb_define_const(mLocale, "LC_COLLATE", INT2NUM(LC_COLLATE));
    rb_define_const(mLocale, "LC_CTYPE", INT2NUM(LC_CTYPE));
#ifdef LC_MESSAGES   /* POSIX, not part of ISO C */
    rb_define_const(mLocale, "LC_MESSAGES", INT2NUM(LC_MESSAGES));
#endif
    rb_define_const(mLocale, "LC_MONETARY", INT2NUM(LC_MONETARY));
    rb_define_const(mLocale, "LC_NUMERIC", INT2NUM(LC_NUMERIC));
    rb_define_const(mLocale, "LC_TIME", INT2NUM(LC_TIME));
}
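
To try the extension it has to be compiled first. A minimal build script might look like the following, assuming the C source above is saved as locale.c in the same directory and that Ruby's development headers and the standard mkmf library are installed. The file name extconf.rb is the usual convention, not something from the original message.

```ruby
# extconf.rb -- minimal sketch; generates a Makefile for locale.c
require "mkmf"

# The target name must match the Init_locale function in the C source.
create_makefile("locale")
```

Running "ruby extconf.rb" followed by "make" should then produce a loadable library that require "locale" can pick up, provided its directory is on the load path.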

In message “[ruby-talk] Re: newbie locale/ascii sort question” on 03/07/01, Bengt Dahlqvist bengt.dahlqvist@ling.uu.se writes: