-------- Original Message --------
Date: Sat, 8 Dec 2007 22:15:00 +0900
From: MonkeeSage <MonkeeSage@gmail.com>
To: ruby-talk@ruby-lang.org
Subject: Re: sorting Array of accentuated Strings
>
> > You might be interested in using some of the emerging Unicode support
> > in Ruby. Ruby 2 will have it built-in and there are several libraries
> > out there, although I don't have any experience using them.
>
> Right, thanks; I've only written a workaround until Ruby 2 arrives...
> --
> Une Bévue
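Hmmm. Maybe I'm mistaken, but this seems to have nothing to do with
Unicode support. Ruby compares strings byte by byte, so an ASCII character
always sorts before a non-ASCII UTF-8 character: ASCII bytes are below
0x80, while every byte of a multibyte UTF-8 sequence is 0x80 or above.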
    "Fenêtre" <=> "Être"
    -> "F" (\x46) <=> "Ê" (\xc3\x8a)
    -> -1
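A quick way to check this, assuming Ruby 1.9 or later with UTF-8 source, is to look at the leading bytes of both words and at the result of String#<=>; the variable names are only for illustration:

```ruby
# encoding: utf-8
# String#<=> compares the raw bytes of the two strings.
a = "Fenêtre"
b = "Être"

first_a = a.bytes.first     # 70, i.e. 0x46, the "F"
first_b = b.bytes.first(2)  # [195, 138], i.e. 0xC3 0x8A, the "Ê"

p first_a, first_b
p a <=> b        # -1, because 0x46 < 0xC3
p [b, a].sort    # ["Fenêtre", "Être"]
```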
To get the expected order, I think you have to transliterate the UTF-8
characters to ASCII before comparing. You can try something like:
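    require 'iconv'   # part of the standard library in Ruby 1.8/1.9 only

    class String
      # Transliterate UTF-8 to plain ASCII ("Ê" -> "E") to get a sort key
      def translit
        Iconv.iconv('ascii//translit', 'utf-8', self)[0]
      end
    end

    # a is the array of UTF-8 strings to sort; sort_by computes each key once
    a.sort_by { |s| s.translit }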
But some people have seen strange results from Iconv (e.g., a recent
thread [1]).
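Iconv was removed from the standard library in Ruby 2.0, so on current Ruby a similar accent-insensitive sort key can be sketched with String#unicode_normalize (Ruby 2.2+). This is a swapped-in technique, not the Iconv approach above: it decomposes each string to NFD and deletes the combining marks. Unlike //TRANSLIT it leaves letters without a canonical decomposition (such as ß, æ, ø) untouched. The helper name strip_accents and the word list are hypothetical.

```ruby
# encoding: utf-8
# Requires Ruby 2.2+ for String#unicode_normalize.

# Hypothetical helper: decompose each character (NFD), so "Ê" becomes
# "E" followed by U+0302 COMBINING CIRCUMFLEX ACCENT, then delete all
# nonspacing combining marks (\p{Mn}). Letters with no decomposition,
# such as "ß" or "ø", are returned unchanged.
def strip_accents(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

words  = %w[Fenêtre Être Zèbre Arbre]
sorted = words.sort_by { |w| strip_accents(w) }
p sorted  # ["Arbre", "Être", "Fenêtre", "Zèbre"]
```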
Regards,
Jordan
Besides that, sorting accented strings seems to be essentially
unsolvable in general, because different natural languages that use
the same accents follow different conventions.
I'd nominate German as the most inconsistent language in this respect
(other nominations welcome):
- German phone books sort words containing Ä, Ö, Ü as if they were
  spelled with "AE", "OE", "UE" instead,
- otherwise, the diacritics are quite often simply ignored,
- in Austria, including in phone books, letters with diacritics come
  after "z" (just like in Swedish, where Ä and Ö are also used, but
  consistently),
- French and Spanish use the diaeresis on some letters to mark that
  they have to be pronounced separately (Citroën, Camagüey).
(see: http://en.wikipedia.org/wiki/Collation)
How could a single standard cover all (natural) languages amid such
confusion?
I'd recommend pre-processing the strings with a couple of gsub calls,
much like Xavier Noria proposed in his post
http://groups.google.de/group/comp.lang.ruby/browse_thread/thread/9fbb85fa49dd700f/eed0350375a53abe
and adapting them to the situation at hand before sorting.
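A minimal sketch of that gsub pre-processing, assuming Ruby 2.4+ (for Unicode-aware String#downcase) and UTF-8 source. It sorts one hypothetical word list under two of the conventions listed above: the German phone-book rule (Ä as "ae") and the Swedish-style rule where å, ä, ö come after "z". The table names and sort_key are made up for the example. The second table relies on the ASCII characters {, |, } following "z", so it would misorder strings that contain those characters literally.

```ruby
# encoding: utf-8
# Hypothetical replacement tables; adapt them to the convention needed.
PHONE_BOOK = { 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue' }  # German phone books
AFTER_Z    = { 'å' => '{', 'ä' => '|', 'ö' => '}' }     # Swedish-style, after "z"

# Build a sort key: lowercase the string, then rewrite every character
# listed in the table with a single hash-driven gsub.
def sort_key(str, table)
  str.downcase.gsub(Regexp.union(table.keys), table)
end

words      = %w[Zebra Ärger Affe Adler]
phone_book = words.sort_by { |w| sort_key(w, PHONE_BOOK) }
after_z    = words.sort_by { |w| sort_key(w, AFTER_Z) }

p phone_book  # ["Adler", "Ärger", "Affe", "Zebra"]
p after_z     # ["Adler", "Affe", "Zebra", "Ärger"]
```

The same four words come out in two different orders, which is exactly why the tables have to be chosen per use case.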
Best regards,
Axel
On Dec 8, 1:09 am, unbewusst.s...@weltanschauung.com.invalid (Une Bévue) wrote:
> cruiserdan <d...@zeraweb.com> wrote: