Regexps and Unicode

Hi,

I’m trying to process Unicode files with Ruby. These files contain
sentences in many languages, including right-to-left languages, and are
usually UTF-16.

After much head-scratching I managed to get everything (reading, writing
and regexing) working properly using Iconv/IO (thanks Masahiro Sakai!)
and UTF-8.
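
A rough sketch of that kind of pipeline, assuming a current Ruby where String#encode stands in for Iconv (Iconv is no longer in the standard library as of Ruby 2.0). The helper names and the sample text are made up for illustration, and UTF-16LE without a BOM is assumed; this is not necessarily how the original setup was wired together:

```ruby
# Hypothetical helpers; String#encode is swapped in for Iconv here.

# Interpret raw bytes as UTF-16LE and transcode them to a UTF-8 string.
def utf16le_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
end

# Transcode a UTF-8 string back to UTF-16LE for writing out.
def utf8_to_utf16le(text)
  text.encode(Encoding::UTF_16LE)
end

# Simulated file contents: an English word, a Hebrew word, an English word,
# stored as raw UTF-16LE bytes (as they would come from a binary read).
sample = "hello \u05E9\u05DC\u05D5\u05DD world"
raw = sample.encode(Encoding::UTF_16LE).b

text = utf16le_to_utf8(raw)

# On a UTF-8 string the regex works on characters rather than bytes.
# The /u flag pins the pattern itself to UTF-8.
hebrew_words = text.scan(/\p{Hebrew}+/u)
first_char   = text[/\A./u]

# Convert back to UTF-16LE before writing.
round_trip = utf8_to_utf16le(text)
```

When reading from disk, opening the file with a mode string such as "rb:UTF-16LE:UTF-8" should let IO do the same transcoding on the way in, so the rest of the program only ever sees UTF-8.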

My question is: Are regexes and UTF-8 fully compatible? Is there
anything that’s going to bite me down the track?

On the plus side, if anyone wants more info or a how-to, shoot me an
email :)

Cheers,
Assaph


Assaph Mehr

Auslabs (Avaya Labs Australia)
Level 3, 123 Epping Rd, North Ryde, NSW 2113

Email: assaph(nospam)avaya()com
Phone: +61-2-9352-9247
Fax: +61-2-9352-9247
Web: http://www.avaya.com