A few good articles on Unicode

Charles_Nutter · 15 June 2006 16:15

To add a little fuel to the discussion (and to help dispel some rumors,
myths, and legends about Unicode) I present you with Tim Bray's 4-part
trilogy of articles on Unicode, why it's important, and why you should use
it. The first article provides a nice overview, even mentioning some of the
political and technical difficulties of CJK languages and Unicode (as well
as the previously-mentioned gaiji). The second article discusses character
strings in general. The third, perhaps most relevant to the Ruby Unicode
discussion, is an exploration of characters versus bytes and how the various
encodings work. The fourth article discusses Java's use of UTF-16
internally, and why that may be a good or bad thing.
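
To make the characters-versus-bytes distinction concrete, here is a minimal sketch. It assumes Ruby 1.9 or later, where strings carry an encoding and `\u` escapes are available; the sample string is made up for illustration and does not come from the articles.

```ruby
# Assumes Ruby 1.9+; "\u" escapes produce a UTF-8 encoded string.
s = "na\u00EFve \u65E5\u672C"   # "naïve 日本"

chars = s.length                              # characters (code points)
bytes = s.bytesize                            # bytes in the UTF-8 encoding
utf16_bytes = s.encode("UTF-16LE").bytesize   # bytes in UTF-16 (no BOM)

puts "characters:   #{chars}"        # 8
puts "UTF-8 bytes:  #{bytes}"        # 13 (ï takes 2 bytes, each kanji takes 3)
puts "UTF-16 bytes: #{utf16_bytes}"  # 16 (every character here is one 2-byte unit)
```

The same eight characters occupy a different number of bytes depending on the encoding, which is why indexing by byte and indexing by character are not interchangeable.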

At any rate, they're entertaining to read and cleared up a number of my own
questions about Unicode. Perhaps they will help the rest of us in the Ruby
community to understand Unicode as well.

Part 1: On the Goodness of Unicode -
http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode
Part 2: On Character Strings -
http://www.tbray.org/ongoing/When/200x/2003/04/13/Strings
Part 3: Characters vs. Bytes -
http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
Part 4: Programming Languages and Text -
http://www.tbray.org/ongoing/When/200x/2003/04/30/JavaStrings

And while not directly related, Tim also fiddled with a
fully Unicode-aware UTF-8 string class in Java with many of the typical
C string operations (strcpy, strstr, ...). Some of the logic he uses for his
byte-vector-as-Unicode-string might be applicable to Ruby as well:

Yooster (Ustr): http://www.tbray.org/ongoing/When/200x/2003/05/17/Yooster
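
As a rough idea of what a byte-vector-as-Unicode-string can look like in Ruby, here is a hypothetical sketch. It is not Tim Bray's Ustr code; the class name `ByteString` and its methods are invented for illustration, and it assumes Ruby 1.9 or later.

```ruby
# Hypothetical sketch, NOT Tim Bray's Ustr: text held as raw UTF-8 bytes.
class ByteString
  def initialize(str)
    @bytes = str.encode("UTF-8").bytes.to_a   # Array of Integers 0..255
  end

  # Count code points: every byte that is not a continuation byte
  # (bit pattern 10xxxxxx) starts a new character.
  def length
    @bytes.count { |b| (b & 0xC0) != 0x80 }
  end

  # strstr-style search: byte offset of the first match, or nil.
  # A plain byte comparison is enough because a valid UTF-8 sequence
  # can only match at a character boundary.
  def byte_index(other)
    needle = other.encode("UTF-8").bytes.to_a
    return 0 if needle.empty?
    (0..@bytes.size - needle.size).each do |i|
      return i if @bytes[i, needle.size] == needle
    end
    nil
  end
end

s = ByteString.new("\u65E5\u672C\u8A9E")   # "日本語"
puts s.length                  # 3
puts s.byte_index("\u8A9E")    # 6 (two 3-byte characters precede it)
```

Counting characters needs a full scan in this design, while searching can stay at the byte level; that is the kind of trade-off a byte-vector string has to accept.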

--
Charles Oliver Nutter @ headius.blogspot.com
JRuby Developer @ jruby.sourceforge.net
Application Architect @ www.ventera.com

Dae_San_Hwang · 15 June 2006 16:43

Excellent! I'm particularly interested in learning more about the pros and cons of using UTF-16 internally for all strings (Java) versus being able to specify a different encoding for each string object (Ruby 2.0).

Thanks for sharing,

Daesan

Dae San Hwang
daesan@gmail.com

On Jun 16, 2006, at 1:15 AM, Charles O Nutter wrote:

The fourth article discusses Java's use of UTF-16
internally, and why that may be a good or bad thing.
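
For a feel of the per-string approach, here is a small sketch. It assumes the encoding-aware String API that later shipped in Ruby 1.9 (`String#encoding`, `String#encode`); nothing here reflects Java's internals, and the sample word is arbitrary.

```ruby
# Assumes Ruby 1.9+, where every String object is tagged with its own encoding.
utf8   = "caf\u00E9"                  # "café", tagged UTF-8
latin1 = utf8.encode("ISO-8859-1")    # same characters, transcoded bytes

puts utf8.encoding.name      # UTF-8
puts latin1.encoding.name    # ISO-8859-1
p utf8.bytes.to_a            # [99, 97, 102, 195, 169]
p latin1.bytes.to_a          # [99, 97, 102, 233]

# Same text, but the strings are not equal until one of them is transcoded.
puts utf8 == latin1                   # false
puts utf8 == latin1.encode("UTF-8")   # true
```

With a single internal encoding, that last transcoding step happens once at the I/O boundary; with per-string encodings, it is deferred until two differently tagged strings actually meet.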

Christian_Neukirche1 · 15 June 2006 19:50

"Charles O Nutter" <headius@headius.com> writes:

At any rate, they're entertaining to read and cleared up a number of my own
questions about Unicode. Perhaps they will help the rest of us in the Ruby
community to understand Unicode as well.

Part 1: On the Goodness of Unicode -
http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode
Part 2: On Character Strings -
http://www.tbray.org/ongoing/When/200x/2003/04/13/Strings
Part 3: Characters vs. Bytes -
http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
Part 4: Programming Languages and Text -
http://www.tbray.org/ongoing/When/200x/2003/04/30/JavaStrings

And while not directly related, Tim also fiddled with a
fully-unicode-supporting UTF-8 string class in Java with many of the typical
C string operations (strcpy, strstr, ...). Some of the logic he uses for his
byte-vector-as-unicode-string might be applicable to Ruby as well:

Yooster (Ustr): http://www.tbray.org/ongoing/When/200x/2003/05/17/Yooster

While we are at it, also see
"The Absolute Minimum Every Software Developer Absolutely, Positively
Must Know About Unicode and Character Sets (No Excuses!)"

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Keith_Fahlgren1 · 16 June 2006 13:10

And it's probably worth mentioning that O'Reilly has a 678-page book on
Unicode coming to bookstores by the end of the month:

http://www.oreilly.com/catalog/unicode/index.html

HTH,
Keith

On Thursday 15 June 2006 3:50 pm, Christian Neukirchen wrote:

While we are at it, also see
