Reading Unicode characters?

Martin_Durai · 10 March 2008 03:37

Hi friends,

Could anyone help me write a method that reads all the Unicode
characters supported in Ruby, or do the same with regular expressions?

Thanks in advance,

Regards,
Jose Martin

--
Posted via http://www.ruby-forum.com/.

7stud · 10 March 2008 15:00

dare ruby wrote:

Hi friends,

Could any one help me in writing a method which reads all Unicode
characters supported in ruby or else using regular expressions.

Thanks in advance,

Regards,
Jose Martin

Ruby does not support unicode.


James_Edward_Gray_II · 10 March 2008 15:13

Really?

$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward Gray II

On Mar 10, 2008, at 10:00 AM, 7stud -- wrote:

dare ruby wrote:

Hi friends,

Could any one help me in writing a method which reads all Unicode
characters supported in ruby or else using regular expressions.

Thanks in advance,

Regards,
Jose Martin

Ruby does not support unicode.

Martin_Durai · 11 March 2008 03:15

Is there any way to do this with regular expressions, or by writing my own
methods for Unicode characters?

Ruby does not support unicode.

Really?

$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward Gray II


7stud · 11 March 2008 03:29

James Gray wrote:

On Mar 10, 2008, at 10:00 AM, 7stud -- wrote:

Jose Martin

Ruby does not support unicode.

Really?

$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward Gray II

How does that prove the ruby supports unicode? Where are there any
unicode characters in your string?

Yukihiro_Matsumoto2 · 11 March 2008 05:12

Hi,

In message "Re: Reding unicode characters?" on Tue, 11 Mar 2008 12:29:58 +0900, 7stud -- <bbxx789_05ss@yahoo.com> writes:

How does that prove the ruby supports unicode? Where are there any
unicode characters in your string?

Then, tell me what makes you think it's proven.

matz.

Lionel_Bouton · 11 March 2008 10:04

7stud -- wrote:

James Gray wrote:

>> [...]

$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward Gray II

How does that prove the ruby supports unicode? Where are there any unicode characters in your string?

1/ There's a difference between code points and characters, so speaking of Unicode "characters" is confusing at best.

2/ "Supporting Unicode" is probably meaningless (which Unicode encoding, by the way?). Building UTF-8 applications in Ruby is perfectly doable thanks to jcode, regex UTF-8 support, ... I know: among other things, it's what I built my company on.
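
To make point 1 concrete, here is a minimal sketch. It assumes a Ruby much newer than the 1.8 discussed in this thread (2.5 or later, where `String#grapheme_clusters` exists); the helper names `codepoint_count` and `grapheme_count` are made up for illustration. The same visible "é" can be stored as one code point or as two:

```ruby
# Count Unicode code points in a string.
def codepoint_count(str)
  str.codepoints.length
end

# Count user-perceived characters (grapheme clusters) in a string.
def grapheme_count(str)
  str.grapheme_clusters.length
end

precomposed = "\u00E9"   # "é" as a single code point, U+00E9
decomposed  = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT, U+0301

puts codepoint_count(precomposed)  # 1
puts codepoint_count(decomposed)   # 2
puts grapheme_count(decomposed)    # 1

# NFC normalization composes the pair back into the single code point.
puts decomposed.unicode_normalize(:nfc) == precomposed  # true
```

Which of the two counts is "the number of characters" depends on what the application means by a character.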

The example above obviously assumes a UTF-8 locale in the terminal where you type it.
For more data, just try size instead of jsize in the same example and read jcode's rdoc.
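
The size-versus-jsize comparison can also be reproduced without jcode. This is a sketch that assumes Ruby 1.9 or later, where jcode is no longer shipped and `String#length` counts characters; on the 1.8 interpreter used in this thread, `String#size` returns the byte count instead. `byte_and_char_counts` is a hypothetical helper name.

```ruby
# Return both the byte count and the character count of a string.
def byte_and_char_counts(str)
  { :bytes => str.bytesize, :chars => str.length }
end

# "Résumé", written with \u escapes so the source file encoding does not matter.
resume = "R\u00E9sum\u00E9"
counts = byte_and_char_counts(resume)

puts counts[:bytes]  # 8, because each "é" takes two bytes in UTF-8
puts counts[:chars]  # 6
```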

Lionel

James_Edward_Gray_II · 11 March 2008 12:49

James Gray wrote:

Jose Martin

Ruby does not support unicode.

Really?

$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward Gray II

How does that prove the ruby supports unicode?

If the code was not character aware, it would have returned a count of the bytes in the String (more than six). String#size, for example.

Where are there any unicode characters in your string?

I entered the accented e characters in UTF-8, that's why you see the -KU switch to tell Ruby the encoding.
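
The role of the encoding switch can be sketched in later Ruby versions, where each String carries its own encoding tag instead of relying on a global -K setting. This assumes Ruby 1.9 or later; `count_as` is a hypothetical helper, not something from this thread. The bytes are identical in both calls, only the declared encoding changes:

```ruby
# Count the characters in str when its bytes are interpreted as `encoding`.
def count_as(str, encoding)
  # dup so the caller's string keeps its original encoding tag
  str.dup.force_encoding(encoding).length
end

resume = "R\u00E9sum\u00E9"  # "Résumé" as UTF-8 bytes

puts count_as(resume, Encoding::UTF_8)   # 6, character aware
puts count_as(resume, Encoding::BINARY)  # 8, one "character" per byte
```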

James Edward Gray II

On Mar 10, 2008, at 10:29 PM, 7stud -- wrote:

On Mar 10, 2008, at 10:00 AM, 7stud -- wrote:

Todd_Benson · 11 March 2008 16:35

I think this may have been discussed before, but -KU doesn't work for
me on Windows XP. I get an unterminated string error with the
"Résumé" UTF-8 encoded string. I can only assume that the parser is
still interpreting the string as one byte per character. Anyone have
any ideas?

Todd

On Tue, Mar 11, 2008 at 7:49 AM, James Gray <james@grayproductions.net> wrote:

If the code was not character aware, it would have returned a count of
the bytes in the String (more than six). String#size, for example.

> Where are there any unicode characters in your string?

I entered the accented e characters in UTF-8, that's why you see the
-KU switch to tell Ruby the encoding.

James Edward Gray II

7stud · 11 March 2008 23:46

James Gray wrote:

On Mar 10, 2008, at 10:29 PM, 7stud -- wrote:

6

James Edward Gray II

How does that prove the ruby supports unicode?

If the code was not character aware, it would have returned a count of
the bytes in the String (more than six). String#size, for example.

Where are there any unicode characters in your string?

I entered the accented e characters in UTF-8, that's why you see the
-KU switch to tell Ruby the encoding.

James Edward Gray II

Ahh, I see. You think UTF-8 is unicode. And apparently you think that
when you enter a UTF-8 character in a post that everyone will see the
character you entered.

Jimmy_Kofler · 11 March 2008 16:49

Todd Benson wrote:

James Edward Gray II

I think this may have been discussed before, but -KU doesn't work for
me on Windows XP. I get an unterminated string error with the
"Résumé" UTF-8 encoded string. I can only assume that the parser is
still interpreting the string as one byte per character. Anyone have
any ideas?

Todd

Maybe try a regex-based UTF-8 hack (Ruby 1.8.6) like here:
http://snippets.dzone.com/posts/show/4527
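
As an illustration of the regex approach in general (this is not the linked snippet, whose contents are not reproduced here): with the `u` flag a regex works in UTF-8, so `.` matches one whole character rather than one byte. A minimal sketch; `utf8_chars` is a hypothetical helper name, and the `\u` escapes in the sample string need Ruby 1.9 or later (on 1.8 the characters would have to be typed literally in a UTF-8 source file).

```ruby
# Split a UTF-8 string into an array of single-character strings.
def utf8_chars(str)
  # u: match in UTF-8; m: let "." match newlines as well
  str.scan(/./mu)
end

chars = utf8_chars("R\u00E9sum\u00E9")

puts chars.length              # 6
puts chars[1] == "\u00E9"      # true, the "é" stays in one piece
```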

Cheers,
jk

On Tue, Mar 11, 2008 at 7:49 AM, James Gray <james@grayproductions.net> wrote:


James_Edward_Gray_II · 12 March 2008 01:11

James Gray wrote:

6

James Edward Gray II

How does that prove the ruby supports unicode?

If the code was not character aware, it would have returned a count of
the bytes in the String (more than six). String#size, for example.

Where are there any unicode characters in your string?

I entered the accented e characters in UTF-8, that's why you see the
-KU switch to tell Ruby the encoding.

James Edward Gray II

Ahh, I see. You think UTF-8 is unicode.

I think UTF-8 is an encoding of Unicode.
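
A small sketch of what "an encoding of Unicode" means, assuming Ruby 1.9 or later (`encoded_bytes` is a hypothetical helper name): the code point for "é" is always U+00E9, but the bytes that represent it depend on the encoding chosen.

```ruby
# Return the bytes of str after transcoding it to the given encoding.
def encoded_bytes(str, encoding)
  str.encode(encoding).bytes.to_a
end

e_acute = "\u00E9"

puts e_acute.ord                      # 233, i.e. code point U+00E9
p encoded_bytes(e_acute, "UTF-8")     # [195, 169]
p encoded_bytes(e_acute, "UTF-16BE")  # [0, 233]
p encoded_bytes(e_acute, "UTF-32BE")  # [0, 0, 0, 233]
```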

And apparently you think that when you enter a UTF-8 character in a post that everyone will see the character you entered.

I think I included the -KU switch to show you exactly what was going on.

I also think it was pointless for you to be rude about this, so I guess you succeeded in proving that what I think doesn't always matter.

James Edward Gray II

On Mar 11, 2008, at 6:46 PM, 7stud -- wrote:

On Mar 10, 2008, at 10:29 PM, 7stud -- wrote:

Todd_Benson · 11 March 2008 17:13

Thanks for the pointer!

Todd

On Tue, Mar 11, 2008 at 11:49 AM, Jimmy Kofler <koflerjim@mailinator.com> wrote:

Maybe try a regex-based UTF-8 hack (Ruby 1.8.6) like here:
http://snippets.dzone.com/posts/show/4527

Cheers,
jk
