# coding: utf-8
str2 = "asdfМикимаус"
p str2.encoding               # <Encoding:UTF-8>
p str2.scan(/\p{Cyrillic}/)   # finds all Cyrillic characters
str2.gsub!(/\w/u, '')         # removes only the Latin characters
puts str2
The question is: why does /\w/ ignore Cyrillic characters?
I installed the latest Ruby package from http://rubyinstaller.org/.
Here is the output of ruby -v:
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-mingw32]
--
Posted via http://www.ruby-forum.com/.
str2.gsub!(/\w/u, '')         # removes only the Latin characters
The question is: why does /\w/ ignore Cyrillic characters?
Are Cyrillic characters supposed to count as "word characters" (\w)?
If so, then this looks like a bug to me. Ping core.
-rp
Nikolay Khodyunya wrote:
The question is: why does /\w/ ignore Cyrillic characters?
I installed the latest Ruby package from http://rubyinstaller.org/.
Here is the output of ruby -v:
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-mingw32]
http://redmine.ruby-lang.org/issues/show/3181
http://redmine.ruby-lang.org/issues/show/3202
might be related. If you think the behavior is wrong, bring it up on core.
-rp
I think that \w (and similar shortcuts) is supposed to match ASCII
characters only, so it's equivalent to [a-zA-Z0-9_].
Isn't there some kind of Unicode character class you can use?
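To make the difference between these classes concrete, here is a small sketch. It assumes a Ruby newer than the 1.9.1 build from the original post, where the documentation describes \w as [a-zA-Z0-9_]; the helper name matches_by_class is made up for illustration and is not from this thread.

```ruby
# coding: utf-8

# Hypothetical helper (not from the thread): collect what each
# character class matches in the given string.
def matches_by_class(text)
  {
    word:     text.scan(/\w/).join,            # documented as [a-zA-Z0-9_]
    letter:   text.scan(/\p{L}/).join,         # any Unicode letter
    cyrillic: text.scan(/\p{Cyrillic}/).join,  # Cyrillic script only
    alpha:    text.scan(/[[:alpha:]]/).join    # POSIX bracket expression
  }
end

matches_by_class("asdfМикимаус").each do |name, matched|
  puts "#{name}: #{matched.inspect}"
end
```

On such a Ruby, only the \w entry should be limited to the Latin part of the string. If your build prints something else for \w, that is worth mentioning in a bug report.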
On 4/27/10, Nikolay Khodyunya <nickolayho@gmail.com> wrote:
The question is: why does /\w/ ignore Cyrillic characters?
Actually, "asdfМикимаус".gsub!(/\w/u,'') returns "" on Linux, so the
problem comes from the Windows package.
You can use "asdfМикимаус".gsub!(/\p{L}/,'') to remove letters, though.
If they're the same version, then it might be a Windows bug. Try it with
trunk, and if it still fails, submit a bug report to the tracker...
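For anyone who wants to use the \p{L} workaround in the meantime, something along these lines should behave the same on every platform. This is only a sketch: strip_letters is a hypothetical name, and it uses gsub rather than gsub! because gsub! returns nil when nothing was replaced.

```ruby
# coding: utf-8

# Hypothetical helper (not from the thread): remove every Unicode
# letter, regardless of how \w behaves on the current build.
def strip_letters(text)
  text.gsub(/\p{L}/, '')
end

puts strip_letters("asdfМикимаус").inspect
puts strip_letters("abc 123 Мики").inspect

# Version and platform are worth including in a bug report.
puts "#{RUBY_VERSION} #{RUBY_PLATFORM}"
```

Digits, spaces and punctuation are left alone, since \p{L} covers letters only.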
Roger Pack wrote:
Actually, "asdfМикимаус".gsub!(/\w/u,'') returns "" on Linux, so the
problem comes from the Windows package.
Here's a copy of trunk if that would be useful:
http://rubydoc.ruby-forum.com/ruby_distros/ruby_trunk_no_patches_installed.7z
GL.
-rp