Hi,
I'm using Ruby 1.8.6, and I just discovered something rather
interesting. Here is a test:
require 'test/unit'

class TestRegexBug < Test::Unit::TestCase
  def test_bug
    hours = "pon-čet"
    assert(hours =~ /[č]et/i)
    assert(hours =~ /čet/i)
    assert(hours =~ /-čet/i)
    assert(hours =~ /[cč]et/i)
    assert(hours =~ /-[č]et/i)
  end
end
As you can see, this only happens with Unicode letters (the last
assertion fails). I'm used to the fact that //i doesn't work for Unicode
characters, and I already know that you need two dots to match one of
them. But this problem is different and weirder, because what triggers it
is a minus sign before the square brackets: if you remove either the '-'
or the '[]' from the regex, it works.
Can you comment?
Thank you,
david
--
Posted via http://www.ruby-forum.com/.
$KCODE = 'UTF8'
require 'jcode'
require 'test/unit'

class TestRegexBug < Test::Unit::TestCase
  def test_bug
    hours = "pon-čet"
    assert(hours =~ /[č]et/i)
    assert(hours =~ /čet/i)
    assert(hours =~ /-čet/i)
    assert(hours =~ /[cč]et/i)
    assert(hours =~ /-[č]et/i)
  end
end
Ruby 1.8 is not natively aware of Unicode, but you can get all of these to pass if you give it the $KCODE hint, as in the first two lines of the test above.
-Rob
Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com
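A note that is not from the thread itself: besides the global $KCODE setting, Ruby regex literals also accept a /u option that marks a single pattern as UTF-8. The sketch below applies it to the failing assertion. It was written with Ruby 2.0 or later in mind, where string and regex literals are already encoding-aware, so treat its behavior on 1.8.6 as an assumption rather than something verified here.

```ruby
# Sketch: per-regex UTF-8 option instead of the global $KCODE hint.
# Assumes the source file is saved as UTF-8.
hours = "pon-čet"

# /i = case-insensitive, /u = treat the pattern as UTF-8, so [č] is a
# class containing one character rather than two separate bytes.
with_u = hours =~ /-[č]et/iu

p with_u  # character index of the "-" when the match succeeds
```

Whether to prefer /u or $KCODE is a matter of scope: /u affects one pattern, while $KCODE changes how the whole 1.8 process treats strings.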
On Mar 3, 2008, at 2:24 PM, D. Krmpotic wrote:
In the regex [č] is a character class with _two_ bytes. So
Ruby tries to match a minus followed by _one_ of the bytes
out of "č" followed by "et". So the regex would match
"pon-\304et" or "pon-\215et", but not "pon-\304\215et".
Stefan
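Stefan's explanation also accounts for why /[č]et/ passes without the minus sign: the class matches the second byte of "č" on its own, and that byte is directly followed by "et". The sketch below reproduces this byte-wise behavior. It is an approximation, not the original 1.8.6 run: it assumes Ruby 2.0 or later and simulates 1.8's view of strings with a binary string and /n (ASCII-8BIT) regexes. The variable names are only illustrative.

```ruby
# "č" is the two bytes 0xC4 0x8D in UTF-8, written in octal as \304 \215
# in the explanation above. String#b returns a binary (ASCII-8BIT) copy.
bytes = "pon-\xC4\x8Det".b

# A class holding two bytes matches exactly ONE of them.
class_only      = bytes =~ /[\xC4\x8D]et/n   # 0x8D alone is followed by "et"
dash_then_class = bytes =~ /-[\xC4\x8D]et/n  # needs "-", ONE byte, then "et"
dash_then_bytes = bytes =~ /-\xC4\x8Det/n    # both bytes spelled out in order

p class_only       # => 5 (byte offset of 0x8D)
p dash_then_class  # => nil
p dash_then_bytes  # => 3 (byte offset of "-")
```

The middle pattern fails because, after the "-", one byte from the class is consumed (0xC4) and the next byte is 0x8D rather than "e".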
Great info. I had completely forgotten that this is available.
Thank you,
david