Within a regexp you can match a hex byte with the \xHH escape, e.g. '\x42':
irb(main):001:0> /\x20/.match("z z").to_a
=> [" "]
irb(main):002:0>
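A minimal Ruby sketch of the same byte-level match, written once as a literal and once through Regexp.new. For a plain ASCII escape like \x20 the two forms agree. The variable names are illustrative only.

```ruby
# \x20 is the hex escape for byte 0x20, the ASCII space.
literal = /\x20/

# The same escape, handed to the regexp engine as a single-quoted string.
built = Regexp.new('\x20')

p literal.match("z z").to_a   # => [" "]
p built.match("z z").to_a     # => [" "]
```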
However, with ruby-1.9 and UTF-8 enabled:
$ ruby -v
ruby 1.9.0 (2004-05-17) [i386-freebsd5.1]
$ irb
irb(main):001:0> str = [0x70, 0x80, 0x90].pack('U*')
=> "p\302\200\302\220"
irb(main):002:0> $KCODE = 'U'
=> "U"
irb(main):003:0> /\x{80}/
SyntaxError: compile error
(irb):3: Invalid escape character syntax
/\x{80}/
^
(irb):3: unterminated string meets end of file
(irb):3: syntax error
/\x{80}/
^
from (irb):3
irb(main):004:0> /\x{80}/.match(str)
=> nil
irb(main):005:0> Regexp.new('\x{80}') =~ str
=> 1
irb(main):006:0>
What seems absurd to me is that Ruby's built-in /.../ literal syntax cannot
handle \x{80}, while Regexp.new has no problem with it.
Is this a bug in how Ruby parses /.../ literals?
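For comparison, a sketch of how U+0080 can be matched on a later Ruby release (an assumption: 1.9.1 or newer, where strings carry an encoding; the 2004 snapshot above may well behave differently). The \u{...} escape names a code point rather than a byte, and building the character with pack and passing it through Regexp.escape avoids regexp escape syntax altogether. Variable names are illustrative only.

```ruby
# Assumes Ruby 1.9.1 or later: pack('U*') yields a UTF-8 encoded string.
str = [0x70, 0x80, 0x90].pack('U*')   # "p\u0080\u0090"

# 1. \u{...} code-point escape in a regexp literal.
by_literal = /\u{80}/

# 2. The same escape handed to Regexp.new as a single-quoted string.
by_new = Regexp.new('\u{80}')

# 3. Build the character first; Regexp.escape leaves U+0080 unchanged.
by_char = Regexp.new(Regexp.escape([0x80].pack('U')))

[by_literal, by_new, by_char].each do |re|
  p(re =~ str)   # => 1, the character index of U+0080 in str
end
```

All three patterns report index 1, which agrees with the Regexp.new result in the transcript.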
--
Simon Strandgaard