Ruby-dev summary 20489 - 20519

Masayoshi_TAKAHASHI · 9 July 2003 17:12

Hello all,

I'm sorry to post so late. This is a summary of last week's
ruby-dev mailing list.

[ruby-dev:20491] [Oniguruma] explicit capture
[ruby-dev:20514] [Oniguruma] Version 1.9.1

Recently, the Japanese translation of "Mastering Regular Expressions",
2nd edition, was published. Kosako, the author of Oniguruma, read it
and found the ExplicitCapture option in .NET, which makes every group
non-capturing except named groups. So Kosako added an option,
REG_OPTION_CAPTURE_ONLY_NAMED_GROUP, and a notation, (?n:…), to
Oniguruma 1.9.1.

But Tanaka Akira pointed out that Ruby already uses the /n option
(no multibyte encoding), and proposed using a /c option instead of
/n. Kosako agreed with Tanaka's idea.
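To make the ExplicitCapture idea concrete, here is a minimal sketch in
plain Ruby that emulates it by rewriting a pattern's source before
compiling it. This is not how Oniguruma implements
REG_OPTION_CAPTURE_ONLY_NAMED_GROUP or (?n:…); explicit_capture is a
hypothetical helper. It assumes a Ruby with named groups (1.9 or later)
and only understands backslash escapes and simple [...] classes.

```ruby
# Hypothetical helper (not part of Ruby or Oniguruma): emulate .NET's
# ExplicitCapture by turning every plain "(" group into a non-capturing
# "(?:" group. Groups that already start with "(?", such as named groups
# "(?<name>...)", are left untouched.
# Simplification: nested or "[]...]"-style classes are not fully parsed.
def explicit_capture(source)
  out = []
  escaped = false   # previous character was a backslash
  in_class = false  # currently inside a [...] character class
  source.each_char.with_index do |ch, i|
    if escaped
      out << ch
      escaped = false
    elsif ch == "\\"
      out << ch
      escaped = true
    elsif in_class
      in_class = false if ch == "]"
      out << ch
    elsif ch == "["
      in_class = true
      out << ch
    elsif ch == "(" && source[i + 1] != "?"
      out << "(?:"      # plain group: make it non-capturing
    else
      out << ch
    end
  end
  out.join
end

pattern = explicit_capture('(?<year>\d{4})-(\d{2})')
m = Regexp.new(pattern).match("2003-07")
p pattern      # => "(?<year>\\d{4})-(?:\\d{2})"
p m[:year]     # => "2003"
p m.captures   # => ["2003"]
```

Since the unnamed group is rewritten to (?:…), only the named group
appears in MatchData#captures, which is the effect ExplicitCapture
is meant to have.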

[ruby-dev:20495] matching with invalid byte sequence

Kazuhiro NISHIYAMA pointed out that /./ matched an invalid byte
sequence in UTF-8:

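require 'uconv'
# "\xa3" is not a valid UTF-8 sequence, yet /./u matches it
if /./u =~ "\xa3"
  # $& is the single byte "\xa3", which Uconv then rejects
  Uconv.u8toeuc($&) #=> illegal UTF-8 sequence (a3) (Uconv::Error)
end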

But /./s =~ "\xF1" and /./e =~ "\xF6" don't match. So he
suggested that /./ should match only a whole character even when
$KCODE is UTF-8.

Nobu answered that Ruby's regexp engine doesn't check whether a
multibyte character sequence is valid, at least in current Ruby.
The reason /./s and /./e fail to match "\xF1" and "\xF6"
respectively is that each byte is treated as the first byte of a
multibyte character, but no trailing bytes follow it.
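The stricter behaviour Nishiyama asked for can be sketched in plain
Ruby over byte arrays, outside the regexp engine. utf8_seq_len and
strict_dot are hypothetical names; the length table is the standard
UTF-8 lead-byte table, simplified (it does not reject overlong 3- or
4-byte forms or surrogates). It shows why "\xa3" should fail: 0xA3
lies in 0x80..0xBF, the range of UTF-8 continuation bytes, so it
cannot start a character. A lone lead byte with no trailing bytes
fails too, which mirrors Nobu's explanation for "\xF1" and "\xF6".

```ruby
# Hypothetical helper: number of bytes a UTF-8 sequence starting with
# `lead` should occupy, or nil if `lead` cannot start a character.
def utf8_seq_len(lead)
  case lead
  when 0x00..0x7F then 1
  when 0xC2..0xDF then 2
  when 0xE0..0xEF then 3
  when 0xF0..0xF4 then 4
  else nil # 0x80..0xBF are continuation bytes; 0xC0, 0xC1, 0xF5.. never lead
  end
end

# Hypothetical "strict dot": return the first complete UTF-8 character of
# `bytes` (an Array of Integers), or nil if the sequence is malformed.
def strict_dot(bytes)
  return nil if bytes.empty?
  len = utf8_seq_len(bytes.first)
  return nil if len.nil? || bytes.size < len   # bad lead byte or truncated
  tail = bytes[1, len - 1]
  return nil unless tail.all? { |b| (0x80..0xBF).cover?(b) }
  bytes[0, len]
end

p strict_dot([0xA3])              # => nil (continuation byte)
p strict_dot([0xE3, 0x81, 0x82])  # => [227, 129, 130] (U+3042)
p strict_dot([0xE3, 0x81])        # => nil (lead byte, missing trailer)
p "\xa3".valid_encoding?          # => false (Ruby 1.9+ String check)
```

Ruby 1.9 and later expose this kind of validity check on strings
through String#valid_encoding?, shown on the last line.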

Regards,

TAKAHASHI ‘Maki’ Masayoshi E-mail: maki@rubycolor.org
