>
>> irb, Ruby 1.9.1
>>
>> What am I missing here?
>>
>> "b T T W b".match(/(?<!t t|a b) w/i)
>> => nil
>>
>> #The second look-behind alternative is now just a
>> "b T T W b".match(/(?<!t t|a) w/i)
>> => #<MatchData " W">
>>
>> #Regex stays the same, the T T are now in lower case
>> "b t t W b".match(/(?<!t t|a) w/i)
>> => nil
>>
>> #Look-behind only contains the t t condition now, and T T are back to
>> upper case
>> "b T T W b".match(/(?<!t t) w/i)
>> => nil
>
> No bug here. It is doing exactly what you asked: only match a w if it is not
> preceded by 't t'. In all cases the w is preceded by 't t', and in the case
> that did match (?<!t t|a), the w was preceded by a 't t' but not an 'a', as
> you asked, so it did match.
That was an alternative! If the RX in the lookbehind can match, the
negative lookbehind must fail IMHO.
The thing is, what's in the lookbehind (and in all assertions, for that
matter) is not really a regular expression. It is a fixed-length literal. The
only exception, AFAIK, is character sets, because they are also fixed length.
The engine needs to know how many characters to step back and examine.
The docs say that the regexp in a lookbehind cannot be of unlimited length.
But it is far from being only a fixed length literal. "|" is certainly a
metacharacter in an assertion: the second line below would not match if the
lookbehind assertion were a literal.
10:45:31 ~$ ruby19 x.rb
bc         /(?<=ab)c/      []
bc         /(?<=a|b)c/     ["c"]
bc         /(?<=a\|b)c/    []
abc        /(?<=ab)c/      ["c"]
abc        /(?<=a|b)c/     ["c"]
abc        /(?<=a\|b)c/    []
a|bc       /(?<=ab)c/      []
a|bc       /(?<=a|b)c/     ["c"]
a|bc       /(?<=a\|b)c/    ["c"]
a\|bc      /(?<=ab)c/      []
a\|bc      /(?<=a|b)c/     ["c"]
a\|bc      /(?<=a\|b)c/    []
10:45:32 ~$ cat x.rb
str = ["bc", "abc", "a|bc", "a\\|bc"]
rxs = [/(?<=ab)c/, /(?<=a|b)c/, /(?<=a\|b)c/]
str.each do |s|
  rxs.each do |r|
    printf "%-10s %-15p %p\n", s, r, s.scan(r)
  end
end
10:45:45 ~$
Docs even say "In negative-look-behind, captured group isn't allowed,
but shy group(?:) is allowed." So it's a regexp, albeit a limited one.
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
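A quick way to probe what the engine accepts inside a look-behind is to try compiling a few patterns. This is only a sketch, assuming the behaviour described in RE.txt; `compiles?` is a hypothetical helper written for this illustration, not part of Ruby.

```ruby
# Hypothetical helper: true if the pattern source compiles, false otherwise.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Top-level alternatives of different (but bounded) lengths should be accepted.
p compiles?('(?<!t t|a) w')
# A non-capturing ("shy") group should be accepted as well.
p compiles?('(?<!(?:ab))c')
# An unbounded quantifier is expected to be rejected by the engine.
p compiles?('(?<!a+)c')
```

If the first two compile and the last does not, the look-behind body is being parsed as a restricted regexp rather than as a plain literal.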
Also the first alternative that matches wins. Here it is in lower case and
without ignoring case:
"b t t w b".match( /(?<!t t|a) w/ )
=> nil
There is a problem with the match, though. I suspect there is an issue
with case sensitivity propagation:
irb(main):009:0> "b T T W b".match(/(?<!t t|a) w/i)
=> #<MatchData " W">
irb(main):010:0> "b T T W b".match(/(?i:<!t t|a) w/i)
=> nil
That's not a valid assertion any more, it is now an options specification.
"b <!t t w b".match( /(?i:<!t t|a) w/ )
=> #<MatchData "<!t t w">
Right, apparently we cannot have options in assertions.
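To make the distinction concrete, here is a small sketch of how `(?i:...)` behaves as an option group: it switches on case-insensitivity for its own contents only and asserts nothing about the preceding text. The constant name `OPTION_GROUP` is made up for this example.

```ruby
# (?i:...) is a non-capturing group with the i option, so "<!t t" is matched
# literally (ignoring case) instead of being treated as a look-behind.
OPTION_GROUP = /(?i:<!t t|a) w/

# Inside the group, case is ignored.
p OPTION_GROUP.match("b <!T T w b")
# Outside the group, " w" stays case-sensitive, so " W" is not matched.
p OPTION_GROUP.match("b <!t t W b")
```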
irb(main):013:0> RUBY_VERSION
=> "1.9.1"
irb(main):014:0> RUBY_PATCHLEVEL
=> 430
I initially tried the cases with 1.9.2, but I tried the above with the
latest 1.9.1 on my system (a bit older).
RUBY_VERSION
=> "1.9.1"
RUBY_PATCHLEVEL
=> 378
The root issue still exists
irb(main):014:0> "a ac".scan /(?<!a a|b)c/i
=> []
irb(main):015:0> "A Ac".scan /(?<!a a|b)c/i
=> ["c"]
irb(main):016:0> "ac".scan /(?<!a|b)c/i
=> []
irb(main):017:0> "Ac".scan /(?<!a|b)c/i
=> []
Statement 15 should yield no results, just as statement 17 does.
Apparently /i breaks if there is an alternation ("|") in
conjunction with more than one character in one alternative:
Fails (more than 1 char AND alternative):
irb(main):018:0> "aac".scan /(?<!aa|b)c/i
=> []
irb(main):019:0> "AAc".scan /(?<!aa|b)c/i
=> ["c"]
irb(main):020:0> "Aac".scan /(?<!aa|b)c/i
=> ["c"]
irb(main):021:0> "aAc".scan /(?<!aa|b)c/i
=> ["c"]
Works (more than 1 char OR alternative):
irb(main):022:0> "aac".scan /(?<!aa)c/i
=> []
irb(main):023:0> "aAc".scan /(?<!aa)c/i
=> []
irb(main):024:0> "Aac".scan /(?<!aa)c/i
=> []
irb(main):025:0> "AAc".scan /(?<!aa)c/i
=> []
irb(main):026:0> "ac".scan /(?<!a)c/i
=> []
irb(main):027:0> "Ac".scan /(?<!a)c/i
=> []
irb(main):028:0> "ac".scan /(?<!a|b)c/i
=> []
irb(main):029:0> "Ac".scan /(?<!a|b)c/i
=> []
IMHO this is a bug.
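A possible workaround, sketched here and not verified on the affected 1.9.1 patch levels, is to split the alternation into one negative look-behind per alternative. Logically, "not preceded by (aa or b)" is the same as "not preceded by aa and not preceded by b", and single-alternative look-behinds behaved correctly in the tests above. The constant name `SPLIT` is only for illustration.

```ruby
# One negative look-behind per alternative; both must hold before "c".
SPLIT = /(?<!aa)(?<!b)c/i

# Every case variant of "aa" or "b" before "c" should be rejected;
# "xc" is a control where "c" is free to match.
%w[aac AAc Aac aAc bc Bc xc].each do |s|
  printf "%-5s %p\n", s, s.scan(SPLIT)
end
```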
Kind regards
robert
On Tue, Nov 23, 2010 at 5:12 PM, Ammar Ali <ammarabuali@gmail.com> wrote:
On Tue, Nov 23, 2010 at 5:55 PM, Robert Klemme <shortcutter@googlemail.com> wrote:
On Tue, Nov 23, 2010 at 4:36 PM, Ammar Ali <ammarabuali@gmail.com> wrote:
> On Tue, Nov 23, 2010 at 4:36 PM, Ruby Nuby <b1st@hotmail.com> wrote:
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/