#scan with or'd (`|`) subexpressions

Does the new Ruby regexp engine do this?

  irb(main):001:0> '1234'.scan(/(1)(2)|(3)(4)/)
  => [["1", "2", nil, nil], [nil, nil, "3", "4"]]
  irb(main):002:0>

Why would all the subexpressions be listed when there is an `|` (or) used? I
expected:

  => [["1", "2"], ["3", "4"]]

T.

Hi,

In message "Re: #scan with or'd (`|`) subexpressions." on Thu, 11 Nov 2004 23:29:58 +0900, "trans. (T. Onoma)" <transami@runbox.com> writes:

>Does the new Ruby regexp engine do this?
>
> irb(main):001:0> '1234'.scan(/(1)(2)|(3)(4)/)
> => [["1", "2", nil, nil], [nil, nil, "3", "4"]]
> irb(main):002:0>
>
>Why would all the subexpressions be listed when there is an `|` (or) used? I
>expected:
>
> => [["1", "2"], ["3", "4"]]

You will never know which subexpression is matched, if you get your
expected result. Is there any reason /(1|3)(2|4)/ is not sufficient?

              matz.
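To make matz's point concrete, here is a small sketch (run on a current Ruby; the variable names are illustrative). `scan` returns one array per match with a slot for every capture group in the whole pattern, so the position of the `nil`s is what identifies the alternative. Compacting gives the shape T. expected, but discards exactly that information.

```ruby
# scan yields one array per match, with one slot per capture group in
# the whole pattern; groups in the alternative that did not match are nil.
pairs = '1234'.scan(/(1)(2)|(3)(4)/)
p pairs       # [["1", "2", nil, nil], [nil, nil, "3", "4"]]

# Removing the nils gives the shape T. expected...
compacted = pairs.map(&:compact)
p compacted   # [["1", "2"], ["3", "4"]]

# ...but only the nil positions say which branch matched:
# groups 1-2 belong to the first alternative, groups 3-4 to the second.
branches = pairs.map { |groups| groups[0].nil? ? :second : :first }
p branches    # [:first, :second]
```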

>>Does the new Ruby regexp engine do this?
>>
>> irb(main):001:0> '1234'.scan(/(1)(2)|(3)(4)/)
>> => [["1", "2", nil, nil], [nil, nil, "3", "4"]]
>> irb(main):002:0>
>>
>>Why would all the subexpressions be listed when there is an `|` (or) used? I
>>expected:
>>
>> => [["1", "2"], ["3", "4"]]
>
> You will never know which subexpression is matched, if you get your
> expected result. Is there any reason /(1|3)(2|4)/ is not sufficient?

This matches "14" and "32" too. /(1(?=2)|3(?=4))(2|4)/ is better, but it is more complex and generally hard to do.

Peter
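Peter's objection is easy to check. In this sketch `loose` is matz's suggestion and `strict` is Peter's lookahead version, written with balanced parentheses; the variable names are ours. The cost of `strict` is visible in the pattern itself: each first digit has to repeat its partner inside a lookahead.

```ruby
loose  = /(1|3)(2|4)/             # matz's suggestion
strict = /(1(?=2)|3(?=4))(2|4)/   # each first digit must be followed by its own partner

p '1234'.scan(loose)    # [["1", "2"], ["3", "4"]]
p '1432'.scan(loose)    # [["1", "4"], ["3", "2"]]  (unwanted pairs)
p '1234'.scan(strict)   # [["1", "2"], ["3", "4"]]
p '1432'.scan(strict)   # []
```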

Hi Matz,

> Hi,
>
>>Does the new Ruby regexp engine do this?
>>
>> irb(main):001:0> '1234'.scan(/(1)(2)|(3)(4)/)
>> => [["1", "2", nil, nil], [nil, nil, "3", "4"]]
>> irb(main):002:0>
>>
>>Why would all the subexpressions be listed when there is an `|` (or) used?
>> I expected:
>>
>> => [["1", "2"], ["3", "4"]]
>
> You will never know which subexpression is matched, if you get your
> expected result.

Actually, figuring out which subexpression matched is _exactly_ my
problem. I have dozens of regexps of the form:

  (#{spre})(#{start})(#{spost})(.*?)(#{epre})(#{end})(#{epost})

All of these are in an array (r) and strung together:
  
  re = Regexp.new( r.join('|') )

Then

  m = []
  str.scan( re ) { m << $~ }

How do I know which array index (r[?]) produced the match? How does the
current behavior let me figure out which regexp matched?

> Is there any reason /(1|3)(2|4)/ is not sufficient?

Hmm... with a good bit of refactoring I might be able to do it this way,
although some of my regexps have zero-width lookaheads, and I suspect those
might be a problem here.

Thanks,
T.
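A sketch of one way to answer T.'s question with the current behavior, under stated assumptions: compile each entry of `r` on its own to count its capture groups, which tells you which group numbers it occupies in the joined regexp; the first block of groups containing a non-`nil` value identifies the entry. The sample patterns in `r` and the helpers `group_count` and `which_alternative` are hypothetical, and the empty-alternative counting trick is our own device rather than a dedicated API. It assumes no entry uses numbered backreferences such as `\1` (joining shifts group numbers) and that every entry sets at least one group when it matches. It targets a modern Ruby (`Array#sum` needs 2.4+), not the 1.8 series discussed in this thread.

```ruby
# Hypothetical stand-ins for T.'s array of sub-patterns.
r = ['(a)(b)', '(c)', '(d)(e)(f)']

# Number of capture groups in one sub-pattern. Appending an empty
# alternative guarantees a match against "", and MatchData#size is
# the group count plus one (slot 0 is the whole match).
def group_count(src)
  Regexp.new("#{src}|").match('').size - 1
end

counts = r.map { |src| group_count(src) }
# Group number (1-based) at which each sub-pattern's groups start
# inside the joined regexp.
starts = counts.each_index.map { |i| 1 + counts.take(i).sum }

# Index into r of the sub-pattern that produced match m: the first
# one whose block of groups is not entirely nil.
def which_alternative(m, starts, counts)
  starts.zip(counts).index { |first, n|
    (first...(first + n)).any? { |g| !m[g].nil? }
  }
end

re = Regexp.new(r.join('|'))
hits = []
'abxcdef'.scan(re) { hits << which_alternative($~, starts, counts) }

p counts   # [2, 1, 3]
p starts   # [1, 3, 4]
p hits     # [0, 1, 2]
```

With T.'s seven-group template, and assuming the interpolated pieces add no groups of their own, every count is 7, so the entry index is simply `(first non-nil group number - 1) / 7`.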

On Thursday 11 November 2004 11:52 am, Yukihiro Matsumoto wrote:

> In message "Re: #scan with or'd (`|`) subexpressions." on Thu, 11 Nov 2004 23:29:58 +0900, "trans. (T. Onoma)" <transami@runbox.com> writes:

This reminds me... when will Ruby support named subexpressions?
Oniguruma fully supports them now, but there doesn't appear to be a
way to access them from Ruby code.

thanks,
Mark
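For readers on later versions: Ruby 1.9 adopted Oniguruma and exposes named groups, written `(?<name>...)` and read back through `MatchData#[]` and `MatchData#names`. A minimal sketch with illustrative group names follows. One caveat worth knowing: once a Ruby regexp contains named groups, its plain `(...)` groups stop capturing, so mixing the two styles needs care.

```ruby
# Ruby 1.9+: every group has a name, so each MatchData says directly
# which branch of the alternation matched.
re = /(?<one>1)(?<two>2)|(?<three>3)(?<four>4)/

matches = []
'1234'.scan(re) { matches << $~ }

p re.names            # ["one", "two", "three", "four"]
p matches[1][:one]    # nil (the first branch did not take part)
p matches[1][:three]  # "3"

# Name of the first group that participated in each match.
first_names = matches.map { |m| m.names.find { |n| m[n] } }
p first_names         # ["one", "three"]
```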

On Fri, 12 Nov 2004 01:52:00 +0900, Yukihiro Matsumoto <matz@ruby-lang.org> wrote:

> Hi,
>
> In message "Re: #scan with or'd (`|`) subexpressions." on Thu, 11 Nov 2004 23:29:58 +0900, "trans. (T. Onoma)" <transami@runbox.com> writes:
>
>>Does the new Ruby regexp engine do this?
>>
>> irb(main):001:0> '1234'.scan(/(1)(2)|(3)(4)/)
>> => [["1", "2", nil, nil], [nil, nil, "3", "4"]]
>> irb(main):002:0>
>>
>>Why would all the subexpressions be listed when there is an `|` (or) used? I
>>expected:
>>
>> => [["1", "2"], ["3", "4"]]
>
> You will never know which subexpression is matched, if you get your
> expected result. Is there any reason /(1|3)(2|4)/ is not sufficient?