Hi there,
[linux.gfbs:281]gfb> ruby -v
ruby 1.6.8 (2003-10-15) [i686-linux]
[linux.gfbs:282]gfb> irb
irb(main):001:0> a = "a b c d "
=> "a b c d "
irb(main):002:0> a.scan %r{((\S+\s+){2,2})}
=> [["a b ", "b "], ["c d ", "d "]]
irb(main):003:0>
I am just wondering why String#scan "loses" a group in every match. I would expect the following result:
=> [["a b ", "a ", "b "], ["c d ", "c ", "d "]]
or even
=> [["a b ", ["a ", "b "]], ["c d ", ["c ", "d "]]]
Where am I wrong in my expectations?
Thank you,
Gennady.
P.S.
It works the same way in Ruby 1.8.0 as well.
When the pattern uses sub-captures, #scan returns an array of sub-captures for each match.
This does not include capture[0], which is the full match.
"abcd".scan(/(.)(.)/)
#=> [["a", "b"], ["c", "d"]]
When the pattern uses no sub-captures at all, #scan returns only the full matches.
"abcd".scan(/../)
#=> ["ab", "cd"]
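A minimal sketch of the two behaviours side by side, plus one possible way to get the full match back when the pattern has groups. The helper name scan_with_full_match is made up for this example; it assumes the block form of #scan, inside which $~ holds the MatchData of the current match.

```ruby
# Hypothetical helper: returns [full_match, *captures] for every match.
def scan_with_full_match(str, re)
  results = []
  # In the block form, $~ is the MatchData of the current match,
  # so $~.to_a is [full_match, capture1, capture2, ...].
  str.scan(re) { results << $~.to_a }
  results
end

p "abcd".scan(/(.)(.)/)                   # groups: [["a", "b"], ["c", "d"]]
p "abcd".scan(/../)                       # no groups: ["ab", "cd"]
p scan_with_full_match("abcd", /(.)(.)/)  # [["ab", "a", "b"], ["cd", "c", "d"]]
```

Note that this only adds the full match in front; it does not change which sub-captures the pattern itself produces.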
···
On Thursday 10 June 2004 18:49, Gennady wrote:
I am just wondering why String#scan "loses" a group in every match. I
would expect the following result:
--
Simon Strandgaard
In my original irb session I do have sub-captures; moreover, they are nested:
[linux.gfbs:281]gfb> ruby -v
ruby 1.6.8 (2003-10-15) [i686-linux]
[linux.gfbs:282]gfb> irb
irb(main):001:0> a = "a b c d "
=> "a b c d "
irb(main):002:0> a.scan %r{((\S+\s+){2,2})}
ACTUAL => [["a b ", "b "], ["c d ", "d "]]
I EXPECT => [["a b ", "a ", "b "], ["c d ", "c ", "d "]]
irb(main):003:0>
···
On Thursday 10 June 2004 18:49, Gennady wrote:
^^^^ ^^^^
These are not sub-captures
and are thus not being captured.
You need parentheses in order to capture them.
···
On Friday 11 June 2004 00:43, Gennady wrote:
irb(main):001:0> a = "a b c d "
=> "a b c d "
irb(main):002:0> a.scan %r{((\S+\s+){2,2})}
ACTUAL => [["a b ", "b "], ["c d ", "d "]]
I EXPECT => [["a b ", "a ", "b "], ["c d ", "c ", "d "]]
--
Simon Strandgaard
Hi --
> In my original irb session capture I have sub-captures, moreover they
> are nested:
>
> [linux.gfbs:281]gfb> ruby -v
> ruby 1.6.8 (2003-10-15) [i686-linux]
> [linux.gfbs:282]gfb> irb
> irb(main):001:0> a = "a b c d "
> => "a b c d "
> irb(main):002:0> a.scan %r{((\S+\s+){2,2})}
> ACTUAL => [["a b ", "b "], ["c d ", "d "]]
> I EXPECT => [["a b ", "a ", "b "], ["c d ", "c ", "d "]]
My understanding is: you've only got two sets of parentheses, so you
can have at most two captures; in other words, (){2} != ()(). It's
purely positional: whatever is in the nth set of parentheses from the
left when the matching stops is the nth capture.
It's as if each () is a window which can move through the string but
can only hold one substring. So the second set of () sort of moves
from left to right:
(("a ")....)
("a "("b ")) # match completed
Result: $1 == "a b "
        $2 == "b "
David
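A small sketch of the "(){2} != ()()" point, using the string from this thread. The two method names are made up for illustration; the patterns are the repeated-group form from the question and the same thing written out as two separate groups.

```ruby
# Hypothetical helper names, used only for this illustration.

# One group repeated twice: the inner group has a single capture slot,
# which is overwritten on every repetition, so only the last word survives.
def scan_repeated_group(str)
  str.scan(/((\S+\s+){2})/)
end

# Two groups written out: each set of parentheses gets its own slot.
def scan_spelled_out_groups(str)
  str.scan(/(\S+\s+)(\S+\s+)/)
end

a = "a b c d "
p scan_repeated_group(a)      # [["a b ", "b "], ["c d ", "d "]]
p scan_spelled_out_groups(a)  # [["a ", "b "], ["c ", "d "]]
```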
···
On Fri, 11 Jun 2004, Gennady wrote:
> On Thursday 10 June 2004 18:49, Gennady wrote:
--
David A. Black
dblack@wobblini.net
Simon Strandgaard wrote:
irb(main):001:0> a = "a b c d "
=> "a b c d "
irb(main):002:0> a.scan %r{((\S+\s+){2,2})}
Both (\S+\s+) and the enclosing ((\S+\s+){2,2}) are sub-captures.
ACTUAL => [["a b ", "b "], ["c d ", "d "]]
I EXPECT => [["a b ", "a ", "b "], ["c d ", "c ", "d "]]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
And this is scan's result presented by irb
Thanks, David. It looks like this is the case. Actually, I solved my problem by using the following regexp instead:
[linux.gfbs:71]gfb-ems-session_1> irb
irb(main):001:0> a = "a b c d "
=> "a b c d "
irb(main):002:0> a.scan %r{#{'(\S+\s+)' * 2}}
=> [["a ", "b "], ["c ", "d "]]
irb(main):003:0>
(My actual regexp is much bigger; I just used a simplified form as an example.)
Gennady.
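The string-repetition trick above can be generalized to any number of words. The sketch below is only an illustration: words_regex and runs_with_words are hypothetical names, and the second helper is an alternative approach (a non-capturing group plus a second scan) that also keeps the whole run, which is the result shape asked for at the start of the thread.

```ruby
# Hypothetical helper: builds a pattern with n separate capture groups,
# generalizing the '(\S+\s+)' * 2 trick.
def words_regex(n)
  Regexp.new('(\S+\s+)' * n)
end

# Alternative: match each run of n words without capturing, then re-scan
# the run to split it into words, keeping the whole run as first element.
def runs_with_words(str, n)
  str.scan(/(?:\S+\s+){#{n}}/).map { |run| [run, *run.scan(/\S+\s+/)] }
end

a = "a b c d "
p a.scan(words_regex(2))  # [["a ", "b "], ["c ", "d "]]
p runs_with_words(a, 2)   # [["a b ", "a ", "b "], ["c d ", "c ", "d "]]
```

The first form keeps everything in one regexp, which may matter when it is part of a larger pattern; the second avoids generating the pattern text but needs a second pass over each match.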