Regular expressions question

Hello,

I need to capture all matches for a group. For example, if

'ab c' =~ /^(.)*$/

I would like to get the array [ 'a', 'b', ' ', 'c' ].

I could not figure out how to do it in Ruby. String#scan did not seem to
be the right thing. Please help.

Thanks,
konstantin


When using scan(), you need to remove the anchoring:

>> "ab c".scan(/./)
=> ["a", "b", " ", "c"]
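A minimal sketch of why the anchored pattern alone is not enough (assuming a reasonably modern Ruby): a capture group inside a repetition only remembers its final iteration, while an unanchored scan visits every match.

```ruby
# A capture group inside a repetition keeps only its final iteration.
md = /^(.)*$/.match("ab c")
last_only = md[1]              # only the last character survives

# Dropping the anchors and the repetition lets scan visit every match.
all_chars = "ab c".scan(/./)

p last_only   # prints "c"
p all_chars   # prints ["a", "b", " ", "c"]
```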

Hope that helps.

James Edward Gray II

···

On Dec 14, 2005, at 3:02 PM, ako... wrote:

You could try:

irb(main):001:0> "ab c".split('') # split on nothing
=> ["a", "b", " ", "c"]

irb(main):002:0> "ab c".split(//) # same again
=> ["a", "b", " ", "c"]

irb(main):003:0> "ab c".scan(/./) # scan on any single char
=> ["a", "b", " ", "c"]
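The three calls agree for "ab c", but they are not fully interchangeable. A small sketch, assuming an input that contains a newline: in Ruby, `.` does not match "\n" unless the regexp carries the `/m` flag, so `scan(/./)` silently drops newlines while `split('')` keeps them.

```ruby
s = "a\nb"

# split('') breaks the string into every character, newline included.
by_split = s.split('')

# By default '.' does not match a newline, so scan(/./) drops it...
by_dot = s.scan(/./)

# ...unless the /m flag is given, which lets '.' match "\n" in Ruby.
by_dot_m = s.scan(/./m)

p by_split   # ["a", "\n", "b"]
p by_dot     # ["a", "b"]
p by_dot_m   # ["a", "\n", "b"]
```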

···

On Wed, 14 Dec 2005 21:00:56 -0000, ako... <akonsu@gmail.com> wrote:


--
Ross Bamford - rosco@roscopeco.remove.co.uk
"\e[1;31mL"

I give up. There seems to be no way to get all the captures for a
group; the corresponding $ variable just has the last one. Thanks to
everyone who responded. Sorry, I did not mean to start a war over
people's coding styles.

konstantin

Thank you. This was just an example. In general, is it possible to get
a collection of captures for a group without having to write custom
code?

Thank you. The question is general.

If I wanted to parse a list of letters separated by spaces and commas:

'a , b,c' =~ /^(?:(\w)\s*,\s*)*(\w)$/

I need to get ['a','b'] in group 1 and ['c'] in group 2. Yes, I know I
can split, then massage the result some more and get the final result.
Is there a way to get at a group's captures after a regex match, like in
Microsoft's .NET?
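For reference, here is what Ruby actually reports for that pattern; a short sketch, assuming Ruby 1.9 or later. A Ruby MatchData exposes one value per group, so the repeated group 1 only holds its last iteration.

```ruby
str = 'a , b,c'
md = /^(?:(\w)\s*,\s*)*(\w)$/.match(str)

# Group 1 sat inside a repetition, so only its last iteration ("b") is kept;
# the earlier "a" is not recoverable from this MatchData.
p md[1]   # "b"
p md[2]   # "c"
```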

Hi,

I give up. There seems to be no way to get all the captures for a
group; the corresponding $ variable just has the last one.

Could you help us to understand why #scan didn't meet your needs?

Called without a block, #scan returns an array of matches:

"abc--------abc--------abc".scan(/(a)(b)(c)/)

=> [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "c"]]

Called with a block, #scan calls your block each time a match is
found:

"abc--------abc--------abc".scan(/(a)(b)(c)/) { puts "#$1, #$2, #$3" }

a, b, c
a, b, c
a, b, c
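A variation on the block form, sketched here as an illustration: inside the block, `Regexp.last_match` (the same object as `$~`) is the MatchData for the current match, so each match's offset is available alongside its groups.

```ruby
str = "abc--------abc--------abc"
positions = []

# With capture groups, scan yields the groups; three block parameters
# destructure them. Regexp.last_match is the current match's MatchData.
str.scan(/(a)(b)(c)/) do |a, b, c|
  m = Regexp.last_match
  positions << [m.begin(0), [a, b, c].join(", ")]
end

p positions
```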

Hope this helps,

Bill

···

From: "ako..." <akonsu@gmail.com>

A regex tool I'm finding invaluable is "redet" (on Freshmeat).

It works with a number of languages, including Ruby...

···

Ross Bamford wrote on 12/14/2005 4:32 PM:


--
http://home.cogeco.ca/~tsummerfelt1
telnet://ventedspleen.dyndns.org

Have to admit I'm not exactly a regex whiz, but I imagine it can be done somehow. I assume you mean having a repeated capturing group append to an array any number of times?

But I still think scan is a good tool for the job, since it accepts any regexp. I don't think a single regexp is really intended for doing variable numbers of captures anyway (?).

irb(main):054:0> "ab c".scan(/\w|\s/)
=> ["a", "b", " ", "c"]

or

irb(main):052:0> "this is a test".scan(/\w+/)
=> ["this", "is", "a", "test"]

or even

irb(main):053:0> "this is a test".scan(/\w+|\s/)
=> ["this", " ", "is", " ", "a", " ", "test"]

Cheers,
Ross

···

On Wed, 14 Dec 2005 21:34:52 -0000, ako... <akonsu@gmail.com> wrote:


--
Ross Bamford - rosco@roscopeco.remove.co.uk
"\e[1;31mL"

Thank you. The question is general.

If I wanted to parse a list of letters separated by spaces and commas:

'a , b,c' =~ /^(?:(\w)\s*,\s*)*(\w)$/

I need to get ['a','b'] in group 1 and ['c'] in group 2. Yes, I know I
can split, then massage the result some more and get the final result.
Is there a way to get at a group's captures after a regex match, like in
Microsoft's .NET?

Perl-style variables:

>> "abc" =~ /(.)(.)(.)/
=> 0
>> p [$1, $2, $3]
["a", "b", "c"]
=> nil

Or object oriented:

>> md = "abc".match(/(.)(.)(.)/)
=> #<MatchData:0x325dc8>
>> p [md[1], md[2], md[3]]
["a", "b", "c"]
=> nil
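Two more MatchData conveniences, in a brief sketch: `captures` returns just the groups (without the whole match), which suits multiple assignment, and `values_at` picks several groups at once.

```ruby
md = "abc".match(/(.)(.)(.)/)

# captures returns just the groups (no whole-match element), which
# makes multiple assignment convenient.
first, second, third = md.captures

# values_at picks several groups by index in one call.
picked = md.values_at(1, 3)

p [first, second, third]   # ["a", "b", "c"]
p picked                   # ["a", "c"]
```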

Hope that helps.

James Edward Gray II

···

On Dec 14, 2005, at 4:03 PM, ako... wrote:

I don't really get what you mean. I don't understand the rules that got a and b into one group and c into another. When you say it's a general question, do you mean you just want access to the captures from some regexp match?

irb(main):009:0> "a , b,c" =~ /(\w\s*?,\s*?\w)\s*?,\s*?(\w)/
=> 0
irb(main):010:0> $1
=> "a , b"
irb(main):011:0> $2
=> "c"
irb(main):012:0> $~[1]
=> "a , b"
irb(main):013:0> $~[2]
=> "c"
irb(main):014:0> md = /(\w\s*?,\s*?\w)\s*?,\s*?(\w)/.match("a, b,c")
=> #<MatchData:0xb7a47860>
irb(main):015:0> md[1]
=> "a, b"
irb(main):016:0> md.captures[1]
=> "c"
irb(main):017:0> $~.inspect
=> "#<MatchData:0xb7a47860>"

(and others...)

Hope that helps,
Ross

···

On Wed, 14 Dec 2005 21:59:27 -0000, ako... <akonsu@gmail.com> wrote:


--
Ross Bamford - rosco@roscopeco.remove.co.uk
"\e[1;31mL"

ako... wrote:

Thank you. The question is general.

If I wanted to parse a list of letters separated by spaces and commas:

'a , b,c' =~ /^(?:(\w)\s*,\s*)*(\w)$/

I need to get ['a','b'] in group 1 and ['c'] in group 2. Yes, I know I
can split, then massage the result some more and get the final result.
Is there a way to get at a group's captures after a regex match, like in
Microsoft's .NET?

t = 'a , b,c'.split( /\s*,\s*/ )
group1 = t[0..-2]
group2 = t[-1,1]
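One caveat with the split approach: it does not check the overall structure. A quick sketch (the `split_groups` wrapper is hypothetical, just packaging the three lines above) shows a doubled comma slipping through as an empty element instead of being rejected.

```ruby
# Hypothetical wrapper around the split-based approach.
def split_groups(str)
  t = str.split(/\s*,\s*/)
  [t[0..-2], t[-1, 1]]
end

p split_groups('a , b,c')    # [["a", "b"], ["c"]]
p split_groups('a , b,,c')   # [["a", "b", ""], ["c"]]  (malformed, but accepted)
```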

Bill,

#scan does not help because it can match just a portion of the source string,
and whatever lies between the matches is skipped. So #scan is just a
special case of the functionality that I was looking for. I need to
make sure the whole string has a defined structure and get parts of it
as groups.

konstantin
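One stdlib option not mentioned in the thread is `StringScanner` (from `strscan`). Its `scan` method only matches at the current position, so nothing between matches can be skipped silently. This is a sketch assuming the letter-list format from this thread; `parse_letter_list` is a hypothetical helper, not an existing API.

```ruby
require 'strscan'

# Parse "letter (, letter)*", insisting the whole string conforms.
# Returns [all_but_last, [last]] or nil if the string is malformed.
def parse_letter_list(str)
  ss = StringScanner.new(str)
  letters = []
  loop do
    return nil unless ss.scan(/(\w)/)     # a letter must come next
    letters << ss[1]                      # group 1 of the last scan
    break if ss.eos?                      # string ended right after a letter
    return nil unless ss.scan(/\s*,\s*/)  # otherwise a separator is required
  end
  [letters[0..-2], letters[-1, 1]]
end

p parse_letter_list('a , b,c')   # [["a", "b"], ["c"]]
p parse_letter_list('a , b,,c')  # nil
```

Because every token must start exactly where the previous one ended, the structure check and the collection of repeated captures happen in one pass.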

You should be able to tell who this message is meant for:

PLEASE stop sending out code that uses any of the perl ${x} variables ...

They are ugly and have no place in Ruby ... they are only provided to
make the transition of Perl people easier ...

Please teach people to use MatchData objects ...

my_regex = /(\w\s*?,\s*?\w)\s*?,\s*?(\w)/

matches = my_regex.match( "a , b,c" )

Element 0 of the matches object will contain the complete matched string.

Each element after that will map to one of the groups you defined ...

so:

matches[0] will be the whole string
"a , b,c"
matches[1] will be your first group
"a , b"
matches[2] will be your second group
"c"
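One practical detail when working with MatchData objects this way: `Regexp#match` returns `nil` when nothing matches, so the result needs a check before it is indexed. A small sketch; the `describe_match` helper is hypothetical, written only for this illustration.

```ruby
my_regex = /(\w\s*?,\s*?\w)\s*?,\s*?(\w)/

# Hypothetical helper: report the groups, or say that nothing matched.
def describe_match(regex, str)
  m = regex.match(str)   # nil when there is no match
  return "no match for #{str.inspect}" unless m
  "whole=#{m[0].inspect} first=#{m[1].inspect} second=#{m[2].inspect}"
end

puts describe_match(my_regex, "a , b,c")
puts describe_match(my_regex, "a b c")
```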

... seriously, we're not helping people make cleaner code when we show
approval for the ugly/evil ${x} warts we've kept from Perl.

... show people the beauty and cleanliness of using an OOP solution ...

I hope you agree.

j.

···

On 12/14/05, Ross Bamford <rosco@roscopeco.remove.co.uk> wrote:


--
"Remember. Understand. Believe. Yield! -> http://ruby-lang.org"

Jeff Wood

#scan does not help because it can match just a portion of the source string,
and whatever lies between the matches is skipped. So #scan is just a
special case of the functionality that I was looking for. I need to
make sure the whole string has a defined structure and get parts of it
as groups.

Ah, OK thanks. From your earlier post:

If I wanted to parse a list of letters separated by spaces and commas:

'a , b,c' =~ /^(?:(\w)\s*,\s*)*(\w)$/

I need to get ['a','b'] in group 1 and ['c'] in group 2.

What about:

'a , b,c' =~ /^((?:\w\s*,\s*)*)(\w)$/
last_match = $2
first_matches = $1.scan(/\w/)

Since we first verified the whole string conforms to the required
pattern, we can then safely perform the scan on the captured group
to obtain the individual matches.
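The same two-step idea can be written against a MatchData object instead of `$1`/`$2`; a sketch, not Bill's code. A side benefit is that the later `scan` call, which overwrites `$~`, cannot disturb the saved MatchData.

```ruby
str = 'a , b,c'

# Verify the whole string first, keeping the MatchData in a local.
if (m = /^((?:\w\s*,\s*)*)(\w)$/.match(str))
  last_match    = m[2]
  first_matches = m[1].scan(/\w/)   # safe: group 1 already passed validation
end

p first_matches   # ["a", "b"]
p last_match      # "c"
```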

Or we could write the scan using look-ahead assertions, as another
way to prevent the skipping of in-between parts:

str = 'a , b,c'
# first verify whole pattern matches, and get final match group
if str =~ /^(?:\w\s*,\s*)*(\w)$/
  last_match = $1
  first_matches = str.scan(/(?:(\w)\s*,\s*)(?=(?:\w\s*,\s*)*\w$)/).flatten
end

# last_match => "c"
# first_matches => ["a", "b"]

HTH,

Bill

···

From: "ako..." <akonsu@gmail.com>

You should be able to tell who this message is meant for:

Why not just address me directly?

PLEASE stop sending out code that uses any of the perl ${x} variables ...

Well, okay. No need to shout though, is there?

Just trying to put a bit back, you know?

···

On Thu, 15 Dec 2005 00:16:52 -0000, Jeff Wood <jeff.darklight@gmail.com> wrote:

--
Ross Bamford - rosco@roscopeco.remove.co.uk
"\e[1;31mL"

Regular expressions are the only area where I still use Perl's magic variables,
because they're concise, readable, and work well in that context. They feel
like a regexp standard to me.

The other magic variables I've dispensed with.

Nick

···

On 12/14/05, Jeff Wood <jeff.darklight@gmail.com> wrote:


--
Nicholas Van Weerdenburg

PLEASE stop sending out code that uses any of the perl ${x} variables ...

They are ugly and have no place in Ruby ... they are only provided to
make the transition of Perl people easier ...

Thankfully, this is Ruby, and not Python with its rigid
Only One Way mentality.

Myself, though I've been aware of MatchData for going on five years now, I find I don't use it that often. The
$1..$n variables are perfectly legible to me. They have
a fine history too: not just Perl but awk, and Unix shell
programming . . .

Regards,

Bill

···

From: "Jeff Wood" <jeff.darklight@gmail.com>

Thank you. Yes, it seems to be the only way. It's just a shame
that we have to match the same expression again! The information was
already available; it was just discarded during the first match in your
sample.

konstantin

You should be able to tell who this message is meant for:

Yes, I recognize that you are probably speaking at least in part to me, since I did that in this very thread. You can call me by name if you like. I'm a big boy and I can take it. :wink:

PLEASE stop sending out code that uses any of the perl ${x} variables ...

Hang on there, Mr. Code Police. Let's not lay the law down too heavily before we get into this...

They are ugly and have no place in Ruby ... they are only provided to
make the transition of Perl people easier ...

I seriously doubt those variables were invented in Perl. They are a common feature of many regular expression implementations, and I'm not sure they are even that ugly. $1 holds what was grabbed by the first set of parentheses. Fairly logical.
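To make the comparison concrete, a short sketch showing that `$1` is shorthand for the first group of the last match, which is also reachable through `$~` and `Regexp.last_match`.

```ruby
"abc" =~ /(.)(.)(.)/

# Three spellings of "the text grabbed by the first set of parentheses".
via_dollar = $1
via_tilde  = $~[1]
via_method = Regexp.last_match(1)

p [via_dollar, via_tilde, via_method]   # ["a", "a", "a"]
```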

Please teach people to use MatchData objects ...

I also showed a MatchData example.

I've used them a time or two, but honestly, they just don't feel right to me. I've stopped using the default variable, I'm using a two-space tab, etc. I'm Ruby assimilated, but I just like the Regexp-linked variables.

I see a lot of code running the Ruby Quiz and I feel quite confident saying that the Regexp variables are far more common than MatchData. I don't think that says anything bad about the latter, but it does tell me that you are in the minority. :wink:

We won't yell at you for using MatchData, if you'll provide the same consideration...

James Edward Gray II

···

On Dec 14, 2005, at 6:16 PM, Jeff Wood wrote: