Getting a list of results from one regular expression

Hello, I'm new to Ruby. I've read most of the Pragmatic Programmer
guide but couldn't find anything that explained how to do this.

To summarize my whole question: how do I get EVERY match of a regular
expression (instead of just the first)?

Here's my situation: I've got a long string that contains XML, and I
would like to parse it. Specifically, I want to search this string for
all instances of a pattern like /stringAlias="(.*)"/

I'm no pro with regex, but I think that will find a match for a string
that looks like this: stringAlias="BLAH"

And because of the (.*), the result will be BLAH

Now this is all fine and good. But what I can't figure out is how to
get every match in an array (instead of just the first match).

If I have stringAlias="BLAH" ... stringAlias="BLEH", how do I get an
array that is ["BLAH", "BLEH"]?

Keep in mind that there are a dynamic number of matches for
stringAlias="(.*)"

This is the code I wrote to try to do it:

def ...
  @aliases = []
  matchedData = /stringAlias="(.*?)"/.match(@data)
  @aliases = matchedData.to_a
  puts @aliases
end

The length of the array is 2 and the result is this:
stringAlias="OP"
OP

Even though the data is this:
<string RSLDefined="false" active="false" languageId="1"
    sortOrder="0" stringAlias="OP">
    <stringValue><![CDATA[Open or Pending]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
    sortOrder="1" stringAlias="1">
    <stringValue><![CDATA[Open]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
    sortOrder="2" stringAlias="2">
    <stringValue><![CDATA[Pend]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
    sortOrder="3" stringAlias="3">
    <stringValue><![CDATA[Decline]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
    sortOrder="4" stringAlias="4">
    <stringValue><![CDATA[Complete]]></stringValue>
</string>

tietyt@gmail.com wrote:

To summarize my whole question: how do I get EVERY match of a regular
expression (instead of just the first)?

String#scan

I'm sure there are other ways, though. I just learned about String#scan today. (Yes, Dave, my copy of the Pickaxe is on its way.)

Devin

I usually use String#scan.

"testwoohootestkaboomtestyutyut".scan(/test../)
=> ["testwo", "testka", "testyu"]

Regexp#match only returns the first match; the MatchData object behaves
somewhat like an array holding the entire match followed by the
subexpression matches. What you want is String#scan (warning, untested):

  regexp = /stringAlias="(.*?)"/
  matches = @data.scan(regexp)

Since the regexp has a subexpression matcher, that is what will be put
into the array "matches". You'll get an array something like this:

  [["OP"],["1"],["2"], ... ]

(each match has its own subarray, since it's a subexpression match)
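
To get the flat array the original question asked for, one option is to call Array#flatten on the result of scan. The following is only a sketch: the local variable `data` stands in for the poster's `@data`, and the sample is shortened to two of the <string> elements from the question.

```ruby
# Shortened copy of the XML from the question (assumption: `data`
# plays the role of @data in the original code).
data = <<XML
<string RSLDefined="false" active="false" languageId="1"
    sortOrder="0" stringAlias="OP">
    <stringValue><![CDATA[Open or Pending]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
    sortOrder="1" stringAlias="1">
    <stringValue><![CDATA[Open]]></stringValue>
</string>
XML

pattern = /stringAlias="(.*?)"/

# Regexp#match stops at the first match; to_a gives the whole match
# followed by the captured group.
first = pattern.match(data)
p first.to_a   # => ["stringAlias=\"OP\"", "OP"]

# String#scan collects every match; with one capture group each match
# is a one-element array, so flatten yields a plain list of strings.
aliases = data.scan(pattern).flatten
p aliases      # => ["OP", "1"]
```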

Check out the docs for String#scan for more info...

cheers,
Mark

In addition to the correct response given by others (String#scan), you might also want to look at the StringScanner class. It gives you the ability to crawl through a string with successive regexp calls, where each new call starts at the new 'current' position.

story = <<ENDSTORY
Hello World! There are 3 cats in my house, with 4 feet each.

6 of those 12 feet have 5 claws each; the other 6 feet have 4 claws each.

Ow, my back. 54 claws need clipping.
ENDSTORY

require 'strscan'
scanner = StringScanner.new( story )

info = []
count_nouns = /(\d+) (\w+)/

until scanner.eos?
   break unless scanner.scan_until( count_nouns )
   tidbit = {
     :full_match => scanner[0],
     :count => scanner[1].to_i,
     :noun => scanner[2]
   }
   info << tidbit
end

require 'pp'
pp info
info.each{ |tidbit|
   puts "Of %7s, I saw %02d" % [ tidbit[:noun], tidbit[:count] ]
}

[{:noun=>"cats", :count=>3, :full_match=>"3 cats"},
{:noun=>"feet", :count=>4, :full_match=>"4 feet"},
{:noun=>"of", :count=>6, :full_match=>"6 of"},
{:noun=>"feet", :count=>12, :full_match=>"12 feet"},
{:noun=>"claws", :count=>5, :full_match=>"5 claws"},
{:noun=>"feet", :count=>6, :full_match=>"6 feet"},
{:noun=>"claws", :count=>4, :full_match=>"4 claws"},
{:noun=>"claws", :count=>54, :full_match=>"54 claws"}]
Of    cats, I saw 03
Of    feet, I saw 04
Of      of, I saw 06
Of    feet, I saw 12
Of   claws, I saw 05
Of    feet, I saw 06
Of   claws, I saw 04
Of   claws, I saw 54
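
For comparison, a similar list of hashes can probably be built without StringScanner by passing a block to String#scan and reading Regexp.last_match inside it. This is a sketch rather than a drop-in replacement: the short `text` string is made up for illustration, and the hash keys simply mirror the ones used above.

```ruby
# Illustrative input (not the story from the post above).
text = "3 cats, 4 feet, 54 claws"
count_nouns = /(\d+) (\w+)/

info = []
# With a block, scan yields once per match; Regexp.last_match returns
# the MatchData for the match currently being processed.
text.scan(count_nouns) do
  m = Regexp.last_match
  info << { :full_match => m[0], :count => m[1].to_i, :noun => m[2] }
end

info.each do |tidbit|
  puts "Of %7s, I saw %02d" % [tidbit[:noun], tidbit[:count]]
end
```

StringScanner is still the better fit when the pattern changes from one step to the next, since scan applies a single regexp to the whole string.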

tietyt@gmail.com schrieb:

Here's my situation, I've got this long string that contains XML. I
would like to parse it. Specifically, I want to search this string for
all instances of a pattern like /stringAlias="(.*)"/

One additional remark: if the input can contain multiple stringAlias expressions on one line, the pattern should be /stringAlias="(.*?)"/ (note the question mark). You can see the difference if you match a string like

   str = "stringAlias=\"one\" bla stringAlias=\"two\""

   p str.scan( /stringAlias="(.*)"/ )
   # => [["one\" bla stringAlias=\"two"]]

   p str.scan( /stringAlias="(.*?)"/ )
   # => [["one"], ["two"]]
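
An alternative to the non-greedy quantifier is a negated character class, which cannot run past the closing quote at all. A minimal sketch, assuming attribute values never contain a double quote (the variable name `values` is just for illustration):

```ruby
str = "stringAlias=\"one\" bla stringAlias=\"two\""

# [^"]* matches any run of characters that are not a double quote,
# so each capture ends at the first closing quote.
values = str.scan(/stringAlias="([^"]*)"/).flatten
p values
# => ["one", "two"]
```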

Regards,
Pit

First of all, thanks for all that super fast help. I've never asked a
technical question anywhere before and got such a fast response.

Specifically to Pit Capitain:
Thanks for that tip. I just googled that and learned what the .*?
does.
