Can't get subgroup of regex to repeat with +... what the?

Jon · 16 May 2007 20:26

I'm trying to match these kinds of malformed XML tags. I'm beginning
to question my sanity, so I'm posting here.

Example Strings:

=====
A=" <orderMsg biz=0>"
B=" <orderMsg type=7 size=0>"
C=" <orderMsg type=7 size=0 biz=1>"

I've come up with this regex:

/<(\w+?)(?:\s(\w+)=(\w+))+>/

But when matching string B from above:

md=/<(\w+?)(?:\s(\w+)=(\w+))+>/.match(B)

It will do this:

md[0]=<orderMsg type=7 size=0>
md[1]=orderMsg
md[2]=size
md[3]=0
nil
nil
nil
nil

Why isn't the final + sign making the pattern "(?:\s(\w+)=(\w+))"
repeat?

As an exercise... /<(\w+?)(?:\s(\w+)=(\w+))(?:\s(\w+)=(\w+))>/ DOES
match String B from above. What the heck???

--
Posted via http://www.ruby-forum.com/.

Wolfgang_Nadasi-Don1 · 16 May 2007 20:57

Jon wrote:

B=" <orderMsg type=7 size=0>"
...
/<(\w+?)(?:\s(\w+)=(\w+))+>/
...
md[0]=<orderMsg type=7 size=0>
md[1]=orderMsg
md[2]=size
md[3]=0

That is correct. "(?:\s(\w+)=(\w+))+" matches twice, and the last
iteration matches "size" and "0". The capture groups are overwritten
each time the "+" repeats the group.
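
A quick way to see the overwriting, as a minimal sketch using string B from the original post (the lowercase name `b` is only for illustration; `md` follows Jon's snippet):

```ruby
b = " <orderMsg type=7 size=0>"
md = /<(\w+?)(?:\s(\w+)=(\w+))+>/.match(b)

# Groups 2 and 3 are filled on every pass of the "+",
# so only the values from the last pass survive.
p md.captures   # ["orderMsg", "size", "0"]

# The pattern has only three capture groups, so md[4] and beyond are nil.
p md[4]         # nil
```

That would also explain the trailing nil lines in the original output: the pattern defines three groups, so there is nothing to report past md[3].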

Wolfgang Nádasi-Donner


Harry3 · 17 May 2007 01:38

Hi,

Unless you really want to write one regular expression for it all, you
could do something like this.

Split on spaces, then on '='. Then process however you want.

r = B.strip.split(/\s/)
p r
r[1..-1].each {|f| p f.split("=")}

Harry

On 5/17/07, Jon <exabrial@gmail.com> wrote:

Example Strings:

A=" <orderMsg biz=0>"
B=" <orderMsg type=7 size=0>"
C=" <orderMsg type=7 size=0 biz=1>"

I've come up with this regex:

/<(\w+?)(?:\s(\w+)=(\w+))+>/

But when matching string B from above:

md=/<(\w+?)(?:\s(\w+)=(\w+))+>/.match(B)

Why isn't the final + sign making the pattern "(?:\s(\w+)=(\w+))"
repeat?

As an exercise... /<(\w+?)(?:\s(\w+)=(\w+))(?:\s(\w+)=(\w+))>/ DOES
match String B from above. What the heck???

--

A Look into Japanese Ruby List in English

Jon · 16 May 2007 20:59

Wolfgang Nádasi-donner wrote:

Jon wrote:

B=" <orderMsg type=7 size=0>"
...
/<(\w+?)(?:\s(\w+)=(\w+))+>/
...
md[0]=<orderMsg type=7 size=0>
md[1]=orderMsg
md[2]=size
md[3]=0

That is correct. "(?:\s(\w+)=(\w+))+" matches twice, and the last
iteration matches "size" and "0". The capture groups are overwritten
each time the "+" repeats the group.

Wolfgang Nádasi-Donner

Ah ok. So how can I get it to repeat without overwriting the existing
values for the group? Or is there a better way to do this?


Harry3 · 17 May 2007 04:22

Sorry for the double post.
This is a little cleaner and easier, I think.

C.strip.delete("<>").split(/\s/).each {|f| p f.split("=")}

Harry

On 5/17/07, Harry Kakueki <list.push@gmail.com> wrote:

On 5/17/07, Jon <exabrial@gmail.com> wrote:
>
> Example Strings:
> =====
> A=" <orderMsg biz=0>"
> B=" <orderMsg type=7 size=0>"
> C=" <orderMsg type=7 size=0 biz=1>"
> =====

Unless you really want to write one regular expression for it all, you
could do something like this.

Split on spaces, then on '='. Then process however you want.

r = B.strip.split(/\s/)
p r
r[1..-1].each {|f| p f.split("=")}

Harry

--

A Look into Japanese Ruby List in English

Wolfgang_Nadasi-Don1 · 16 May 2007 21:34

Jon Fi wrote:

Ah ok. So how can I get it to repeat without overwriting the existing
values for the group? Or is there a better way to do this?

I would do it something like this:

========== code ==========
texts = [ "<orderMsg biz=0>",
          "<orderMsg type=7 size=0>",
          "<orderMsg type=7 size=0 biz=1>" ]

texts.each do |txt|
  if (md = txt.match(/<(\w+?)((?:\s\w+=\w+)+)>/))
    puts "\nkey '#{md[1]}' found"
    md[2].scan(/\s(\w+)=(\w+)/) do |k, v|
      puts "  parameter '#{k}' has value '#{v}'"
    end
  else
    puts "+++ no match for '#{txt}'"
  end
end
========= result =========
key 'orderMsg' found
  parameter 'biz' has value '0'

key 'orderMsg' found
  parameter 'type' has value '7'
  parameter 'size' has value '0'

key 'orderMsg' found
  parameter 'type' has value '7'
  parameter 'size' has value '0'
  parameter 'biz' has value '1'
========== end ===========

Wolfgang Nádasi-Donner


Robert_K1 · 18 May 2007 07:55

Wolfgang Nádasi-donner wrote:

Jon wrote:

B=" <orderMsg type=7 size=0>"
...
/<(\w+?)(?:\s(\w+)=(\w+))+>/
...
md[0]=<orderMsg type=7 size=0>
md[1]=orderMsg
md[2]=size
md[3]=0

That is correct. "(?:\s(\w+)=(\w+))+" matches twice, and the last iteration matches "size" and "0". The capture groups are overwritten each time the "+" repeats the group.

Wolfgang Nádasi-Donner

Ah ok. So how can I get it to repeat without overwriting the existing values for the group?

You can't.

Or is there a better way to do this?

Probably. I am not sure what you are up to, but you can use a two-stage approach like this:

texts = [
  " <orderMsg biz=0>",
  " <orderMsg type=7 size=0>",
  " <orderMsg type=7 size=0 biz=1>",
]

texts.each do |t|
  p t
  md = /<([^\s>]+)((?:\s+\w+=\d+)*)/.match(t)

  if md
    tag = md[1]
    attrs = md[2]

    puts tag

    attrs.scan(/(\w+)=(\d+)/) do |m|
      print m[0], "=>", m[1], "\n"
    end
  end
end

Kind regards

robert

On 16.05.2007 22:59, Jon Fi wrote:

Harry3 · 18 May 2007 11:21

If you want to use regular expressions, try 'scan'.

c=" <orderMsg type=7 size=0 biz=1>"
c.scan(/\w+=?\w+/).each {|f| p f.split("=")}

Modify the regular expression as necessary.
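
One possible modification, as a sketch only: put capture groups around the name and the value and make the "=" mandatory. Then scan hands back the pairs directly and skips the bare tag name. The names `pairs` and `attrs` are just for illustration, and Array#to_h assumes Ruby 2.1 or later.

```ruby
c = " <orderMsg type=7 size=0 biz=1>"

# With capture groups, scan returns one [name, value] array per match.
# Requiring the "=" means "orderMsg" on its own is not picked up.
pairs = c.scan(/(\w+)=(\w+)/)
p pairs              # [["type", "7"], ["size", "0"], ["biz", "1"]]

# Turn the pairs into a hash for lookup by attribute name.
attrs = pairs.to_h
puts attrs["size"]   # prints 0
```

Note that the values come back as strings, so call to_i on them if you need numbers.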

Harry

On 5/17/07, Jon Fi <exabrial@gmail.com> wrote:

Ah ok. So how can I get it to repeat without overwriting the existing
values for the group? Or is there a better way to do this?

--

A Look into Japanese Ruby List in English

Jon · 18 May 2007 16:10

Harry Kakueki wrote:

On 5/17/07, Jon Fi <exabrial@gmail.com> wrote:

If you want to use regular expressions, try 'scan'.

c=" <orderMsg type=7 size=0 biz=1>"
c.scan(/\w+=?\w+/).each {|f| p f.split("=")}

Modify the regular expression as necessary.

Harry

Brilliant. Exactly what I was looking for.

