Multiple matching with ()*

Alessandro_Re · 31 July 2007 13:34

Hi there!
I'm Alessandro from Italy and I started using ruby some days ago,
so... Hello, Community!

Well, I was trying to match a pattern multiple times. I tried both
plain match() and scan(), but I can't get the desired result.

The subject string is something like:
"beg1a2bend" or "beg1a2b3c4dend"
More generally, it should match /^beg(\d\w)*end$/ : always a beginning and
an ending pattern, with an unspecified number of central patterns in between.
The problem is that the central pattern must be extracted every
time it's encountered.
For example, trying with
"x1A2B3C4Dz".scan /^(x)(\d\w)*(z)$/
returns
[["x", "4D", "z"]]
while I need something like
[["x", "1A", "2B", "3C", "4D", "z"]]

Why does ()* capture just the last one? How can I get everything that ()* matches?

Probably I'm doing something wrong, but I can't understand where :\

Thanks!

--
~Ale
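
On the "why" part of the question: a capture group is numbered by where it appears in the pattern, so (\d\w)* has a single slot however many times it repeats, and each repetition overwrites the previous one. A small sketch of that behaviour (the constant names are only illustrative):

```ruby
# One pair of parentheses is one capture slot, no matter how often it repeats.
REPEATED = /^(x)(\d\w)*(z)$/
p REPEATED.match("x1A2B3C4Dz").captures   # => ["x", "4D", "z"]

# Wrapping the whole repetition in an outer group keeps the full run as one string.
WHOLE_RUN = /^(x)((?:\d\w)*)(z)$/
p WHOLE_RUN.match("x1A2B3C4Dz").captures  # => ["x", "1A2B3C4D", "z"]
```

The second form is the starting point for the two-step answers in the replies.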

Jan_Svitok · 31 July 2007 13:48

Try:

if "x1A2B3C4Dz" =~ /^(x)((?:\d\w)*)(z)$/

return [

Jan_Svitok · 31 July 2007 13:57

Try:

if "x1A2B3C4Dz" =~ /^(x)((?:\d\w)*)(z)$/
a, b = $1, $3 # save the outer captures before the nested scan resets them
return [a] + $2.scan(/\d\w/).flatten + [b]
end

I don't know if it's possible to do it in one pass, though; maybe you
could use split as well...
Take care when doing nested searches, as they will overwrite $1..$9
(that's why I used a and b).

J.
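
A variation on the snippet above that sidesteps the $1..$9 problem entirely: keep the MatchData object in a local variable, which a later scan cannot touch. This is only a sketch; the method name outer_and_inner is invented for illustration, and it returns nil when the string does not fit.

```ruby
# Two-step extraction: match the frame first, then split the middle run.
def outer_and_inner(str)
  m = /^(x)((?:\d\w)*)(z)$/.match(str)
  return nil unless m                      # the frame did not match at all
  # m is a local MatchData, so the nested scan below cannot clobber it
  [m[1]] + m[2].scan(/\d\w/) + [m[3]]
end

p outer_and_inner("x1A2B3C4Dz")  # => ["x", "1A", "2B", "3C", "4D", "z"]
p outer_and_inner("lol1a2vasd")  # => nil
```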

Harry3 · 31 July 2007 14:01

Hi,

Try this.

str = "x1A2B3C4Dz"
p str.scan(/\d?\w/) #>["x", "1A", "2B", "3C", "4D", "z"]

Harry

--
A Look into Japanese Ruby List in English

Wolfgang_Nadasi-Donn · 31 July 2007 21:41

Alessandro Re wrote:

For example, trying with
"x1A2B3C4Dz".scan /^(x)(\d\w)*(z)$/
returns
[["x", "4D", "z"]]
while i need something like
[["x", "1A", "2B", "3C", "4D", "z"]]

Does this go more in the direction you wanted:

irb(main):001:0> "x1A2B3C4Dz".scan /(?:^(?:x)|\G)(\d\w)(?=(?:\d\w)*(?:z)$)/
=> [["1A"], ["2B"], ["3C"], ["4D"]]

???

Wolfgang Nádasi-Donner

--
Posted via http://www.ruby-forum.com/.

Alessandro_Re · 31 July 2007 14:09

Thanks, but I need to match the whole pattern or match nothing at all.
"lol1a2vasd".scan(/\d?\w/) => ["l", "o", "l", "1a", "2v", "a", "s", "d"]
while I need to be sure that the string begins with a regex "x" and
ends with "z"

(of course, x, 1, a, 2, b, 3, c should be regexes, not just chars)

Thanks, your help is appreciated
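
One way the "all or nothing, with regexes for each part" requirement could be sketched: build the framing pattern from three sub-regexes by interpolation, and only split the middle if the whole string fits. The helper name split_delimited and its parameters are hypothetical, and the sketch assumes the sub-patterns contain no capture groups of their own.

```ruby
# Returns [head, item, item, ..., tail] if the WHOLE string fits, nil otherwise.
# Assumption: head, item and tail are Regexps without capture groups.
def split_delimited(str, head, item, tail)
  frame = /\A(#{head})((?:#{item})*)(#{tail})\z/
  m = frame.match(str)
  return nil unless m
  [m[1], *m[2].scan(item), m[3]]
end

p split_delimited("beg1a2b3c4dend", /beg/, /\d\w/, /end/)
# => ["beg", "1a", "2b", "3c", "4d", "end"]
p split_delimited("lol1a2vasd", /x/, /\d\w/, /z/)
# => nil
```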

Alessandro_Re · 31 July 2007 14:11

Mh well, to me this seems like ordinary regex processing (I mean, it *should*
require only one instruction, since this pattern can be read with just
one regex, even if Ruby doesn't allow it... but that would be really
bad).
Anyway, if I split it up there are different ways to do it - thanks
for your suggestion.
But if Ruby makes it possible with one call, I'd prefer to use that.

Thanks!

Harry3 · 31 July 2007 14:32

str = "lol1a2vasd"
p str.scan(/\d\w|\w{3}/)

Harry

Robert_K1 · 31 July 2007 14:56

irb(main):006:0> s="x1A2B3C4Dz"
=> "x1A2B3C4Dz"
irb(main):007:0> s.scan /x(\d\w)*z/
=> [["4D"]]
irb(main):008:0> s.scan /x((?:\d\w)*?)z/
=> [["1A2B3C4D"]]
irb(main):009:0> s.scan(/x((?:\d\w)*?)z/).map {|a| a[0].scan(/\d\w/)}
=> [["1A", "2B", "3C", "4D"]]

Kind regards

robert

botp1 · 31 July 2007 15:23

Seems like you have a pattern within a pattern.
It may be easier to unwrap the outer pattern first, then work on the inner
pattern. Something like,

irb(main):096:0> "lol1a2vasd".scan(/lol(.+)asd/).to_s.scan(/\d\w/)
=> ["1a", "2v"]
irb(main):097:0> "beg1a2vend".scan(/beg(.+)end/).to_s.scan(/\d\w/)
=> ["1a", "2v"]
irb(main):098:0> "beg1a2vendxbeg3c4dend".scan(/beg(.+)end/).to_s.scan(/\d\w/)
=> ["1a", "2v", "3c", "4d"]

is that ok?
kind regards -botp
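
One caveat for anyone running this on a newer Ruby: the transcript above is from Ruby 1.8, where Array#to_s joined the elements. From Ruby 1.9 on, Array#to_s returns the inspect form, so the second scan only works because the added brackets and quotes contain no digits. A sketch that avoids the array round-trip by using String#[] with a regex and a group index (the method name inner_items is made up):

```ruby
# Pull out capture group 1 of the outer pattern as a String, then scan it.
# str[regex, 1] is nil when nothing matches; nil.to_s is "", so scan gives [].
def inner_items(str, outer = /beg(.+)end/, inner = /\d\w/)
  str[outer, 1].to_s.scan(inner)
end

p inner_items("beg1a2vend")                 # => ["1a", "2v"]
p inner_items("lol1a2vasd", /lol(.+)asd/)   # => ["1a", "2v"]
p inner_items("nothing here")               # => []
```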

Harry3 · 31 July 2007 23:49

Sorry, I misunderstood what you wanted.
Is this more like it?

str = "lol1a2vasd"
m = /^(\w{3})(.*)(\w{3})$/.match(str).captures
m[1] = m[1].scan(/\d\w/)
p m.flatten #> ["lol","1a","2v","asd"]

Harry

Alessandro_Re · 31 July 2007 15:18

Thanks, this is an interesting solution!

Alessandro_Re · 2 August 2007 09:23

Yep, it's like this.
I solved it using 2 instructions as you did: first matching the outer words,
then the middle ones, but I still think that one regex would have been
nicer

Thanks guys

Robert_K1 · 1 August 2007 22:04

Give special attention to my usage of the reluctant quantifier, which is mandatory if your input contains multiple begin/end pairs.

Kind regards

robert

PS: please do not top post.
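
To illustrate the point about the reluctant quantifier, here is a sketch using a permissive inner pattern such as botp's (.+), where the difference actually shows up. The helper name sections is invented for the example.

```ruby
# Find each framed section, then split its body into digit+word-char items.
def sections(str, outer)
  str.scan(outer).map { |groups| groups[0].scan(/\d\w/) }
end

input = "beg1a2vendxbeg3c4dend"

# Greedy: .+ runs to the LAST "end", so both sections collapse into one.
p sections(input, /beg(.+)end/)    # => [["1a", "2v", "3c", "4d"]]

# Reluctant: .+? stops at the FIRST "end", so each pair stays separate.
p sections(input, /beg(.+?)end/)   # => [["1a", "2v"], ["3c", "4d"]]
```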

Wolfgang_Nadasi-Donn · 2 August 2007 10:19

Alessandro Re wrote:

...but i still think that one regex would have been nicer

I don't think that this will be "nice"...

irb(main):001:0> "x1A2B3C4Dz".scan(/(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/)
=> [["x"], ["1A"], ["2B"], ["3C"], ["4D"], ["z"]]

..., and I didn't test it against wrong lines, but after a "flatten" it
ends up with the required result.

Wolfgang Nádasi-Donner
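
Since the pattern above was not tested against malformed lines, here is a quick check one could run. By my reading, the \G branch lets a match start without the leading x, and (?:z|) makes the trailing z optional, so both malformed inputs below still yield pieces. Treat this as a sketch; ONE_PASS and one_pass are names made up for the test.

```ruby
ONE_PASS = /(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/

def one_pass(str)
  str.scan(ONE_PASS).flatten
end

p one_pass("x1A2B3C4Dz")  # well-formed      => ["x", "1A", "2B", "3C", "4D", "z"]
p one_pass("1A2Bz")       # no leading x     => ["1A", "2B", "z"]
p one_pass("x1A2B")       # no trailing z    => ["x", "1A", "2B"]
```

So if "match everything or nothing" matters, the result would still need a separate check that the first piece is the head and the last piece is the tail.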

Alessandro_Re · 4 August 2007 10:12

Wonderful
Thanks!

Robert_K1 · 6 August 2007 06:32

But this does not seem to work with strings that contain multiple sections:

irb(main):002:0> "x1A2B3C4Dz1a".scan(/(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/)
=> []

So it's not suited for a one-RX approach, and you still need two levels of
RX. If that's the case, then we have seen simpler solutions for that.
(Btw, one reason why it's so awkward is that there is no lookbehind in
Ruby 1.8 - but this will change.)

Kind regards

robert

Wolfgang_Nadasi-Donn · 6 August 2007 06:51

Robert Klemme wrote:

(Btw, one reason why it's so awkward is that there is no lookbehind in
Ruby 1.8 - but this will change.)

I am waiting for this Christmas gift too...

Wolfgang Nádasi-Donner
