Regex: get the first match

Hello!

I want to parse a tagged string like this: "<i>this is</i><i>my string</i>"

I am doing:

"<i>this is</i><i>my string</i>".scan(/<i>(.*)<\/i>/)

=> [["this is</i><i>my string"]]

What I want is a regex that will return the *first* segment that
matches. In the above case -> [["this is", "my string"]]

Is there any way to do this?

Thanks!

This is a FAQ, and yes I will give the solution :wink:
Regexps are greedy by default: they consume as many characters as
possible. There are a few possibilities (not tested):

(1) use non-greedy matches
"<i>this is</i><i>my string</i>".scan(/<i>(.*?)<\/i>/)
(2) use less general expressions
"<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*)<\/i>/)
(3) Combine both :wink:
"<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*?)<\/i>/)
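The three untested suggestions above can be checked side by side. Here is a minimal sketch in plain Ruby. The hash keys and variable names are only illustrative labels.

```ruby
# Sample input taken from the question.
html = "<i>this is</i><i>my string</i>"

# Illustrative labels for the original pattern and the three suggestions.
patterns = {
  greedy:  /<i>(.*)<\/i>/,      # .* runs on to the LAST </i> in the string
  lazy:    /<i>(.*?)<\/i>/,     # .*? stops at the FIRST </i> it can
  negated: /<i>(.[^<]*)<\/i>/,  # [^<] can never step over a '<'
  both:    /<i>(.[^<]*?)<\/i>/  # lazy and restricted at the same time
}

# Run String#scan with every pattern and print what each one returns.
results = patterns.transform_values { |re| html.scan(re) }
results.each { |name, matches| puts "#{name}: #{matches.inspect}" }
```

On this particular input, all three suggestions return [["this is"], ["my string"]], while the greedy version returns one big match. They only start to differ on inputs such as an empty <i></i> or a stray < between the tags.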

HTH
Robert

P.S.
This *really* is a FAQ though

···

On 6/10/07, Trochalakis Christos <yatiohi@ideopolis.gr> wrote:

--
You see things; and you say Why?
But I dream things that never were; and I say Why not?
-- George Bernard Shaw

> I want to parse a tagged string like this: "<i>this is</i><i>my string</i>"
>
> I am doing:
>
> "<i>this is</i><i>my string</i>".scan(/<i>(.*)<\/i>/)
>
> => [["this is</i><i>my string"]]
>
> What I want is a regex that will return the *first* segment that
> matches. In the above case -> [["this is", "my string"]]

The solution is:

"<i>this is</i><i>my string</i>".scan(/<i>(.*?)<\/i>/)
=> [["this is"], ["my string"]]

By default, a quantifier like * matches as much as it possibly can.
Adding '?' after it (*?) makes it match as little as possible, so with
(.*?) instead of (.*) the </i><i> part of the string is no longer included in a single result.
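Here is the same idea in a small runnable form. It also shows two core String methods that fit the original question. `flatten` turns the per-match sub-arrays returned by `scan` into a flat list. `String#[]` with a capture index returns only the first match. The variable names are illustrative.

```ruby
html = "<i>this is</i><i>my string</i>"

# Lazy quantifier: *? matches as few characters as possible,
# so each match ends at the nearest </i>.
all_matches = html.scan(/<i>(.*?)<\/i>/) # one sub-array per match (one capture group)
flat = all_matches.flatten               # plain array of the captured strings

# If only the first match is wanted, String#[] with a capture index returns it.
first = html[/<i>(.*?)<\/i>/, 1]

p all_matches # => [["this is"], ["my string"]]
p flat        # => ["this is", "my string"]
p first       # => "this is"
```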

Regards,
Grzegorz Golebiowski

> > Hello!
> >
> > I want to parse a tagged string like this: "<i>this is</i><i>my string</i>"
> >
> > I am doing:
> >
> > >> "<i>this is</i><i>my string</i>".scan(/<i>(.*)<\/i>/)
> > => [["this is</i><i>my string"]]
> >
> > What I want is a regex that will return the *first* segment that
> > matches. In the above case -> [["this is", "my string"]]
> >
> > Is there any way to do this?
> >
> > Thanks!
> >
> This is a FAQ, and yes I will give the solution :wink:
> Regexps are greedy by default: they consume as many characters as
> possible. There are a few possibilities (not tested):
>
> (1) use non-greedy matches
> "<i>this is</i><i>my string</i>".scan(/<i>(.*?)<\/i>/)
> (2) use less general expressions
> "<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*)<\/i>/)
> (3) Combine both :wink:
> "<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*?)<\/i>/)

Unless you want to match strings like <i><foo</i>, it would be simpler to
just use [^<]* rather than .[^<]*. Note also that .[^<]* will not match <i></i>. If the
intent was to make the regexp not match that, a better regexp would be [^<]+.
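To see the difference between .[^<]*, [^<]* and [^<]+ concretely, here is a sketch that runs each pattern against three inputs: the original string, an empty <i></i>, and the <i><foo</i> case. The extra inputs and the hash keys are made up for illustration.

```ruby
# The question's string plus two hypothetical edge cases.
inputs = ["<i>this is</i><i>my string</i>", "<i></i>", "<i><foo</i>"]

patterns = {
  dot_then_class: /<i>(.[^<]*)<\/i>/, # at least one char, and that first char may be '<'
  class_star:     /<i>([^<]*)<\/i>/,  # zero or more non-'<' chars, so <i></i> matches
  class_plus:     /<i>([^<]+)<\/i>/   # one or more non-'<' chars, so <i></i> does not
}

# For each pattern, collect the scan result for every input (in input order).
results = patterns.transform_values { |re| inputs.map { |s| s.scan(re) } }
results.each { |name, per_input| puts "#{name}: #{per_input.inspect}" }
```

All three agree on the original string. Only the .[^<]* variant accepts <i><foo</i> (capturing "<foo"), and only [^<]* accepts the empty <i></i>.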

HTH

···

On 6/10/07, Robert Dober <robert.dober@gmail.com> wrote:

On 6/10/07, Trochalakis Christos <yatiohi@ideopolis.gr> wrote:

Thanks Grzegorz, nice trick!

···

On Jun 10, 3:22 pm, GrzechG <grze...@DELITgazeta.pl> wrote:


Thanks for correcting my typos.
Robert

···

On 6/10/07, Logan Capaldo <logancapaldo@gmail.com> wrote:


You are welcome :wink:
Robert

···

On 6/10/07, Trochalakis Christos <yatiohi@ideopolis.gr> wrote:
