Non-greedy regexp

Tom_Robinson1 · 12 August 2002 18:28

Hi,

The following regexp is supposed to chop off the last / of a string
and all characters following it, but it seems to be ignoring the
non-greedy indicator (?):

irb(main):001:0> "http://www.x.com/y/z.html".sub(%r|/.+?.html$|, '')
"http:"

The expected result should be “http://www.x.com/y”. I thought this
was a bug but perl produces the same result, so what am I missing?

Is there a better alternative to doing url parsing by hand?

Thanks

–
tom@alkali.spamfree.org
remove ‘spamfree.’ to respond

David_Alan_Black1 · 12 August 2002 18:38

Hello –

> Hi,
>
> The following regexp is supposed to chop off the last / of a string
> and all characters following it, but it seems to be ignoring the
> non-greedy indicator (?):
>
> irb(main):001:0> "http://www.x.com/y/z.html".sub(%r|/.+?.html$|, '')
> "http:"
>
> The expected result should be “http://www.x.com/y”. I thought this
> was a bug but perl produces the same result, so what am I missing?

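You’re missing the notion of a leftmost match. The regex engine reads
from left to right, so to speak, in looking for the ‘/’. It finds it
at the sixth character. Then it does what you ask: namely, look for
‘.html’ at the end of the line. The non-greedy ‘?’ only makes ‘.+’ stop
as early as it can once the starting point is fixed; it never moves
the starting point further to the right.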
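
To see the leftmost-match rule in action, here is a small sketch that reports where the match from the original post starts and what it covers. The helper name describe_match is made up for illustration; the pattern is the one posted, with the dot before "html" left unescaped as in the post.

```ruby
# Pattern from the original post, unchanged.
LAZY = %r|/.+?.html$|

# Hypothetical helper: returns where a match starts, what it covers,
# and what is left in front of it (which is what sub would keep).
def describe_match(pattern, str)
  m = pattern.match(str)
  return nil unless m
  { start: m.begin(0), text: m[0], remainder: m.pre_match }
end

info = describe_match(LAZY, "http://www.x.com/y/z.html")
puts info[:start]      # 5 (0-based), i.e. the sixth character
puts info[:text]       # //www.x.com/y/z.html
puts info[:remainder]  # http:
```

The match begins at the first "/" and, because ".html$" can only succeed at the very end, the lazy ".+?" is forced to stretch all the way there.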

To do what you were trying to do, try this:

"http://www.x.com/y/z.html".sub(%r|/[^/]+/?.html$|, '')
"http://www.x.com/y"

That also finds the leftmost match – but in this case, the leftmost
match doesn’t start until the last ‘/’ (because none of the other
'/'s, even though they’re further left, allow the rest of the match to
succeed).
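
A rough way to check this is to try each '/' as a starting point on its own. The sketch below is only an illustration: the helper names are invented, the pattern is anchored with \A so it must start exactly at the slash being tested, and the dot before "html" is escaped here (an assumption about the intent; the posted patterns leave it unescaped).

```ruby
# Anchored variant of the [^/]+ pattern: \A pins the match to the start
# of the slice, so each slash is tested as a starting point by itself.
ANCHORED = %r|\A/[^/]+\.html$|

# Hypothetical helper: 0-based indexes of every "/" in the string.
def slash_positions(str)
  (0...str.length).select { |i| str[i] == "/" }
end

# Hypothetical helper: the slashes from which the rest of the pattern succeeds.
def slashes_that_match(str, pattern)
  slash_positions(str).select { |i| !(str[i..-1] =~ pattern).nil? }
end

url = "http://www.x.com/y/z.html"
puts slash_positions(url).inspect               # [5, 6, 16, 18]
puts slashes_that_match(url, ANCHORED).inspect  # [18]
```

Only the last slash lets the rest of the pattern succeed, so that is where the leftmost match ends up starting.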

David

On Tue, 13 Aug 2002, Tom Robinson wrote:

–
David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

Mauricio_Fernndez · 12 August 2002 18:40

irb(main):001:0> "http://www.x.com/y/z.html".sub(%r|/[^/]+.html$|,'')
"http://www.x.com/y"
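
On the other question in the original post (an alternative to parsing URLs by hand), one possible approach is Ruby's standard uri library combined with File.dirname. This is only a sketch: the method name parent_url is made up, and treating the URL path like a file path is an assumption that holds for simple paths such as this one.

```ruby
require 'uri'

# Hypothetical helper: drop the last path segment of a URL by letting
# URI split the URL and File.dirname trim the path.
def parent_url(url)
  uri = URI.parse(url)
  uri.path = File.dirname(uri.path)  # "/y/z.html" becomes "/y"
  uri.to_s
end

puts parent_url("http://www.x.com/y/z.html")  # http://www.x.com/y
```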

On Tue, Aug 13, 2002 at 03:28:26AM +0900, Tom Robinson wrote:

> Hi,
>
> The following regexp is supposed to chop off the last / of a string
> and all characters following it, but it seems to be ignoring the
> non-greedy indicator (?):
>
> irb(main):001:0> "http://www.x.com/y/z.html".sub(%r|/.+?.html$|, '')
> "http:"
>
> The expected result should be “http://www.x.com/y”. I thought this
> was a bug but perl produces the same result, so what am I missing?
>
> Is there a better alternative to doing url parsing by hand?

–
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Never trust an operating system you don’t have sources for.
– Unknown source

David_Alan_Black1 · 12 August 2002 18:56

Hi –

On Tue, 13 Aug 2002 dblack@candle.superlink.net wrote:

> To do what you were trying to do, try this:
>
> "http://www.x.com/y/z.html".sub(%r|/[^/]+/?.html$|, '')
> "http://www.x.com/y"

Whoops, having seen Mauricio’s, I now see that a meaningless “/?” has
slipped into mine.
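
For the URL in this thread, at least, the two patterns give the same result, which can be checked with a quick sketch like the one below. The helper name strip_last_segment is invented for illustration; both patterns are copied as posted, with the dot before "html" unescaped.

```ruby
WITH_STRAY_SLASH = %r|/[^/]+/?.html$|  # as first posted, including the "/?"
WITHOUT_IT       = %r|/[^/]+.html$|    # Mauricio's version

# Hypothetical helper: remove the last "/" and everything after it.
def strip_last_segment(url, pattern)
  url.sub(pattern, '')
end

url = "http://www.x.com/y/z.html"
puts strip_last_segment(url, WITH_STRAY_SLASH)  # http://www.x.com/y
puts strip_last_segment(url, WITHOUT_IT)        # http://www.x.com/y
```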

David

