Regex question

Gavin_Sinclair · 5 August 2002 02:28

Folks,

Just a quick one.

md = /\..*?$/.match "dotslash.tar.gz"
md[0] # -> ".tar.gz"

I expect md[0] to be “.gz”, because the question-mark in the regex tells *
not to be greedy. Can anyone enlighten?

Thanks,
Gavin

David_Alan_Black1 · 5 August 2002 02:37

Hello –

> Folks,
>
> Just a quick one.
>
> md = /\..*?$/.match "dotslash.tar.gz"
> md[0] # -> ".tar.gz"
>
> I expect md[0] to be “.gz”, because the question-mark in the regex tells *
> not to be greedy. Can anyone enlighten?

It will still find the first ‘.’ from the left, and then be
non-greedy. In other words, non-greediness affects how much gets
consumed to the right in a given match.
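Here is a small Ruby sketch that shows this concretely. The extra patterns are variations made up for illustration and were not posted in the thread. In every case the match begins at the first literal dot. The ? only changes how far the match extends to the right, and the $ anchor forces it to the end of the line anyway.

```ruby
s = "dotslash.tar.gz"

# Lazy, anchored: starts at the leftmost '.', and $ still forces it to the end.
lazy_anchored = s[/\..*?$/]      # ".tar.gz"

# Greedy, anchored: same start and same end, so the same result.
greedy_anchored = s[/\..*$/]     # ".tar.gz"

# Lazy, unanchored: same start, but nothing forces .*? to consume anything.
lazy_unanchored = s[/\..*?/]     # "."

# Index where the original match begins: the first '.' in the string.
start = /\..*?$/.match(s).begin(0)   # 8
```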

Try this:

/\.[^.]*$/

which forces the match to start at the right-most ‘.’ on the line.
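The following is a quick check of this pattern, written as a sketch. Because [^.]* cannot step over a dot, the only dot from which the match can reach the end of the line is the last one. When the name has no dot at all, String#[] simply returns nil.

```ruby
s = "dotslash.tar.gz"

# [^.]* cannot cross a '.', so only the last '.' can reach end-of-line.
ext = s[/\.[^.]*$/]              # ".gz"

# No '.' at all: nothing can match, so String#[] returns nil.
none = "Makefile"[/\.[^.]*$/]    # nil
```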

David


On Mon, 5 Aug 2002, Gavin Sinclair wrote:

–
David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

Bernhard_Fisseni · 5 August 2002 06:38

Hi!

> md = /\..*?$/.match "dotslash.tar.gz"
> md[0] # -> ".tar.gz"
>
> I expect md[0] to be “.gz”, because the question-mark in the regex tells *
> not to be greedy. Can anyone enlighten?

I think this happens because of leftmost matching.

“.tar.gz” is the least greedy of the leftmost matches, even though it is
not the shortest possible match.
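A hypothetical helper can make the leftmost rule visible. The name leftmost_match is made up here and is not part of Ruby. It tries each starting position in turn, which is roughly what the engine does internally, and it stops at the first position where the pattern matches. Only after that does laziness decide the length. Because the sketch rebuilds the pattern from its source, it ignores regex options such as /i.

```ruby
# Hypothetical helper, for illustration only: try each start position
# from left to right and return the first one where the pattern matches.
def leftmost_match(str, pattern)
  (0..str.length).each do |i|
    # \A pins this attempt to begin exactly at position i.
    m = /\A(?:#{pattern.source})/.match(str[i..-1])
    return [i, m[0]] if m
  end
  nil
end

p leftmost_match("dotslash.tar.gz", /\..*?$/)   # [8, ".tar.gz"]
```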

Regards,
Bernhard

Jim_Freeze2 · 5 August 2002 02:41

or this:

/\w+$/
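The sketch below shows that, for the example string, this pattern gives the extension without the leading dot. The reason is that \w+ only matches letters, digits and underscores.

```ruby
s = "dotslash.tar.gz"

# \w+ takes the last run of word characters and never includes the '.'.
word_tail = s[/\w+$/]   # "gz"
```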


On Mon, Aug 05, 2002 at 11:37:12AM +0900, David Alan Black wrote:

> Try this:
>
> /\.[^.]*$/
>
> which forces the match to start at the right-most ‘.’ on the line.

–
Jim Freeze
If only I had something clever to say for my comment…
~

Nobuyoshi_Nakada · 5 August 2002 02:57

Hi,


At Mon, 5 Aug 2002 11:37:12 +0900, David Alan Black wrote:

> Try this:
>
> /\.[^.]*$/

Or

str = "dotslash.tar.gz"
if md = str.rindex(/\..*$/) #=> 12
  str[md..-1]               #=> ".gz"
end
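String#rindex also accepts a plain string, which avoids the regex entirely. This is a minimal sketch. The name suffix_after_last_dot is hypothetical, and the method returns nil for names without a dot instead of raising.

```ruby
# Hypothetical helper: find the last '.' with String#rindex and slice to
# the end. rindex returns nil when there is no '.', and so does the helper.
def suffix_after_last_dot(str)
  i = str.rindex(".")
  i && str[i..-1]
end

suffix_after_last_dot("dotslash.tar.gz")   # ".gz"
suffix_after_last_dot("Makefile")          # nil
```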

–
Nobu Nakada

David_Alan_Black1 · 5 August 2002 02:46

Hi –


On Mon, 5 Aug 2002, Jim Freeze wrote:

> On Mon, Aug 05, 2002 at 11:37:12AM +0900, David Alan Black wrote:
>
>> Try this:
>>
>> /\.[^.]*$/
>>
>> which forces the match to start at the right-most ‘.’ on the line.
>
> or this:
>
> /\w+$/

(You forgot the \.) \w will cover the example given, though I was
generalizing a bit: matching the post-last-dot part of any line.

David


Gavin_Sinclair · 5 August 2002 03:57

>> Try this:
>>
>> /\.[^.]*$/
>>
>> which forces the match to start at the right-most ‘.’ on the line.
>
> or this:
>
> /\w+$/

Hmm… that’ll do! Thanks.


Jim_Freeze2 · 5 August 2002 10:28

Note that /\w+$/ will not match all extensions. E.g.,

fred.my-dashed-ext

will return ext.
David replied with my first (unposted) attempt.
To be sure you get everything past that last ‘.’, use:

/\.([^.]*)$/.match(file)[1]
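Here is a sketch that compares the two patterns on Jim's example. There is one caveat. Regexp#match returns nil when the name has no dot, so calling [1] on that result raises NoMethodError. String#[] with a capture-group index returns nil instead.

```ruby
file = "fred.my-dashed-ext"

# The '-' is not a word character, so \w+ stops short.
word_only = file[/\w+$/]                  # "ext"

# Everything after the last '.', taken from capture group 1.
after_dot = /\.([^.]*)$/.match(file)[1]   # "my-dashed-ext"

# Nil-safe variant: String#[](regexp, capture) returns nil on no match.
safe = "Makefile"[/\.([^.]*)$/, 1]        # nil
```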

Jim


On Mon, Aug 05, 2002 at 12:57:30PM +0900, Gavin Sinclair wrote:

>> On Mon, Aug 05, 2002 at 11:37:12AM +0900, David Alan Black wrote:
>>
>>> Try this:
>>>
>>> /\.[^.]*$/
>>>
>>> which forces the match to start at the right-most ‘.’ on the line.
>>
>> or this:
>>
>> /\w+$/
>
> Hmm… that’ll do! Thanks.


David_Alan_Black1 · 5 August 2002 10:41

Hi –


On Mon, 5 Aug 2002, Jim Freeze wrote:

> On Mon, Aug 05, 2002 at 12:57:30PM +0900, Gavin Sinclair wrote:
>
>>> On Mon, Aug 05, 2002 at 11:37:12AM +0900, David Alan Black wrote:
>>>
>>>> Try this:
>>>>
>>>> /\.[^.]*$/
>>>>
>>>> which forces the match to start at the right-most ‘.’ on the line.
>>>
>>> or this:
>>>
>>> /\w+$/
>>
>> Hmm… that’ll do! Thanks.
>
> Note that /\w+$/ will not match all extensions. E.g.,
>
> fred.my-dashed-ext
>
> will return ext.
> David replied with my first (unposted) attempt.
> To be sure you get everything past that last ‘.’, use:
>
> /\.([^.]*)$/.match(file)[1]

I guess Gavin didn’t need to include the ‘.’ itself in the match (it
was there in the first one, but if /\w+$/ works on his example then it
must not be needed). If it’s not needed one could just do:

/[^.]*$/.match(file)[0]
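Because /[^.]*$/ can match an empty string, match always succeeds here and [0] never fails. The sketch below shows what that means at the edges, using made-up file names. A name without a dot comes back whole, and a trailing dot yields an empty string. Since [^.] also matches '/', a dot in a directory name leaks into the result, so applying the pattern to File.basename(file) first may be safer.

```ruby
# /[^.]*$/ can match "", so match never returns nil and [0] is safe.
with_ext = /[^.]*$/.match("dotslash.tar.gz")[0]      # "gz"
no_dot   = /[^.]*$/.match("Makefile")[0]             # "Makefile" (whole name)
trailing = /[^.]*$/.match("archive.")[0]             # "" (empty)
in_path  = /[^.]*$/.match("/home/fred.d/notes")[0]   # "d/notes" ('/' is not '.')
```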

David


Mike4 · 5 August 2002 13:18

David Alan Black wrote:

> Hi –
>
>>>>> Try this:
>>>>>
>>>>> /\.[^.]*$/
>>>>>
>>>>> which forces the match to start at the right-most ‘.’ on the line.
>>>>
>>>> or this:
>>>>
>>>> /\w+$/
>>>
>>> Hmm… that’ll do! Thanks.
>>
>> Note that /\w+$/ will not match all extensions. E.g.,
>>
>> fred.my-dashed-ext
>>
>> will return ext.
>> David replied with my first (unposted) attempt.
>> To be sure you get everything past that last ‘.’, use:
>>
>> /\.([^.]*)$/.match(file)[1]
>
> I guess Gavin didn’t need to include the ‘.’ itself in the match (it
> was there in the first one, but if /\w+$/ works on his example then it
> must not be needed). If it’s not needed one could just do:
>
> /[^.]*$/.match(file)[0]

Now wouldn’t all this be much simpler if we had a method
along the lines of ‘basename’, which would pull off the
extension (or suffix) for us? We already have File::basename
and File::dirname, so why not a File::last-part-of-the-name-name?

Or what about a ‘File::parts’ which returns an array of:
directory (or nil), root name (or nil), suffix(es) (or nil).
Would that be more Ruby-esque?
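Current Ruby's core File class provides File.extname, which returns the last suffix including its dot. Below is a minimal sketch of the File::parts idea built on top of it. The name file_parts is a hypothetical helper and not an existing method. It only splits off the last suffix, so "doom.tar.gz" keeps ".tar" in the root name, and it maps the "." that File.dirname returns for bare names to nil.

```ruby
# Hypothetical helper (not a real File method): returns
# [directory or nil, root name or nil, suffix or nil].
def file_parts(path)
  dir  = File.dirname(path)
  ext  = File.extname(path)          # last suffix with its dot, or ""
  root = File.basename(path, ext)    # file name with that suffix removed
  [dir == "." ? nil : dir,
   root.empty? ? nil : root,
   ext.empty? ? nil : ext]
end

file_parts("/games/doom.tar.gz")   # ["/games", "doom.tar", ".gz"]
file_parts("tuesday.txt")          # [nil, "tuesday", ".txt"]
```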


–
Mike Hall
http://www.enteract.com/~mghall

David_Alan_Black1 · 5 August 2002 15:05

Hi –


On Mon, 5 Aug 2002, Mike Hall wrote:

> David Alan Black wrote:
>
>> /[^.]*$/.match(file)[0]
>
> Now wouldn’t all this be much simpler if we had a method
> along the lines of ‘basename’, which would pull off the
> extension (or suffix) for us? We already have File::basename
> and File::dirname, so why not a File::last-part-of-the-name-name?
>
> Or what about a ‘File::parts’ which returns an array of:
> directory (or nil), root name (or nil), suffix(es) (or nil).
> Would that be more Ruby-esque?

Heavens – object-oriented regular expressions are Ruby-esque enough
for you?

Of course, writing add-on libraries is very Ruby-esque too, so have
at it!

David


Gavin_Sinclair · 5 August 2002 15:13

[All sorts of people wrote all sorts of things, and then…]

>> /[^.]*$/.match(file)[0]
>
> Now wouldn’t all this be much simpler if we had a method
> along the lines of ‘basename’, which would pull off the
> extension (or suffix) for us? We already have File::basename
> and File::dirname, so why not a File::last-part-of-the-name-name?
>
> Or what about a ‘File::parts’ which returns an array of:
> directory (or nil), root name (or nil), suffix(es) (or nil).
> Would that be more Ruby-esque?

I agree there’s room for some help from the File module here. We already
have

File.split("/home/fred/aliases.sh") # -> ["/home/fred", "aliases.sh"]

which is nice. I think the most sensible thing is for this method to take
an optional parameter, indicating whether the extension should be a separate
element. In fact, it could be an integer specifying how many dots to
include in the extension. So:

File.split("tuesday.txt")             # -> [".", "tuesday.txt"]
File.split("/games/doom.tar.gz")      # -> ["/games", "doom.tar.gz"]
File.split("/games/doom.tar.gz", 1)   # -> ["/games", "doom.tar", "gz"]
File.split("/games/doom.tar.gz", 2)   # -> ["/games", "doom", "tar.gz"]
File.split("/games/doom.tar.gz", 3)   # -> ["/games", "doom", "tar.gz"]

Note that the first two lines represent current File.split behaviour, and
the final three are proposed extensions. The method “split” is very
appropriate here, as (and I should have thought of this earlier)
String.split can be used to chop up a filename into extensions in this
manner.
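Here is a sketch of the proposed behaviour, assuming the semantics shown in Gavin's examples. The name split_path is a hypothetical stand-alone helper, since File.split itself takes only a path. It uses String#split on the base name, as suggested above. If n asks for more dots than the name has, all available dots go into the extension, which matches the n = 3 example.

```ruby
# Hypothetical helper mirroring the proposal; not a real File method.
# n = number of dot-separated pieces to put in the extension.
def split_path(path, n = nil)
  dir, base = File.split(path)
  return [dir, base] if n.nil?           # current File.split behaviour

  pieces = base.split(".", -1)           # -1 keeps trailing empty pieces
  k = [n, pieces.length - 1].min         # can't take more dots than exist
  return [dir, base, nil] if k <= 0      # no dot, or n <= 0

  root = pieces[0...(pieces.length - k)].join(".")
  ext  = pieces[(pieces.length - k)..-1].join(".")
  [dir, root, ext]
end

split_path("/games/doom.tar.gz")      # ["/games", "doom.tar.gz"]
split_path("/games/doom.tar.gz", 1)   # ["/games", "doom.tar", "gz"]
split_path("/games/doom.tar.gz", 2)   # ["/games", "doom", "tar.gz"]
split_path("/games/doom.tar.gz", 3)   # ["/games", "doom", "tar.gz"]
```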

Of course, we’d need File.extension(path, n=1) to match File.dirname and
File.basename, as well.
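Along the same lines, here is a sketch of file_extension(path, n = 1). The name is hypothetical: it follows the File.extension suggestion but is a plain method, not part of File. It returns the extension without the leading dot, as in the examples above, and it returns nil when the file name has no dot.

```ruby
# Hypothetical helper (not a real File method): the last n dot-separated
# pieces of the file name, joined with '.', or nil if there is no dot.
def file_extension(path, n = 1)
  pieces = File.basename(path).split(".", -1)
  k = [n, pieces.length - 1].min
  return nil if k <= 0
  pieces[(pieces.length - k)..-1].join(".")
end

file_extension("/games/doom.tar.gz")      # "gz"
file_extension("/games/doom.tar.gz", 2)   # "tar.gz"
file_extension("/etc/passwd")             # nil
```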

–Gavin


----- Original Message -----
From: “Mike Hall” mghall@enteract.com

David_Alan_Black1 · 5 August 2002 15:10

sub(/are/, "aren't") #

David


On Tue, 6 Aug 2002, David Alan Black wrote:

> Hi –
>
> On Mon, 5 Aug 2002, Mike Hall wrote:
>
>> David Alan Black wrote:
>>
>>> /[^.]*$/.match(file)[0]
>>
>> Now wouldn’t all this be much simpler if we had a method
>> along the lines of ‘basename’, which would pull off the
>> extension (or suffix) for us? We already have File::basename
>> and File::dirname, so why not a File::last-part-of-the-name-name?
>>
>> Or what about a ‘File::parts’ which returns an array of:
>> directory (or nil), root name (or nil), suffix(es) (or nil).
>> Would that be more Ruby-esque?
>
> Heavens – object-oriented regular expressions are Ruby-esque enough
> for you?

