Regexp: why does (re)* return only last repetition?

While trying to build an RE to parse a shell-style regexp into an array of
non-wild, wild, non-wild, wild, etc., I found (again) that the grouping
operator (), when followed by *, returns only the last match into the
MatchData:

I can’t answer your question, but it works this way in Perl too.

str = 'foo*bar?baz'
regex = Regexp.new('([*?]|(?:[^*?]+))*', Regexp::EXTENDED);

matches = regex.match(str)
p matches[1..(matches.length-1)]

yields:

["baz"]

Annoying. I wanted ["foo", "*", "bar", "?", "baz"].
How to do this most simply?

'foo*bar?baz'.scan /[*?]|[^*?]+/

=> ["foo", "*", "bar", "?", "baz"]
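
For anyone comparing the two approaches side by side, here is a minimal sketch. A repeated group has only one capture slot, so each repetition overwrites the one before it, while scan returns every match in turn. The name split_glob is a hypothetical helper for illustration, not something from the thread or the standard library.

```ruby
# Hypothetical helper (an assumption, not from the thread): split a
# shell-style pattern into alternating literal and wildcard tokens.
def split_glob(pattern)
  pattern.scan(/[*?]|[^*?]+/)
end

str = 'foo*bar?baz'

# A repeated group has a single capture slot, so every repetition
# overwrites the previous one and only the last is left in MatchData.
m = /\A([*?]|[^*?]+)*\z/.match(str)
last_only = m[1]

# scan restarts the match after each hit and collects all of them.
tokens = split_glob(str)

p last_only   # "baz"
p tokens      # ["foo", "*", "bar", "?", "baz"]
```

Since the pattern has no capture groups, scan returns a flat array of strings rather than an array of arrays.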

-a

On Fri, 9 May 2003, Clifford Heath wrote:

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ara.t.howard@fsl.noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
====================================

ahoward wrote:

'foo*bar?baz'.scan /[*?]|[^*?]+/

Magic, thanks. Must read that book again :-).

New problem: Dir.glob. I’m finding the following misbehaviour:

  1. If the pattern contains a space in a directory name, a directory
    that should match is not returned.

  2. Placing ‘?’ in a pattern can result in no matches. I did this
    as a work-around to the previous problem, where the filenames
    contain multiple spaces, the pattern multiple ?s.

  3. Using the space character in the filename part of a pattern can
    cause two complete sets of matches, i.e. the filename array
    has a complete second copy appended. I worked around it with uniq!

Are these known problems with 1.6.8 on Linux?
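
To check the three cases on a particular Ruby build, a small reproduction along these lines may help. It is only a sketch: the directory and file names are invented, glob_space_cases is a hypothetical helper, and it uses tmpdir and a hash literal syntax that need a much newer Ruby than 1.6.8. What it prints on a current interpreter therefore shows the expected behaviour, not what 1.6.8 did.

```ruby
require 'tmpdir'
require 'fileutils'

# Build a throwaway tree whose directory and file names contain spaces,
# then run one pattern for each of the three cases listed above.
# All names here are made up for illustration.
def glob_space_cases
  Dir.mktmpdir do |root|
    dir = File.join(root, 'my dir')
    FileUtils.mkdir_p(dir)
    ['a b.txt', 'a  b.txt'].each { |name| FileUtils.touch(File.join(dir, name)) }

    # Reduce full paths to sorted basenames so results are easy to compare.
    base = lambda { |paths| paths.map { |path| File.basename(path) }.sort }

    {
      # Case 1: literal space in the directory part of the pattern.
      space_in_dir:  base.call(Dir.glob(File.join(root, 'my dir', '*.txt'))),
      # Case 2: '?' standing in for each space.
      question_mark: base.call(Dir.glob(File.join(root, 'my?dir', 'a??b.txt'))),
      # Case 3: literal space in the filename part of the pattern.
      space_in_file: base.call(Dir.glob(File.join(dir, 'a b.txt')))
    }
  end
end

results = glob_space_cases
results.each { |name, files| puts "#{name}: #{files.inspect}" }
```

If case 3 ever lists the same file twice, or case 1 comes back empty, that would reproduce the symptoms described above without needing the uniq! work-around to hide them.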

What’s weird is that Perl’s glob on Windows (both the built-in one
and the File::DosGlob module) is also broken, each version in a
different way. Coincidence? Why is this stuff hard to get right?

Clifford.