irb(main):004:0> /(theone)?/.match(" theone").to_a
=> ["", nil]
? means 'zero or one'
we start at the beginning of ' theone' and instantly find a match: zero of
them.
irb(main):003:0> /(theone)?/.match("theone").to_a
=> ["theone", "theone"]
same here.
irb(main):005:0> / (theone)?/.match(" theone").to_a
=> [" theone", "theone"]
same here.
remember regexp engines work (well, some of them) by starting at a position and
consuming chars while the pattern matches. iff all of the pattern was used we
have a positive match, otherwise not. so in all these cases we start like so
' theone'
^
ptr
and drive with the regexp asking "does the regexp match starting here? if so
how many chars did it consume?" the consumed chars are returned in $& and any
captured groups in $1, $2, etc. in all the cases above this explains the matching.
note that some regexp engines work in the reverse sense but the effect is
largely the same...
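
a minimal sketch of the "where did we start, how many chars did we consume"
question, using MatchData#begin and the length of the whole match.
describe_match is a hypothetical helper made up for illustration; it is not
part of ruby or of the original examples.

```ruby
# hypothetical helper (not from the original post): reports where the match
# starts, how many chars it consumed, and what the first group captured
def describe_match(re, str)
  m = re.match(str)
  return nil if m.nil?              # no match anywhere in the string
  [m.begin(0), m[0].length, m[1]]   # [start position, chars consumed, $1]
end

p describe_match(/(theone)?/,  " theone")  # => [0, 0, nil]
p describe_match(/(theone)?/,  "theone")   # => [0, 6, "theone"]
p describe_match(/ (theone)?/, " theone")  # => [0, 7, "theone"]
```

all three matches start at position 0; only the number of consumed chars
differs, and in the first case it is zero.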
> In the first case, it doesn't match "theone", but in the second and third it
> does...
so it matched in all cases -- sometimes zero times, sometimes one time. this
is what you asked the regexp to do. i try to follow these rules when
composing regexps:
- always use anchors ^ and $
- never use anything that can match 'zero' things
it's the 'zero' thing that surprised you. your first two regexps match even
the empty string!
obviously this is not always possible but i will maintain this:
if you create a regexp without anchors and with portions that can match zero
things and have not done so out of absolute need - your code has a bug.
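
a rough illustration of the two rules, comparing the unanchored pattern from
above with an anchored one that has nothing optional in it. the anchored
pattern and the matches? helper are assumptions chosen for this example, not
something from the original question.

```ruby
# hypothetical helper: true if the regexp matches anywhere in the string
def matches?(re, str)
  !re.match(str).nil?
end

loose  = /(theone)?/   # no anchors, and the group may match zero times
strict = /^theone$/    # anchored at both ends, nothing optional

["theone", " theone", ""].each do |str|
  puts "#{str.inspect}: loose=#{matches?(loose, str)} strict=#{matches?(strict, str)}"
end
# prints:
#   "theone": loose=true strict=true
#   " theone": loose=true strict=false
#   "": loose=true strict=false
```

note that in ruby ^ and $ anchor at line boundaries, so for multi-line input
\A and \z are the anchors that pin the whole string.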
kind regards.
-a
On Tue, 29 Jun 2004, Kristof Bastiaensen wrote:
--
EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen
===============================================================================