Hello Amir,
why when I write "puts x[0]" ruby returns "This" and for "puts x[1]"
returns "Thi" and for "puts x[2]" returns "s" ?
The first element (x[0]) is always the complete match, i.e. the text
that the whole regular expression matched. The remaining elements
(x[1], x[2], ...) are the individual sub-matches (capture groups), if
there are any.
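A small sketch of how those indices line up, using the string and
pattern from your question. The variable name x is taken from your
mail; `whole` and `groups` are just illustrative names:

```ruby
x = "This is a test".match(/(\w+)(\w+)/)

whole  = x[0]        # text matched by the entire pattern
groups = x.captures  # only the parenthesised groups, i.e. x[1], x[2], ...

puts whole           # This
p groups             # ["Thi", "s"]
p x.to_a             # ["This", "Thi", "s"]
```

So x.to_a is simply the whole match followed by the captures.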
One also has to know that, in most implementations, regular expression
quantifiers are "greedy" by default, which means they try to match as
many characters as possible.
So, given your first example:
"This is a test".match( /(\w+)(\w+)/ )
\w - match a single "word" character
\w+ - match one *or* more "word" characters
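Since everything is greedy by default, the first \w+ initially grabs as
much as possible, here all of "This". That leaves nothing for the second
\w+, which needs at least one character, so the regex engine backtracks:
the first \w+ gives back one character at a time until the second one
can match. The first \w+ therefore ends up with everything except the
last character ("Thi") and leaves that last character ("s") for the
second \w+ .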
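To see that this "all but the last character" split does not depend on
the particular word, here is a rough sketch. The helper name
split_greedy is made up for illustration; it is not part of Ruby:

```ruby
# Hypothetical helper: returns the two captures, or nil if there is no match.
def split_greedy(word)
  m = word.match(/(\w+)(\w+)/)
  m && m.captures
end

p split_greedy("ab")    # ["a", "b"]
p split_greedy("abc")   # ["ab", "c"]
p split_greedy("This")  # ["Thi", "s"]
p split_greedy("a")     # nil, two word characters are needed in total
```

The second group always gets exactly one character, because that is the
least the first, greedy group has to give back.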
The ? character, placed directly after a quantifier such as + or *,
makes that quantifier non-greedy (lazy). Try this example:
"This is a test".match( /(\w+?)(\w+)/ )
irb(main):006:0> "1234".match(/(\d+?)(\d+)/)
=> #<MatchData "1234" 1:"1" 2:"234">
In the irb session above, \d+? means "match as few as possible", so it
matches only the first "1" and leaves all the rest to the second \d+ .
The \w+? in your example behaves the same way.
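Applied to your original string, a greedy and a non-greedy first group
can be compared side by side. This is only a sketch; the variable names
greedy and lazy are chosen here for illustration:

```ruby
str = "This is a test"

greedy = str.match(/(\w+)(\w+)/)
lazy   = str.match(/(\w+?)(\w+)/)

p greedy.captures  # ["Thi", "s"]
p lazy.captures    # ["T", "his"]

# The overall match is the same in both cases; only the split differs.
puts greedy[0]     # This
puts lazy[0]       # This
```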
In your case it's debatable whether this regex really makes sense,
though; at first glance it looks very specific rather than generally
useful.
HTH
On 01.08.2010 09:19, Amir Ebrahimifard wrote: