Questions about * + and ? in Regex

Hi,

I have some questions about the correct meaning of *, + and ? in Regex
that I would appreciate some clarification on.
I have an example (derived from Programming Ruby, 2nd Edition) and
I don't understand why it gives these results. Here is the code:

def show_regexp(a, re)
  if a =~ re
    puts "#{$`}<<#{$&}>>#{$'}"
  else
    puts "no match"
  end
end

show_regexp('Example1', /\s*/)
show_regexp('Example2', /\s.*/)
show_regexp('Example3 ', /\s.?/) # Space at the end of string
show_regexp('Example4 ', /\s.+/) # Space at the end of string
show_regexp('Example5 ', /\s.*/) # Space at the end of string

The output is:

<<>>Example1
no match
Example3<< >>
no match
Example5<< >>

If I understand correctly:
     * means - match zero or more occurrences of the preceding expression.
     + means - match 1 or more occurrences of the preceding expression.
     ? means - match 0 or 1 occurrence of the preceding expression.

Why does Example2 give "no match"? I read it as: find "0 or more
occurrences" of (a space followed by any character).
Why does Example4 give "no match"? I read it as: find "1 or more
occurrences" of (a space followed by any character).
I am assuming that the null character can be matched by a dot (.).
Am I correct?

Best Regards

--
Posted via http://www.ruby-forum.com/.

A dot (.) can only match an actual character. Example 2 fails because
it's looking not for "0 or more occurrences of (a space followed by
any character)", but "a space followed by 0 or more characters". The *
only applies to whatever immediately precedes it, not the whole
expression... unless the expression's enclosed in parentheses. A regex
for "0 or more occurrences of (a space followed by any character)"
would be /(\s.)*/. In that case, the * applies to the parenthesized
group of whitespace and dot.
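
A quick sketch of the difference, in case it helps. The helper name `matched` is made up for this illustration and is not from the book; it just returns the matched text, or nil when there is no match:

```ruby
# Returns the matched text ($&), or nil when the regex does not match.
# `matched` is a hypothetical helper name used only for this example.
def matched(str, re)
  m = str.match(re)
  m && m[0]
end

p matched('Example2', /\s.*/)    # nil: \s needs a real whitespace character
p matched('Example2', /(\s.)*/)  # "": zero repetitions of the group, at position 0
p matched(' x y', /(\s.)*/)      # " x y": the group (\s.) repeated twice
```

Note that /(\s.)*/ always matches something, because zero repetitions of the group is an acceptable match.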

Example 4 fails because the only space isn't followed by anything at all.
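
To make that concrete, here is a small sketch comparing the three quantifiers after \s. The string 'Example4 x' is an extra input I added for contrast; it is not one of the original examples:

```ruby
# The dot has to consume a real character; end-of-string is not a character.
p('Example4 '  =~ /\s.+/)   # nil: nothing follows the space, and .+ needs at least one character
p('Example4 x' =~ /\s.+/)   # 8: the space at index 8 is followed by "x"
p('Example4 '  =~ /\s.?/)   # 8: .? is allowed to match zero characters
p('Example4 '  =~ /\s.*/)   # 8: so is .*
```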

HTH,
Chris

P.S. I strongly recommend Jeffrey Friedl's Mastering Regular Expressions.

On Dec 30, 2007 11:25 PM, Carlos Ortega <caof2005@yahoo.com> wrote:
> show_regexp('Example2', /\s.*/)
> show_regexp('Example4 ', /\s.+/) # Space at the end of string

Thanks a lot, Chris. I think I've got it now; however, I still have
a doubt about interpreting this:

show_regexp('hi hi hihihi hi hi', /\s.*?\s/)

In general, my confusion arises when 2 special characters appear together...

Because this last one would be:
-Match a space
-Followed by 0 or more characters
-Followed by ..... <= Here is my doubt
-Ending with a space.

Again, I would appreciate your help on this.

Regards
Carlos

Chris Shea wrote:
> The * only applies to whatever immediately precedes it, not the whole
> expression... unless the expression's enclosed in parentheses.

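Normally "*" is "greedy" -- i.e., it matches as many characters as it
can while still letting the rest of the pattern match -- but when it's
followed by "?" it becomes "lazy": it matches as few characters as it
can, so the match ends at the first place where the rest of the
pattern fits. In ".*?" the "?" modifies the "*"; it is not a separate
"0 or 1" step.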

"Hello world, from ruby".match(/.*?\s+/)[0]
# => "Hello "

"Hello world, from ruby".match(/.*\s+/)[0]
# => "Hello world, from "
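
Applied to the string from the question, a minimal sketch. I changed show_regexp here to return the string instead of printing it, which is my own tweak for the example, not the original definition:

```ruby
# Variant of show_regexp that returns the annotated string
# (the original version prints it with puts).
def show_regexp(a, re)
  if a =~ re
    "#{$`}<<#{$&}>>#{$'}"
  else
    "no match"
  end
end

# Lazy: \s, then as few characters as possible, then the next \s.
puts show_regexp('hi hi hihihi hi hi', /\s.*?\s/)  # hi<< hi >>hihihi hi hi

# Greedy: \s, then as many characters as possible, then the last \s.
puts show_regexp('hi hi hihihi hi hi', /\s.*\s/)   # hi<< hi hihihi hi >>hi
```

Both matches start at the first space; only where they end differs.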

Regards,
Jordan

On Dec 31, 1:24 am, Carlos Ortega <caof2...@yahoo.com> wrote:
> show_regexp('hi hi hihihi hi hi', /\s.*?\s/)