Because /^/ matches the beginning of a line (not the beginning of
the string), and /\s/ matches whitespace, which includes newlines (\n).
So the first place in the string where the beginning of a line is
followed by one or more whitespace characters is at position 8.
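The original test string and pattern are not quoted in this part of the thread, so here is only a rough sketch under assumptions: the string is taken to be seven "a"s, some newlines, then "b"s, and the pattern is taken to be /^\s+$/. With those made-up inputs, Ruby reports position 8 for three newlines and no match for two, which is consistent with the explanation above.

```ruby
# Assumed inputs (not from the original post): seven "a"s, newlines, then "b"s.
three_newlines = "aaaaaaa\n\n\nbbb"
two_newlines   = "aaaaaaa\n\nbbb"

# Assumed pattern: a line that consists only of whitespace.
blank_line = /^\s+$/

# ^ matches right after the first "\n" (index 7), so a match can start at index 8.
# \s+ keeps one "\n", and $ succeeds because another "\n" follows it.
pos_three = three_newlines =~ blank_line

# With two newlines, the "\n" at index 8 is followed by "b", so $ fails there.
pos_two = two_newlines =~ blank_line

puts "three newlines: #{pos_three.inspect}"
puts "two newlines:   #{pos_two.inspect}"
```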
(If I put in only 2 newlines, it's fine).
With only two newlines, the /$/ prevents the match, since there are
"b"s following the newline.
Thanks for the reactions to all.
It's not that simple: ^ _also_ matches the beginning of the string.
Perl does _not_ produce a match unless you suffix the regular expression with m.
Cheers,
Han
···
On Apr 6, 2005 4:58 PM, Warren Brown <warrenb@timevision.com> wrote:
And so it's worth pointing out that in Ruby you should write:
str.untaint if str =~ /\A[a-z0-9]*\z/ # good
and not:
str.untaint if str =~ /^[a-z0-9]*$/ # HIGHLY DANGEROUS
It means that these sorts of regexp are a bit less readable than Perl's.
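To make the danger concrete, here is a small sketch. The input strings are invented for illustration, and it only checks whether the patterns match; it does not call untaint.

```ruby
# Hypothetical hostile input: a harmless first line followed by something else.
evil = "abc123\n; rm -rf /"

line_anchored   = /^[a-z0-9]*$/     # ^ and $ anchor to lines in Ruby
string_anchored = /\A[a-z0-9]*\z/   # \A and \z anchor to the whole string

# The line-anchored pattern is satisfied by the first line alone.
accepted_by_line_anchors   = evil.match?(line_anchored)
# The string-anchored pattern must cover the entire input.
accepted_by_string_anchors = evil.match?(string_anchored)

# \Z (capital) tolerates one trailing newline, \z does not.
trailing_newline_with_Z = "abc\n".match?(/\A[a-z0-9]*\Z/)
trailing_newline_with_z = "abc\n".match?(string_anchored)

puts accepted_by_line_anchors
puts accepted_by_string_anchors
puts trailing_newline_with_Z
puts trailing_newline_with_z
```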
Regards,
Brian.
···
On Thu, Apr 07, 2005 at 03:47:15PM +0900, Han Holl wrote:
On Apr 6, 2005 4:58 PM, Warren Brown <warrenb@timevision.com> wrote:
> Han,
[ cut ]
>
> Because /^/ matches the beginning of a line (not the beginning of
> the string), and /\s/ matches whitespace, which includes newlines (\n).
> So the first place in the string where the beginning of a line is
> followed by one or more whitespaces is at position 8.
>
Thanks for the reactions to all.
It's not that simple: ^ _also_ matches the beginning of the string.
Perl does _not_ produce a match unless you suffix the regular expression with m.
Which leaves the question: what is the meaning of the m suffix in Ruby?
It would seem that multi-line is on by default, with no means to switch it off.
Ruby should not be different from the other RE engines without good reason.
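For what it's worth, the Perl-style default can be approximated in Ruby by spelling the anchors out as \A and \Z. This is only a sketch; the test string and the /^\s+$/ pattern are assumptions made up for illustration, not the ones from the original question.

```ruby
# Assumed test string: seven "a"s, three newlines, then "b"s.
text = "aaaaaaa\n\n\nbbb"

# Ruby's ^ and $ always work per line, so this finds the blank line.
ruby_default = text =~ /^\s+$/

# \A and \Z only match at the start and end of the whole string
# (\Z also allows a final newline), which is roughly Perl's ^ and $ without /m.
perl_like = text =~ /\A\s+\Z/

# A string made of whitespace only still matches the string-anchored form.
only_newlines = "\n\n" =~ /\A\s+\Z/

puts ruby_default.inspect
puts perl_like.inspect
puts only_newlines.inspect
```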
Cheers,
Han Holl
···
On Apr 7, 2005 10:05 AM, Brian Candler <B.Candler@pobox.com> wrote:
And so it's worth pointing out that in Ruby you should write:
str.untaint if str =~ /\A[a-z0-9]*\z/ # good
and not:
str.untaint if str =~ /^[a-z0-9]*$/ # HIGHLY DANGEROUS
It means that these sorts of regexp are a bit less readable than Perl's.
This is from man perlre:
m Treat string as multiple lines. That is, change "^" and "$"
from matching the start or end of the string to matching the
start or end of any line anywhere within the string.
This should go on the page with gotchas that I've seen somewhere.
Perl RE is quite widespread, and when Ruby deviates from it, it's
easy to trip up.
Cheers,
Han Holl
···
On Apr 7, 2005 12:34 PM, David A. Black wrote:
> The /m suffix means that \n is included in . (dot).
Yes, looked it up in the Pickaxe, and indeed that's what it says.
Kind of interesting, and mightily adding to the confusion: Pickaxe2
calls the m option 'multi-line mode' (dot matches newline).
Jeffrey E. F. Friedl, in the contents page of Mastering Regular Expressions:
Dot-matches-all match mode (a.k.a., "single-line mode").
He uses the name multi-line mode for the different interpretation of ^ and $.
Does anyone know if there is, or has been, a reason why Ruby chooses
to be different from the rest (Perl, Python, PHP, Apache, to name a few)?
Cheers,
Han
···
On Apr 7, 2005 3:45 PM, David A. Black <dblack@wobblini.net> wrote:
I don't think they ever have been, at least not in the treatment of
all this line-ending stuff (and maybe a few other things).
/m Treat string as multiple lines. That is, change "^" and "$" from
matching the start or end of the string to matching the start or
end of any line anywhere within the string.
[Ruby has this mode always enabled; you have to use \A and \z to match just
start and end of string. Perl has these too, but they're rarely used]
/s Treat string as single line. That is, change "." to match any
character whatsoever, even a newline, which normally it would not
match.
[That's the same as Ruby's /m modifier, just to make things confusing]
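A quick sketch of both points in Ruby, using a made-up two-line string:

```ruby
text = "foo\nbar"

# ^ and $ are line anchors in Ruby with no modifier at all (Perl needs /m).
line_anchor_pos = text =~ /^bar$/

# \A and \z anchor to the whole string, so this does not match.
string_anchor_pos = text =~ /\Abar\z/

# Without /m, "." does not match "\n"; with /m it does (Perl spells this /s).
dot_plain = text =~ /foo.bar/
dot_m     = text =~ /foo.bar/m

# Ruby's own name for the /m flag is Regexp::MULTILINE.
m_flag_set = (/foo.bar/m.options & Regexp::MULTILINE) != 0

puts line_anchor_pos.inspect
puts string_anchor_pos.inspect
puts dot_plain.inspect
puts dot_m.inspect
puts m_flag_set
```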
Regards,
Brian.
···
On Fri, Apr 08, 2005 at 12:39:02AM +0900, Han Holl wrote:
On Apr 7, 2005 3:45 PM, David A. Black <dblack@wobblini.net> wrote:
> I don't think they ever have been, at least not in the treatment of
> all this line-ending stuff (and maybe a few other things).
>
> David
Kind of interesting, and mightily adding to the confusion: Pickaxe2
calls the m option: 'multi-line mode', dot matches newline.
Jeffrey E. F. Friedl, in the content page of the Mastering book:
Dot-matches-all match mode (a.k.a., "single-line mode").
He calls multi-line mode the different interpretation of ^ and $.