Anyway, I have some unexpected (to me) behavior in the following regexp.
This example is contrived, but based on a real need. Can anyone explain why
the result is multi-line, even though the re is not?
require 'test/unit'

class TestRE < Test::Unit::TestCase
  def test_newlines
    src = "happy\n\nbirthday"
    assert_equal("hday", src.scan(/h[^x]*?day/).to_s)
  end
end
produces
Finished in 0.031 seconds.
1) Failure:
test_newlines_consumed_in_not_section(TestRE) ...
<"hday"> expected but was
<"happy\n\nbirthday">.
> Can anyone explain why
> the result is multi-line, even though the re is not?
It's not a question of the re being multi-line or not. The negated class
[^x] matches any character other than 'x', and that includes '\n', so the
match starts at the first 'h' and runs on to the 'day' at the end.
Greedy v. non-greedy is irrelevant here: because there is only one 'day'
in the string, both forms give the same match.
If you think about it, there is really no concept of 'lines' with
regard to text. There really is only one line: one long, continuous
line of characters. Some of those characters might be '\n' characters,
and we may choose to interpret a '\n' as a new line, but that doesn't
change the fact that there is still just one continuous string of
characters. A regex has nothing inherently programmed into it that will
cause it to stop looking for matches when a '\n' is encountered in the
sequence of characters. The regex character '.' will not match
a newline (unless /m is given), but that is not true of regexes
generally. In any case, you do not use the '.' character in your regex,
so that behavior is irrelevant.
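To see this in isolation, here is a minimal sketch using the string from the original post. The constant names are mine, not the poster's. It assumes a current Ruby, where Array#to_s no longer joins the elements as it did in 1.8, so it looks at the arrays returned by scan directly. The last pattern, which adds \n to the negated class, is only one possible way to keep the match on a single line.

```ruby
SRC = "happy\n\nbirthday"

# A negated class such as [^x] matches every character except "x",
# and that includes "\n".
NEGATED_MATCHES_NEWLINE = !("\n" =~ /[^x]/).nil?

# The dot, without the /m flag, does not match "\n".
DOT_MATCHES_NEWLINE = !("\n" =~ /./).nil?

# Lazy and greedy versions both start at the first "h" and can only
# stop at the single "day" in the string, so they agree.
LAZY   = SRC.scan(/h[^x]*?day/)
GREEDY = SRC.scan(/h[^x]*day/)

# Excluding "\n" from the class as well forces the match onto one line.
SAME_LINE = SRC.scan(/h[^x\n]*?day/)

p NEGATED_MATCHES_NEWLINE  # true
p DOT_MATCHES_NEWLINE      # false
p LAZY                     # ["happy\n\nbirthday"]
p GREEDY                   # ["happy\n\nbirthday"]
p SAME_LINE                # ["hday"]
```

With the original pattern the first attempt at index 0 already succeeds, so the engine never gets as far as the "h" inside "birthday".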
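There's also something I don't understand, similar to the above.
I always thought that in a non-multiline regexp, the dot didn't match
newlines (\n), so I don't understand this:

irb(main):036:0> re = /(h)(.*)(day)/
=> /(h)(.*)(day)/
irb(main):037:0> "happy\n\nbirthday".match(re).captures
=> ["h", "", "day"]
irb(main):038:0> re = /(h)(.*)(day)/m
=> /(h)(.*)(day)/m
irb(main):039:0> "happy\n\nbirthday".match(re).captures
=> ["h", "appy\n\nbirth", "day"]

I thought the first case wouldn't match.
Can anyone shed some light?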
Can you check my example above? I'm using a greedy match of .* which I
thought would match up to a \n in a non-multiline regexp, and would
include everything in a multiline one. I must be confused at some
point.
Jesus.
···
On 10/26/07, 7stud -- <bbxx789_05ss@yahoo.com> wrote:
The regex character '.' will not match
a newline (unless /m is given), but that is not true of regexes
generally. In any case, you do not use the '.' character in your regex,
so that behavior is irrelevant.
The last four characters of the word "birthday" match the regexp
/h.*day/, without crossing any newlines. Perhaps you were thinking of
/h.+day/, which does not match.
···
On Oct 26, 3:43 pm, "Jesús Gabriel y Galán" <jgabrielyga...@gmail.com> wrote:
There's also something I don't understand, similar to the above.
I always thought that in a non-multiline regexp, the dot didn't match
newlines (\n), so I don't understand this:
I need more sleep, for sure. I was of course thinking of the first "h"
and the last "day". That explains it:
irb(main):043:0> "happy\n\nday".match(re).captures
NoMethodError: undefined method `captures' for nil:NilClass
Thanks,
Jesus.
···
On 10/27/07, Phrogz <phrogz@mac.com> wrote:
On Oct 26, 3:43 pm, "Jesús Gabriel y Galán" <jgabrielyga...@gmail.com> wrote:
> There's also something I don't understand, similar to the above.
> I always thought that in a non-multiline regexp, the dot didn't match
> newlines (\n), so I don't understand this:
>
> irb(main):036:0> re = /(h)(.*)(day)/
> => /(h)(.*)(day)/
> irb(main):037:0> "happy\n\nbirthday".match(re).captures
> => ["h", "", "day"]
> irb(main):038:0> re = /(h)(.*)(day)/m
> => /(h)(.*)(day)/m
> irb(main):039:0> "happy\n\nbirthday".match(re).captures
> => ["h", "appy\n\nbirth", "day"]
>
> I thought the first case wouldn't match.
> Can anyone shed some light?
The last four characters of the word "birthday" match the regexp
/h.*day/, without crossing any newlines. Perhaps you were thinking of
/h.+day/, which does not match.
Nothing amiss there at all. The * means "match zero or more times", so it is perfectly fine to match zero occurrences of any character (except newline) between the 'h' and the 'day'.
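The difference between * and + (and the effect of /m) can be checked with a short sketch that reuses the string from this thread. The constant names are only for illustration, and String#[] with a regexp is used here simply because it returns nil rather than raising when nothing matches.

```ruby
TEXT = "happy\n\nbirthday"

# "*" allows zero characters between "h" and "day", so the "hday"
# at the end of "birthday" matches without crossing a newline.
STAR = TEXT[/h.*day/]

# "+" needs at least one character between them on the same line,
# and no such stretch exists in this string.
PLUS = TEXT[/h.+day/]

# With /m the dot also matches "\n", so the match starts at the first "h".
MULTI = TEXT[/h.*day/m]

# The same patterns with groups, as in the irb session quoted above.
STAR_CAPTURES  = TEXT.match(/(h)(.*)(day)/).captures
MULTI_CAPTURES = TEXT.match(/(h)(.*)(day)/m).captures

p STAR            # "hday"
p PLUS            # nil
p MULTI           # "happy\n\nbirthday"
p STAR_CAPTURES   # ["h", "", "day"]
p MULTI_CAPTURES  # ["h", "appy\n\nbirth", "day"]
```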