Newlines included in bracket negation

(... that subject probably makes no sense ...)

Anyway, I have some unexpected (to me) behavior in the following regexp.
This example is contrived, but based on a real need. Can anyone explain why
the result is multi-line, even though the re is not?

require 'test/unit'

class TestRE < Test::Unit::TestCase
  def test_newlines_consumed_in_not_section
    src = "happy\n\nbirthday"
    # join, not to_s: on Ruby 1.9+ Array#to_s returns the inspect form
    assert_equal("hday", src.scan(/h[^x]*?day/).join)
  end
end

produces

Finished in 0.031 seconds.

  1) Failure:
test_newlines_consumed_in_not_section(TestRE) ...
<"hday"> expected but was
<"happy\n\nbirthday">.

···

--
Chris
http://clabs.org

Adding \n inside the brackets fixes it, I just wouldn't expect to have to do
this since I didn't add the multiline mode option.

require 'test/unit'

class TestRE < Test::Unit::TestCase
  def test_newlines
    src = "happy\n\nbirthday"
    # join, not to_s: on Ruby 1.9+ Array#to_s returns the inspect form
    assert_equal("hday", src.scan(/h[^x\n]*?day/).join)
  end
end
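
For comparison, here is a small sketch of what scan returns for both character classes, outside the test harness. The variable names are made up for illustration; only core String#scan behavior is assumed.

```ruby
src = "happy\n\nbirthday"

# [^x] means "any character except x", and a newline is such a character,
# so the match starts at the first "h" and runs to the only "day".
without_newline_excluded = src.scan(/h[^x]*?day/)

# Excluding \n as well stops the first "h" from reaching "day";
# the only match left starts at the "h" in "birthday".
with_newline_excluded = src.scan(/h[^x\n]*?day/)

p without_newline_excluded  # => ["happy\n\nbirthday"]
p with_newline_excluded     # => ["hday"]
```

Note that scan returns an Array, which is why the tests above need to turn the result into a String before comparing.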

···

--
Chris
http://clabs.org

Chris Morris wrote:

Can anyone explain why
the result is multi-line, even though the re is not?

It's not a question of the re being multi-line or not. It might look like
a question of the re being greedy v. non-greedy, but the match starts at
the first 'h' and the only 'day' in your string is at the very end, so
greedy v. non-greedy is irrelevant here.

If you think about it, there is really no concept of 'lines' with
regards to text. There really is only one line--one, long, continuous
line of characters. Some of those characters might be '\n' characters,
and we may choose to interpret a '\n' as a new line, but that doesn't
change the fact that there is still just one continuous string of
characters. A regex has nothing inherently programmed into it that will
cause it to stop looking for matches when a '\n' is encountered in the
sequence of characters. The regex character '.' will not match a
newline (unless the m option is given), but that is not true of regexes
generally. In any case, you do not use the '.' character in your regex,
so that behavior is irrelevant.
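
A quick sketch of that difference, checked against a single newline character. The variable names are hypothetical; only core Regexp behavior is assumed.

```ruby
newline = "\n"

# A negated class matches anything not listed, newline included.
negated_class_matches = !(newline =~ /[^x]/).nil?

# The dot skips newlines unless the m option is given.
dot_matches        = !(newline =~ /./).nil?
dot_with_m_matches = !(newline =~ /./m).nil?

p negated_class_matches  # => true
p dot_matches            # => false
p dot_with_m_matches     # => true
```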

···

--
Posted via http://www.ruby-forum.com/.

There's also something I don't understand, similar to the above.
I always thought that in a non-multiline regexp, the dot didn't match
newlines (\n), so I don't understand this:

irb(main):036:0> re = /(h)(.*)(day)/
=> /(h)(.*)(day)/
irb(main):037:0> "happy\n\nbirthday".match(re).captures
=> ["h", "", "day"]
irb(main):038:0> re = /(h)(.*)(day)/m
=> /(h)(.*)(day)/m
irb(main):039:0> "happy\n\nbirthday".match(re).captures
=> ["h", "appy\n\nbirth", "day"]

I thought the first case wouldn't match.
Can anyone shed some light?

Jesus.

···

Can you check my example above? I'm using a greedy match of .* which I
thought would match up to a \n in a non-multiline regexp, and would
include everything in a multiline one. I must be confused at some
point :(

Jesus.

···

from memory, 'multiline' affects *only* the behavior of '.' in res. the re

   [^x] => 'not x'

simply matches any char that is not 'x' - including newline

it's the same in perl and python iirc
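
a rough sketch of that point, using the string from the original post. the variable names are invented for the example, and it only covers ruby's m option, not perl's or python's flags.

```ruby
src = "happy\n\nbirthday"

# Without m the dot cannot cross the newlines, so only "hday" fits.
dot_plain = src[/h.*day/]

# With m the dot matches newlines too, and the greedy .* spans everything.
dot_multiline = src[/h.*day/m]

# The negated class matches newlines either way; m changes nothing here.
class_plain     = src[/h[^x]*day/]
class_multiline = src[/h[^x]*day/m]

p dot_plain        # => "hday"
p dot_multiline    # => "happy\n\nbirthday"
p class_plain      # => "happy\n\nbirthday"
p class_multiline  # => "happy\n\nbirthday"
```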

cheers.

a @ http://codeforpeople.com/

···

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

The last four characters of the word "birthday" match the regexp
/h.*day/, without crossing any newlines. Perhaps you were thinking of
/h.+day/, which does not match.
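
One way to see where the match actually lands is to ask the MatchData for its position. This is just a sketch; star and plus are illustrative names.

```ruby
src = "happy\n\nbirthday"

star = src.match(/h.*day/)
plus = src.match(/h.+day/)

# The match begins at the "h" inside "birthday", not at the first "h".
p star[0]         # => "hday"
p star.begin(0)   # => 11
p star.pre_match  # => "happy\n\nbirt"

# .+ needs at least one character between "h" and "day"; there is none
# on that line, so there is no match at all.
p plus            # => nil
```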

···

Yeah, it behaves that way. I guess I need to adjust my expectations :)

···

--
Chris
http://clabs.org

I need more sleep, for sure. I was of course thinking on the first "h"
and the last "day". That explains it :)

irb(main):043:0> "happy\n\nday".match(re).captures
NoMethodError: undefined method `captures' for nil:NilClass

Thanks,

Jesus.

···

Jesús Gabriel y Galán wrote:

I need more sleep, for sure. I was of course thinking on the first "h"
and the last "day". That explains it :)

A clue was in the capture results:

irb(main):036:0> re = /(h)(.*)(day)/
=> /(h)(.*)(day)/
irb(main):037:0> "happy\n\nbirthday".match(re).captures
=> ["h", "", "day"]

The fact that the (.*) matched nothing was an indication that something
was amiss.

···

--
Posted via http://www.ruby-forum.com/.

Nothing amiss there at all. The * means "match zero or more times", so it is perfectly fine to match zero occurrences of any character (except newline) between the 'h' and the 'day'.
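
A short sketch of the zero-or-more point on simpler strings. The regexps are the one from the thread plus a + variant; "holiday" is just an extra sample input added for contrast.

```ruby
re_star = /(h)(.*)(day)/
re_plus = /(h)(.+)(day)/

# * accepts zero characters, so the middle group can be empty.
p "hday".match(re_star).captures     # => ["h", "", "day"]

# + requires at least one character between "h" and "day".
p "hday".match(re_plus)              # => nil
p "holiday".match(re_plus).captures  # => ["h", "oli", "day"]
```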

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

···

Jesús Gabriel y Galán wrote:

I was of course thinking on the first "h"
and the last "day".

Rob Biedenharn wrote:

Nothing amiss there at all.

Ok.

···

--
Posted via http://www.ruby-forum.com/.