I am wondering if this is the correct behaviour in gsub:
"bab".gsub(/(?!a)ab/, "cd")
=> "bab"
Shouldn't that be "bcd"?
I think /(?!a)ab/ can’t match anything. It’s saying that the first
character after the beginning of the match must not be “a”, and the
first character of the match must be “a”. This is contradictory.
Yes, that would clarify the situation, but is it the correct
behaviour? I would think that (?!a)a doesn’t mean the same
character, but consecutive ones. Because it doesn’t consume
the character, it effectively is the character ‘before’ the
match (if any). The other behaviour wouldn’t make sense,
because (?!a)b is then exactly the same as b.
Kristof
···
On Wed, 12 May 2004 08:15:28 +0900, Joel VanderWerf wrote:
Kristof Bastiaensen wrote:
Yes, that would clarify the situation, but is it the correct
behaviour? I would think that (?!a)a doesn’t mean the same
character, but consecutive ones. Because it doesn’t consume
the character, it effectively is the character ‘before’ the
match (if any). The other behaviour wouldn’t make sense,
because (?!a)b is then exactly the same as b.
I think that it’s the intended behavior. Just use /(?!a).b/ if you want
to consume the character.
Thinking about this, it is indeed possible to implement fixed-width
look-behind – interesting.
Yes, that would clarify the situation, but is it the correct
behaviour? I would think that (?!a)a doesn’t mean the same
character, but consecutive ones. Because it doesn’t consume the
character, it effectively is the character ‘before’ the match (if
any).
/(?!a)/ doesn’t match or consume any character; it refers to the state
of things between characters. The previous character (or start of
string) has come and gone; the assertion, now, is “what lies just
ahead is not ‘a’”.
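The zero-width behaviour described above can be checked directly. The following is a minimal sketch, not code from the thread; the method name `lookahead_demo` is made up for illustration.

```ruby
# Illustrative sketch only; lookahead_demo is a hypothetical helper name.
def lookahead_demo
  {
    # (?!a) and the literal "a" both test the same position,
    # so the pattern contradicts itself and never matches.
    contradictory: "bab" =~ /(?!a)ab/,
    # With no match, gsub returns the string unchanged.
    unchanged: "bab".gsub(/(?!a)ab/, "cd"),
    # The assertion consumes nothing: the matched text is just "b".
    zero_width: "bab"[/(?!a)b/]
  }
end

p lookahead_demo
```

The last entry also shows why (?!a)b behaves like a plain b: the assertion and the b look at the same character.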
The other behaviour wouldn’t make sense, because (?!a)b is then
exactly the same as b.
Assertions like this always have the possibility of being redundant –
for example:
/(?=a)abc/ # same as /abc/
but there are a lot of cases where they aren’t, and that’s where they
become useful:
/David (?!Black)(\S+)/ # grab another David’s last name
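As a rough sketch of the two cases above (the method names and the sample name "Thomas" are invented for illustration):

```ruby
# Illustrative sketch only; method names and sample data are hypothetical.

# True when the redundant lookahead finds the same position as the plain pattern.
def redundant_lookahead_same?(str)
  (str =~ /(?=a)abc/) == (str =~ /abc/)
end

# Returns the last name of the first "David" whose last name does not
# start with "Black", or nil if there is none.
def other_davids_last_name(text)
  text[/David (?!Black)(\S+)/, 1]
end

p redundant_lookahead_same?("xxabc")
p other_davids_last_name("David Black and David Thomas")
```

Note that (?!Black) only checks a prefix, so a last name such as "Blackwell" would be rejected as well.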
David
···
On Wed, 12 May 2004 08:15:28 +0900, Joel VanderWerf wrote:
I think that it’s the intended behavior. Just use /(?!a).b/ if you want
to consume the character.
Hi,
You are right, I looked it up in the manual, and there it was. The
term zero-width-look-ahead pretty much says it all. I must have
gotten the definition all wrong.
Thinking about this, it is indeed possible to implement fixed-width
look-behind – interesting.
I was thinking more about something like a variable-width look-between.
For example, a(?^\w+)b would match any a(.)b where (.) is
not equal to (\w+).
Regards,
Florian Gross
Thanks,
Kristof
···
On Wed, 12 May 2004 02:12:19 +0200, Florian Gross wrote:
I think that it’s the intended behavior. Just use /(?!a).b/ if you want
to consume the character.
I’d use /[^a]b/ if I wanted to consume the character. No need for
negative lookahead here.
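A small sketch comparing the two suggestions (the method names are made up for illustration). For ordinary characters they agree. One difference worth knowing: without the /m flag, `.` does not match a newline in Ruby, while `[^a]` does.

```ruby
# Illustrative sketch only; method names are hypothetical.

# Lookahead form: any character except "a" (and except newline), then "b".
def consume_with_dot(str)
  str[/(?!a).b/]
end

# Negated character class: any character except "a" (newline included), then "b".
def consume_with_class(str)
  str[/[^a]b/]
end

p consume_with_dot("cb")
p consume_with_class("cb")
p consume_with_dot("\nb")
p consume_with_class("\nb")
```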
At Wed, 12 May 2004 09:13:51 +0900,
Florian Gross wrote in [ruby-talk:99884]:
Thinking about this, it is indeed possible to implement fixed-width
look-behind – interesting.
I think that it’s the intended behavior. Just use /(?!a).b/ if you want
to consume the character.
I was thinking more about something like variable-width look-between
Meaning for example a(?^\w+)b would match any a(.)b if (.) is
not equal to (\w+)
IMHO that’s not generally possible with regular expressions. You’ll
always have to define positively things that should match. Exclusion
character classes are just a means of convenience but this does not extend
to complete (sub) expressions.
For example: to match a.*a where the part in the middle does not consist
only of b’s (that is, it is not matched in full by /b+/) you can do:
That is exactly what I needed. I also saw that it has negative
look-behind: (?<!subexp)
I think this is especially useful in String#gsub, so you don’t
have to subgroup the context and replicate it in the
substitution.
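A minimal sketch of that point, assuming a Ruby whose regex engine supports look-behind (1.9 and later do). The method names are invented for illustration.

```ruby
# Illustrative sketch only; method names are hypothetical.
# Assumes a regex engine with look-behind support (Ruby 1.9 or later).

# Context captured in a group and replicated in the substitution.
def replace_with_group(str)
  str.gsub(/(b)ab/, '\1cd')
end

# Same idea with positive look-behind: the "b" is checked but not consumed.
def replace_with_lookbehind(str)
  str.gsub(/(?<=b)ab/, "cd")
end

# Negative look-behind: replace "ab" unless it is preceded by "a".
def replace_unless_preceded_by_a(str)
  str.gsub(/(?<!a)ab/, "cd")
end

p replace_with_group("bab")
p replace_with_lookbehind("bab")
p replace_unless_preceded_by_a("bab")
```

The two approaches can differ on overlapping contexts. In "babab" the group version consumes the middle "b" as part of the first match, so it cannot also serve as context for the second "ab", whereas the look-behind version leaves it available.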
Kristof
···
On Thu, 13 May 2004 20:56:55 +0900, nobu.nokada wrote: