Regular expression negate a word (not character)

Kenneth · 26 January 2008 01:19

somebody who is a regular expression guru... how do you negate a word
and grep for all lines that contain

tire

but not

snow tire

or

snowtire

so for example, it will grep for

  winter tire
  tire
  retire
  tired

but will not grep for

  snow tire
  snow tire
  some snowtires

need to do it in one regular expression

Kenneth · 26 January 2008 02:19

i could think of something like

/[^s][^n][^o][^w]\s*tire/i

but what if it is not snow but some 20-character word? do we then need
to repeat this 20 times to negate it? is there a shorter way?
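
A quick check of the idea above, as a sketch in Python (the sample strings are chosen only for illustration), suggests that length is not the only problem. Each negated class tests one position independently, so the pattern needs four characters in front of "tire", it drops words that share any single letter position with "snow", and it still lets "snow tire" through because the space can stand in for the last class.

```python
import re

# Pattern from the post above: four negated character classes, then "tire".
char_negation = re.compile(r"[^s][^n][^o][^w]\s*tire", re.I)

samples = ["winter tire", "tire", "retire", "slowtire", "snowtire", "snow tire"]
results = {s: bool(char_negation.search(s)) for s in samples}

for s, matched in results.items():
    print("%-12s %s" % (s, "match" if matched else "no match"))
```

In this run only "winter tire" and "snow tire" match: "tire" and "retire" are too short, "slowtire" is rejected for starting with "s", and "snow tire" slips through as "now " followed by "tire".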

···

On Jan 25, 5:16 pm, Summercool <Summercooln...@gmail.com> wrote:

somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

  tire

but not

  snow tire

or

  snowtire

Mark_Tolonen · 26 January 2008 04:45

"Summercool" <Summercoolness@gmail.com> wrote in message news:27249159-9ff3-4887-acb7-99cf0d2582a8@n20g2000hsh.googlegroups.com...

somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

tire

but not

snow tire

or

snowtire

so for example, it will grep for

winter tire
tire
retire
tired

but will not grep for

snow tire
snow tire
some snowtires

need to do it in one regular expression

What you want is a negative lookbehind assertion:

re.search(r'(?<!snow)tire','snowtire') # no match
re.search(r'(?<!snow)tire','baldtire')

<_sre.SRE_Match object at 0x00FCD608>

Unfortunately you want variable whitespace:

re.search(r'(?<!snow\s*)tire','snow tire')

Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\dev\python\lib\re.py", line 134, in search
    return _compile(pattern, flags).search(string)
  File "C:\dev\python\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern

Python doesn't support lookbehind assertions that can vary in size. This doesn't work either:

re.search(r'(?<!snow)\s*tire','snow tire')

<_sre.SRE_Match object at 0x00F93480>
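
The reason the last attempt matches can be seen from the match position. The following is only a sketch: the bound `MAX_SPACES` is an assumption made up for this example, and the chained lookbehinds are one possible workaround when the amount of whitespace is known to be small, not a general fix.

```python
import re

# \s* may match nothing, so the engine can start the match at "tire" itself,
# where the four preceding characters are "now ", not "snow".
m = re.search(r"(?<!snow)\s*tire", "snow tire")
print(m.span())  # (5, 9): the match begins after the space

# Workaround sketch: one fixed-width lookbehind per allowed amount of whitespace.
MAX_SPACES = 3  # assumed upper bound, chosen only for this example
lookbehinds = "".join(r"(?<!snow%s)" % (r"\s" * n) for n in range(MAX_SPACES + 1))
bounded = re.compile(lookbehinds + "tire")

samples = ["snowtire", "snow tire", "snow  tire", "baldtire", "winter tire", "tire"]
bounded_results = [bool(bounded.search(s)) for s in samples]
for s, matched in zip(samples, bounded_results):
    print("%-12s %s" % (s, "match" if matched else "no match"))
```

Each lookbehind has a fixed width, so Python accepts the pattern, but anything with more than `MAX_SPACES` whitespace characters between "snow" and "tire" would still match.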

Here's some code (not heavily tested) that implements a variable lookbehind assertion, and a function to mark matches in a string to demonstrate it:

### BEGIN CODE ###

import re

def finditerexcept(pattern,notpattern,string):
    # Match either pattern; notpattern comes first so it wins where both apply.
    for matchobj in re.finditer('(?:%s)|(?:%s)'%(notpattern,pattern),string):
        # Skip the matches that were produced by the excluded pattern.
        if not re.match(notpattern,matchobj.group()):
            yield matchobj

def markexcept(pattern,notpattern,string):
    substrings = []
    current = 0

    for matchobj in finditerexcept(pattern,notpattern,string):
        substrings.append(string[current:matchobj.start()])
        substrings.append('[' + matchobj.group() + ']')
        current = matchobj.end()

    substrings.append(string[current:])
    return ''.join(substrings)

### END CODE ###

sample='''winter tire
tire
retire
tired
snow tire
snow tire
some snowtires
'''

print(markexcept('tire',r'snow\s*tire',sample))

winter [tire]
[tire]
re[tire]
[tire]d
snow tire
snow tire
some snowtires

--Mark

Kenneth · 26 January 2008 09:54

to add to the test cases, the regular expression must be able to grep

snowbird tire
tired on a snow day
snow tire and regular tire

Paddy · 26 January 2008 11:34

Try the answer here:
[Tutor] Regex [negative lookbehind / use HTMLParser to parse HTML]

···

On Jan 26, 1:16 am, Summercool <Summercooln...@gmail.com> wrote:

somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

  tire

but not

  snow tire

or

  snowtire

so for example, it will grep for

  winter tire
  tire
  retire
  tired

but will not grep for

  snow tire
  snow tire
  some snowtires

need to do it in one regular expression

Ilya_Zakharevich · 26 January 2008 21:39

[A complimentary Cc of this posting was sent to
Summercool
<Summercoolness@gmail.com>], who wrote in article <27249159-9ff3-4887-acb7-99cf0d2582a8@n20g2000hsh.googlegroups.com>:

so for example, it will grep for

  winter tire
  tire
  retire
  tired

but will not grep for

  snow tire
  snow tire
  some snowtires

This does not describe the problem completely. What about

thisnow tire
snow; tire

etc? Anyway, one of the obvious modifications of

(^ | \b(?!snow) \w+ ) \W* tire

should work.
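
Tried as posted in Python (a sketch; `re.VERBOSE` stands in for Perl's `/x` flag), the pattern appears to handle the original test lines. However, `(?!snow)` also rejects any word that merely begins with "snow", so "snowbird tire" is not matched, which is presumably the kind of case the "obvious modifications" are meant to cover.

```python
import re

# Ilya's pattern as posted; re.VERBOSE makes the engine ignore the spaces.
ilya = re.compile(r"(^ | \b(?!snow) \w+ ) \W* tire", re.VERBOSE)

samples = ["winter tire", "tire", "retire", "tired",
           "snow tire", "some snowtires", "snowbird tire"]
ilya_results = {s: bool(ilya.search(s)) for s in samples}

for s, matched in ilya_results.items():
    print("%-16s %s" % (s, "match" if matched else "no match"))
```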

Hope this helps,
Ilya

Gbacon · 28 January 2008 18:55

The code below at least passes your tests.

Hope it helps,
Greg

#! /usr/bin/perl

use warnings;
use strict;

use constant {
  MATCH => 1,
  NO_MATCH => 0,
};

my @tests = (
  [ "winter tire", => MATCH ],
  [ "tire", => MATCH ],
  [ "retire", => MATCH ],
  [ "tired", => MATCH ],
  [ "snowbird tire", => MATCH ],
  [ "tired on a snow day", => MATCH ],
  [ "snow tire and regular tire", => MATCH ],
  [ " tire" => MATCH ],
  [ "snow tire" => NO_MATCH ],
  [ "snow tire" => NO_MATCH ],
  [ "some snowtires" => NO_MATCH ],
);

my $not_snow_tire = qr/
^ \s* tire |
([^w\s]|[^o]w|[^n]ow|[^s]now)\s*tire
/xi;

my $fail;
for (@tests) {
  my($str,$want) = @$_;
  my $got = $str =~ /$not_snow_tire/;
  my $pass = !!$want == !!$got;

  print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

  ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__
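
For the earlier question about a 20-character word, an alternation of this shape does not have to be written by hand. Below is a sketch in Python of a generator for it; the function name `not_after` is made up for this example, and it differs slightly from the Perl pattern above in that each alternative also accepts the start of a line, so that a short line such as "ow tire" is still matched.

```python
import re

def not_after(word, target="tire"):
    """Build a lookbehind-free regex: target not preceded by word + optional whitespace."""
    # Case 1: the character before the whitespace is not the last letter of word.
    alternatives = [r"[^%s\s]" % re.escape(word[-1])]
    for i in range(1, len(word)):
        # Case i+1: the last i letters of word are present, but the character
        # before them differs (or the line starts there).
        differing = re.escape(word[-1 - i])
        suffix = re.escape(word[-i:])
        alternatives.append(r"(?:^|[^%s])%s" % (differing, suffix))
    body = r"^\s*{t}|(?:{alts})\s*{t}".format(
        t=re.escape(target), alts="|".join(alternatives))
    return re.compile(body, re.I | re.M)

not_snow_tire = not_after("snow")
print(not_snow_tire.pattern)
for line in ["snowbird tire", "snow tire and regular tire", "snow tire", "some snowtires"]:
    print("%-28s %s" % (line, "MATCH" if not_snow_tire.search(line) else "NO_MATCH"))
```

The generated pattern grows linearly with the length of the word, which is the price of avoiding lookbehind.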

···

--
... all these cries of having 'abolished slavery,' of having 'preserved the
union,' of establishing a 'government by consent,' and of 'maintaining the
national honor' are all gross, shameless, transparent cheats -- so trans-
parent that they ought to deceive no one. -- Lysander Spooner, "No Treason"

Paul_McGuire1 · 28 January 2008 21:39

somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

tire

but not

snow tire

or

snowtire

Too bad pyparsing's not an option. Here's what it would look like:

data = """
Match:

  winter tire
  tire
  retire
  tired

But not match:

  snow tire
  snow tire
  some snowtires

snowbird tire
tired on a snow day
snow tire and regular tire

"""

from pyparsing import CaselessLiteral,Literal,line

# caseless wasn't really necessary but you never know
# when you'll run into a "Snow tire"
snow = CaselessLiteral("snow")
tire = Literal("tire")
tire.ignore(snow + tire)

for matchTokens,matchStart,matchEnd in tire.scanString(data):
    print(line(matchStart, data))

Prints:

  winter tire
  tire
  retire
  tired

snowbird tire
tired on a snow day
snow tire and regular tire

-- Paul

···

On Jan 25, 7:16 pm, Summercool <Summercooln...@gmail.com> wrote:

Suraj_Kurapati1 · 29 January 2008 23:35

Since Ruby does not have a negative look *behind* operator, I just used
the negative look *ahead* in a backwards way, et voilà!

puts a.reverse.gsub(/erit(?!.*wons)/, '>>>\&<<<').reverse

somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

<<<tire>>>

but not

snow tire

or

snowtire

so for example, it will grep for

  winter <<<tire>>>
  <<<tire>>>
  re<<<tire>>>
  <<<tire>>>d

but will not grep for

  snow tire
  snow tire
  some snowtires

need to do it in one regular expression
=> nil
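
The same reversal trick can be sketched in Python; the helper name `mark_tires` is made up for this example. One difference is an assumption about the intent: the Ruby lookahead `(?!.*wons)` also skips a "tire" that has "snow" anywhere earlier on the same line, as in "snow tire and regular tire", so this sketch uses the narrower `\s*` instead.

```python
import re

def mark_tires(text):
    """Wrap each "tire" not preceded by "snow" + optional whitespace in <<< >>>."""
    # In the reversed text, "snow\s*tire" reads "erit\s*wons", so the
    # lookbehind becomes a lookahead, which may be variable-width.
    marked = re.sub(r"erit(?!\s*wons)",
                    lambda m: ">>>" + m.group() + "<<<",
                    text[::-1])
    # Reversing again turns ">>>erit<<<" into "<<<tire>>>".
    return marked[::-1]

for line in ["retire", "some snowtires", "snow tire and regular tire"]:
    print(mark_tires(line))
```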

···

--
Posted via http://www.ruby-forum.com/.

Ryan_Holmes · 6 August 2010 08:58

Hi all,

I know this is an old post, but was trying to do something similar and
came up with this:

((?=.*snow\s*tire.*)|^.*tire.*$)

Does that meet the required need? If it's full line matches you're
looking for, I think it should do the trick.
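
A quick check in Python suggests the pattern as posted does not quite do that: the first branch is a positive lookahead, so on a line containing "snow tire" it succeeds with an empty match instead of failing. The second pattern below is only a guess at the intent (a negative lookahead anchored at the start of the line), and it rejects the whole line, including "snow tire and regular tire".

```python
import re

posted = re.compile(r"((?=.*snow\s*tire.*)|^.*tire.*$)")
m = posted.search("snow tire")
print(m.span())  # (0, 0): the lookahead branch succeeds with an empty match

# Assumed intent: reject the line when "snow tire" appears anywhere in it.
whole_line = re.compile(r"^(?!.*snow\s*tire).*tire.*$")

samples = ["winter tire", "tired on a snow day", "snow tire",
           "snow tire and regular tire"]
line_results = [bool(whole_line.search(s)) for s in samples]
for s, matched in zip(samples, line_results):
    print("%-28s %s" % (s, "match" if matched else "no match"))
```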


JoeP · 26 January 2008 02:35

SpringFlowers AutumnMoon wrote:

snowtire

i could think of something like

/[^s][^n][^o][^w]\s*tire/i

but what if it is not snow but some 20 character-word, then do we need
to do it 20 times to negate it? any shorter way?

I took a long look at this and I came up with a number of different
methods, including an idea like the one you have above. If you have a
set number of bad/undesirable words then everything falls apart. I
tried negative look behinds but those don't work well with 0 or more
spaces because look-behinds have to have a fixed length. I really don't
think that this could be done elegantly with a single regular expression
if you have multiple bad/undesirable words. However, if you split this
into two regular expressions then it becomes rather straightforward.

I really have spent the last 20 minutes trying out different
possibilities with a single regular expression, but it just doesn't seem
worth the difficulty =(

May I ask why there is the requirement for a single regular expression?

- Joe P


Judson_Lester1 · 26 January 2008 02:43

(?!snow)(\S{4})\s*(tire)|^\S{0,3}\s*(tire)

I'm not thrilled with that, but without look-behind, it's rough to do
what you're asking.
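
Run in Python as a sketch (the sample lines are made up for illustration), the pattern seems to handle the original cases, with one limitation: because the first branch needs four non-space characters directly in front of "tire", a short word before "tire" is only accepted at the start of the line.

```python
import re

judson = re.compile(r"(?!snow)(\S{4})\s*(tire)|^\S{0,3}\s*(tire)")

samples = ["winter tire", "retire", "new tire",
           "snow tire", "snowtires", "my new tire"]
judson_results = {s: bool(judson.search(s)) for s in samples}

for s, matched in judson_results.items():
    print("%-12s %s" % (s, "match" if matched else "no match"))
```

Here "new tire" matches through the second branch, but "my new tire" does not match at all.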

Shameless pluggery: I used RegexpBench to do the experimentation to
find your answer.

Judson

···

On Jan 25, 2008 6:19 PM, Summercool <Summercoolness@gmail.com> wrote:

On Jan 25, 5:16 pm, Summercool <Summercooln...@gmail.com> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire

i could think of something like

/[^s][^n][^o][^w]\s*tire/i

but what if it is not snow but some 20 character-word, then do we need
to do it 20 times to negate it? any shorter way?

--
Your subnet is currently 169.254.0.0/16. You are likely to be eaten by a grue.

Bearophilehugs · 26 January 2008 10:49

Summercool:

to add to the test cases, the regular expression must be able to grep
snow tire and regular tire

I presume that only the second "tire" there has to be found.

This is my first try:

text = """
tire
word tire word
word retire word
word tired word
snowbird tire word
tired on a snow day word
snow tire and regular tire word
word snow tire word
word snow tire word
word some snowtires word
"""

import re

def finder(text):
    patt = re.compile( r"\b (\w*) \s* (tire)", re.VERBOSE)
    for mo in patt.finditer(text):
        if not mo.group(1).endswith("snow"):
            yield mo.start(2)

for end in finder(text):
    print(end)

The (lazy) output is the starting offset of each "tire" that matches:

1
11
28
43
63
73
120

Bye,
bearophile

Bearophilehugs · 26 January 2008 11:54

Paddy:

Try the answer here:
[Tutor] Regex [negative lookbehind / use HTMLParser to parse HTML]

But in the OP problem there can be variable-sized spaces in the
middle...

Bye,
bearophile

Dr.Ruud · 28 January 2008 20:14

Greg Bacon schreef:

#! /usr/bin/perl

use warnings;
use strict;

use constant {
  MATCH => 1,
  NO_MATCH => 0,
};

my @tests = (
  [ "winter tire", => MATCH ],
  [ "tire", => MATCH ],
  [ "retire", => MATCH ],
  [ "tired", => MATCH ],
  [ "snowbird tire", => MATCH ],
  [ "tired on a snow day", => MATCH ],
  [ "snow tire and regular tire", => MATCH ],
  [ " tire" => MATCH ],
  [ "snow tire" => NO_MATCH ],
  [ "snow tire" => NO_MATCH ],
  [ "some snowtires" => NO_MATCH ],
);
[...]

I negated the test, to make the regex simpler:

my $snow_tire = qr/
snow [[:blank:]]* tire (?!.*tire)
/x;

my $fail;
for (@tests) {
  my($str,$want) = @$_;
  my $got = $str !~ /$snow_tire/;
  my $pass = !!$want == !!$got;

  print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

  ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__
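
A rough Python rendering of the negated test, with `[ \t]` standing in for `[[:blank:]]` and a helper name `accepted` made up for this example. It passes the test list above, but two side effects of negating seem worth noting: a line with no "tire" at all is accepted, and a line whose last "tire" is a snow tire is rejected even if a regular tire comes first.

```python
import re

snow_tire = re.compile(r"snow[ \t]*tire(?!.*tire)")

def accepted(line):
    # Negated test, as in the Perl above: accept unless the pattern is found.
    return not snow_tire.search(line)

for line in ["snow tire and regular tire", "snow tire",
             "regular tire and snow tire", "hello world"]:
    print("%-28s %s" % (line, "MATCH" if accepted(line) else "NO_MATCH"))
```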

···

--
Affijn, Ruud

"Gewoon is een tijger."

Gbacon · 29 January 2008 17:14

In article <fnlfr0.1fk.1@news.isolution.nl>,

···

Dr.Ruud <rvtol+news@isolution.nl> wrote:

: I negated the test, to make the regex simpler: [...]

Yes, your approach is simpler. I assumed from the "need it all
in one pattern" constraint that the OP is feeding the regular
expression to some other program that is looking for matches.

I dunno. Maybe it was the familiar compulsion with Perl to
attempt to cram everything into a single pattern.

Greg
--
What light is to the eyes -- what air is to the lungs -- what love is to
the heart, liberty is to the soul of man.
-- Robert Green Ingersoll

Daniel_DeLorme · 30 January 2008 01:32

I think I have a solution that matches the OP's request

tests = ["winter tire", "tire", "retire", "tired", "snowbird tire", "tired on a snow day", "snow tire and regular tire", " tire", "snow tire", "snow tire", "some snowtires"]
m,nm = tests.partition{ |str| str =~ /\A(?>snow *tire|.)*tire/ }
p m
=> ["winter tire", "tire", "retire", "tired", "snowbird tire", "tired on a snow day", "snow tire and regular tire", " tire"]
p nm
=> ["snow tire", "snow tire", "some snowtires"]

How is that?
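
For comparison, the same idea can be sketched in Python. Atomic groups `(?>...)` are only available in `re` from Python 3.11, so this example imitates one with a capturing lookahead followed by a backreference; that substitution is an assumption of the sketch, not part of the Ruby solution.

```python
import re

# (?=(snow *tire|.))\1 consumes whatever the lookahead captured and cannot be
# re-entered on backtracking, which imitates the atomic group (?>snow *tire|.).
atomic_like = re.compile(r"\A(?:(?=(snow *tire|.))\1)*tire")

tests = ["winter tire", "tire", "retire", "tired", "snowbird tire",
         "tired on a snow day", "snow tire and regular tire", " tire",
         "snow tire", "snow  tire", "some snowtires"]
m = [s for s in tests if atomic_like.search(s)]
nm = [s for s in tests if not atomic_like.search(s)]
print(m)
print(nm)
```

The point of the atomic step is that once "snow tire" has been consumed as a unit, the engine may not go back and consume the "s" on its own, so the "tire" inside it is never reachable.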

Daniel

Andrew_Stewart · 30 January 2008 15:49

Aha! I like your style.

James Edward Gray's when-all-else-fails-reverse-the-data triumphs again.

Regards,
Andy Stewart

···

On 29 Jan 2008, at 23:35, Suraj Kurapati wrote:

Since Ruby does not have a negative look *behind* operator, I just used
the negative look *ahead* in a backwards way, et voilà!

puts a.reverse.gsub(/erit(?!.*wons)/, '>>>\&<<<').reverse

-------

Kenneth · 26 January 2008 03:30

thanks for your post. one reason is that some text editors let users
search all files using a single regular expression. another reason is
that if two separate tests are applied to a line, and that line actually
has both tire and snowtire, then the second test may reject the whole
line, even though we want to grep it because of the first word "tire".

···

On Jan 25, 6:35 pm, Joseph Pecoraro <joepec...@gmail.com> wrote:

I really have spent the last 20 minutes trying out different
possibilities with a single regular expressions but it just doesn't seem
worth the difficulty =(

May I ask why there is the requirement for a single regular expression?

- Joe P

James_Edward_Gray_II · 30 January 2008 15:58

I love that trick.

James Edward Gray II

···

On Jan 30, 2008, at 9:49 AM, Andrew Stewart wrote:

On 29 Jan 2008, at 23:35, Suraj Kurapati wrote:

Since Ruby does not have a negative look *behind* operator, I just used
the negative look *ahead* in a backwards way, et voilà!

puts a.reverse.gsub(/erit(?!.*wons)/, '>>>\&<<<').reverse

Aha! I like your style.

James Edward Gray's when-all-else-fails-reverse-the-data triumphs again.
