How to include some text and exclude other text in one regular expression?

I would like to match “file 1 of 2” or “1 of 2”, but not “day 1 of 2”.

I can match “file 1 of 2” with…

/(file)? *\d+ *of *\d+/

I wonder if (in the same regex), I can prevent a match to “day 1 of
2”? The above expression will match this by just ignoring the word
"day".

Can this be done?

Thank you,

  • Usano


/^(file)?\s*(\d+)\sof\s(\d+)$/

\s matches any whitespace character.
Adding ^ and $ to the beginning and end of the expression makes it match
the entire string, since /1 or 2/ alone will match “day 1 or 2”,
“file 1 or 2”, or “In 2002, 1 or 2 people answered this email”.
/^1 or 2$/ makes it match only “1 or 2”, with nothing before or after.

Hope this helps,
Greg.

···

On Thu, 19 Dec 2002, Usano wrote:



Greg Millam
walker at deafcode.com

I think this problem can only be solved by knowing the problem domain.
In particular, one would need to know whether “file 1 of 2” and “1 of 2”
have a common delimiter in front of them that is distinguishable from
whatever precedes “day 1 of 2”, “page 1 of 2”, etc. For example, “(”
might be such a delimiter in some text collections.
Alternatively, one could set aside the occurrences of “file 1 of 2”,
inspect the remaining lines that contain “1 of 2” for the words in front
of it (sorting and taking the unique instances would help here), and
then expressly negate those words.
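
The second suggestion could be sketched roughly like this in Ruby. The sample
lines and the names SAMPLE_LINES, PREFIXED, prefix_words and wanted? are made
up for illustration; real input would come from whatever text is being
scanned, and the pattern assumes the word sits directly in front of the number
with only whitespace in between.

```ruby
# Hypothetical sample lines, standing in for the real text collection.
SAMPLE_LINES = [
  "file 1 of 2",
  "day 1 of 2",
  "page 3 of 10",
  "Day 2 of 2",
  "(1 of 2)"
]

# Capture the word (if any) directly in front of "N of M" so it can be inspected.
PREFIXED = /(?:\b([A-Za-z]+)\s+)?\d+\s+of\s+\d+/

# Collect the unique, sorted words that precede "N of M", leaving out "file".
def prefix_words(lines)
  lines.flat_map { |line| line.scan(PREFIXED).flatten }
       .compact
       .map(&:downcase)
       .reject { |word| word == "file" }
       .uniq
       .sort
end

# Accept a line only if it has an "N of M" whose preceding word is not unwanted.
def wanted?(line, unwanted)
  m = PREFIXED.match(line)
  !m.nil? && !unwanted.include?(m[1].to_s.downcase)
end

unwanted = prefix_words(SAMPLE_LINES)   # ["day", "page"] for the sample above
```

With that list in hand, wanted?("file 1 of 2", unwanted) and
wanted?("(1 of 2)", unwanted) come back true, while
wanted?("day 1 of 2", unwanted) comes back false.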

···

On Wednesday, December 18, 2002, at 11:34 PM, Usano wrote:


Hi –

/^(file)?\s*(\d+)\sof\s(\d+)$/

\s is any white space character.
adding ^ and $ to the beginning and end of the expression make it match
the entire string. since /1 or 2/ will match “day 1 or 2” “file 1 or 2” or
“In 2002, 1 or 2 people answered this email”
/^1 or 2$/ makes it match only “1 or 2”, with nothing before or after.

Not quite:

irb(main):001:0> /^1 or 2$/.match("hi there\n1 or 2\nbye!")
#<MatchData:0x401f05e0>

^ matches beginning of line and $ matches end of line. \A
matches beginning of entire string, and \z matches end of string (or
\Z, which is end of string minus terminating newline, if any).
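
A small side-by-side comparison of the anchors, using the same test string as
above. The constant names and the matches? helper are only for illustration.

```ruby
MULTI_LINE = "hi there\n1 or 2\nbye!"

LINE_ANCHORED   = /^1 or 2$/     # ^ and $ anchor at line boundaries
STRING_ANCHORED = /\A1 or 2\z/   # \A and \z anchor at the ends of the whole string
NEWLINE_ALLOWED = /\A1 or 2\Z/   # \Z tolerates one terminating newline

# True if the pattern matches anywhere in the text.
def matches?(pattern, text)
  !pattern.match(text).nil?
end

matches?(LINE_ANCHORED, MULTI_LINE)    # true:  the middle line matches
matches?(STRING_ANCHORED, MULTI_LINE)  # false: the string holds more than "1 or 2"
matches?(STRING_ANCHORED, "1 or 2")    # true
matches?(STRING_ANCHORED, "1 or 2\n")  # false: \z does not skip the newline
matches?(NEWLINE_ALLOWED, "1 or 2\n")  # true
```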

David

···

On Thu, 19 Dec 2002, Greg Millam wrote:

On Thu, 19 Dec 2002, Usano wrote:


David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

Thank you for the help (everyone). The problem domain is binary files
from Usenet. There is no single standard that all posters adhere to,
such as the delimiter you suggested in your response, Mark.

Also, there may be varied text before and after this sequence, so the
other two suggestions won’t work.

I suppose I will have to face the fact that I cannot exclude the
unwanted possibilities all in the same regex.

Thanks,

  • usano
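
For anyone reading this later: a regex engine with negative lookbehind can get
part of the way there in a single pattern. This is only a sketch. It assumes
an engine that supports (?<!...), which newer Ruby versions do (1.9 and later,
as far as I know), and it assumes exactly one whitespace character between an
unwanted word and the number. The names PART and part_marker are made up for
the example.

```ruby
# Either "file" plus whitespace comes first, or the number must not be
# preceded by a letter-plus-one-whitespace pair, a letter, or a digit.
# Assumption: an unwanted word is separated from the number by one whitespace.
PART = /(?:\bfile\s+|(?<![A-Za-z]\s)(?<![A-Za-z\d]))\d+\s+of\s+\d+/i

# Return the matched "N of M" marker (with "file" if present), or nil.
def part_marker(text)
  text[PART]
end

part_marker("file 1 of 2")    # "file 1 of 2"
part_marker("yEnc (1 of 2)")  # "1 of 2"
part_marker("day 1 of 2")     # nil
```

A word followed by two or more spaces would slip through, so this is not a
general answer, only an illustration that some of the exclusion can live
inside the pattern.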
···

On Thu, 19 Dec 2002 15:35:02 +0900, Mark Wilson mwilson13@cox.net wrote:
