A regex problem

gga · 21 August 2005 07:16

I am usually pretty good at regexes but this one has me stumped.
I want to basically match any line that has a period in it, but only if
that period is not part of a salutation. Ideally I want to do this
with a single regex.

Thus:
'end of line. And we continue' # should match
'The incredible Mrs. Robner' # should not match
'Sammy Davis Jr. is an okay guy.' # should match, due to last .

I tried doing something logical, like:

/(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

but, alas, this does not work. Any ideas?

Thorsten_Haude · 21 August 2005 07:49

Hi,

* gga wrote (2005-08-21 09:16):

Thus:
'end of line. And we continue' # should match
'The incredible Mrs. Robner' # should not match
'Sammy Davis Jr. is an okay guy.' # should match, due to last .

I tried doing something logical, like:

/(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

but, alas, this does not work. Any ideas?

Just by looking at it, this only seems to not-find 'Mrs..'.

I also wonder why you use a look-ahead; I would rather use a
look-behind. As it is, your regex would find any dot, because the
look-ahead is tested at the position of the dot itself, and no dot
matches (Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.). So in the regex dialect I know
best (NEdit):
(?<!(Jr|Sr|Miss|Mr|Mrs))\.
(i.e. find a dot not preceded by Jr, Sr, etc.)
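
To see why the look-ahead version accepts every period, here is a minimal Ruby sketch (the constant name ORIGINAL and the samples array are only for illustration). The negative look-ahead is evaluated at the same position where the literal dot has to match, and the text at that position can never begin with "Jr." or "Mrs.", so the assertion always succeeds:

```ruby
# The pattern from the original post, unchanged.
ORIGINAL = /(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

samples = [
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
]

# Report, for each sample, whether the pattern finds a period it accepts.
samples.each do |s|
  result = (s =~ ORIGINAL) ? 'match' : 'no match'
  puts "#{s.inspect} => #{result}"
end
```

All three samples are reported as matches, including the "Mrs." line that was supposed to be rejected.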

Thorsten

--
Precisely when we all want to play it completely safe, we create
a world of utmost insecurity
- Dag Hammarskjöld

W_James · 21 August 2005 09:11

gga wrote:

I am usually pretty good at regexes but this one has me stumped.
I want to basically match any line that has a period in it, but only if
that period is not part of a salutation. Ideally I want to do this
with a single regex.

Thus:
'end of line. And we continue' # should match
'The incredible Mrs. Robner' # should not match
'Sammy Davis Jr. is an okay guy.' # should match, due to last .

I tried doing something logical, like:

/(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

but, alas, this does not work. Any ideas?

a = [
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
]

# Strip the known salutations first, then look for any remaining period.
a.each {|s|
  puts s if s.gsub(/(?:Jr\.|Sr\.|Mr\.|Mrs\.)/,"") =~ /\./
}

W_James · 21 August 2005 10:16

This would be a lot easier if Ruby had look-behind.

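[
  '.start',
  '-. HERE .-',
  'Jr. is rotten',
  'Mr. Smith is here',
  'Mr-. Smith is here',
  'Mr. Smith is here.',
  'Mrs. Jones left',
  'Meet Mr. Elihu Snark, Jr.',
  'A good line.',
  'A mystery guest, introduced by his father, Mr. Bob Eck, Sr.'
].each {|s|
  # A period counts if it comes at most two characters after the start of a
  # line that does not begin with Jr, Sr or Mr, or if the three characters
  # before it are neither "Mrs" nor any character followed by Jr, Sr or Mr.
  if s =~ %r{ (?:
                   (?!Jr|Sr|Mr) ^ .{0,2} |
                   (?!.Jr|.Sr|.Mr|Mrs) ...
               )
               \.
            }x
    puts s
  end
}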

Gavin_Kistner2 · 21 August 2005 14:41

A negative look-behind would be the perfect, simple approach to this regex problem. Unfortunately, Ruby's current regexp handler does not have such a feature. Fortunately, the regexp handler of the next version of Ruby does. Even more fortunately, this future handler (Oniguruma) is available now.

So, you can write a more complex regexp/logic to detect your current case, or you can get Oniguruma working and use a negative look-behind.
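
As a rough sketch of the negative look-behind approach, assuming Ruby 1.9 or later (where look-behind is built in); the constant name SENTENCE_DOT and the sample lines are only for illustration. One look-behind per title is used so that each assertion has a fixed width:

```ruby
# Assumes Ruby 1.9+. A period matches only if none of the listed titles
# ends directly before it.
SENTENCE_DOT = /(?<!Jr)(?<!Sr)(?<!Mr)(?<!Mrs)(?<!Miss)\./

lines = [
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
]

# Keep the lines that contain at least one period not preceded by a title.
matching = lines.select { |s| s =~ SENTENCE_DOT }
puts matching
```

This sketch does not check word boundaries, so a word that merely ends in "Mr" or "Sr" would also be treated as a title.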

On Aug 21, 2005, at 1:49 AM, Thorsten Haude wrote:

I also wonder why you use a look-ahead, I would rather use a
look-behind.

Robert · 21 August 2005 16:51

I'd probably use something like /(\w+)\./ and then check programmatically (or with a second regex) that the word before the dot is not one of the excluded words.
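
A minimal sketch of that two-step idea, with hypothetical names (EXCLUDED, sentence_period?) that are not from the thread:

```ruby
# Words that should not count when they appear directly before a period.
EXCLUDED = %w[Jr Sr Mr Mrs Miss].freeze

# True if the line has at least one "word." whose word is not an excluded title.
def sentence_period?(line)
  # scan returns one single-element array per match; flatten gives the words.
  line.scan(/(\w+)\./).flatten.any? { |word| !EXCLUDED.include?(word) }
end

lines = [
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
]

lines.each { |s| puts s if sentence_period?(s) }
```

Because the pattern requires at least one word character before the dot, a period that follows whitespace or punctuation would not be counted by this sketch.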

Kind regards

robert


Thorsten_Haude · 21 August 2005 17:35

Hi,

* Gavin Kistner wrote (2005-08-21 16:41):

On Aug 21, 2005, at 1:49 AM, Thorsten Haude wrote:

I also wonder why you use a look-ahead, I would rather use a
look-behind.

A negative look-behind would be the perfect, simple approach to this
regex problem. Unfortunately, Ruby's current regexp handler does not
have such a feature.

Sorry if I added to the confusion, I'm pretty new to Ruby and wasn't
aware of that limitation.

Thorsten
--
A: Top posters
Q: What's the most annoying thing about email these days?
