A regex problem

(gga) #1

I am usually pretty good at regexes but this one has me stumped.
I want to basically match any line that has a period in it, but only if
that period is not part of a salutation. Ideally I want to do this
with a single regex.

Thus:
'end of line. And we continue' # should match
'The incredible Mrs. Robner' # should not match
'Sammy Davis Jr. is an okay guy.' # should match, due to last .

I tried doing something logical, like:

/(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

but, alas, this does not work. Any ideas?

(Thorsten Haude) #2

Hi,

* gga wrote (2005-08-21 09:16):

> Thus:
> 'end of line. And we continue' # should match
> 'The incredible Mrs. Robner' # should not match
> 'Sammy Davis Jr. is an okay guy.' # should match, due to last .
>
> I tried doing something logical, like:
>
> /(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./
>
> but, alas, this does not work. Any ideas?

Just by looking at it, this only seems to not-find 'Mrs..'.

I also wonder why you use a look-ahead; I would rather use a
look-behind. As it is, your regex finds any dot, because the text
starting at a dot can never match (Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.). So in
the regex dialect I know best (NEdit):
    (?<!(Jr|Sr|Miss|Mr|Mrs))\.
(i.e. find a dot not preceded by Jr, Sr, etc.)
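
To see why the look-ahead version never rejects anything, here is a minimal Ruby sketch. The constant and variable names are mine, chosen only for illustration; the pattern and the sample lines are the ones from the question.

```ruby
# The look-ahead pattern from the original question.
LOOKAHEAD = /(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

lines = [
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
]

# The look-ahead is tested at the very position where \. has to match.
# The character there is a dot, while every alternative starts with a
# letter, so the negative look-ahead succeeds at every dot.
lookahead_hits = lines.select { |s| s =~ LOOKAHEAD }
puts lookahead_hits
```

All three lines are selected, including the Mrs. one, which is the opposite of what was wanted.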

Thorsten

--
It is when we all play safe that we create
a world of utmost insecurity
    - Dag Hammarskjöld

(W. James) #3

gga wrote:

> I am usually pretty good at regexes but this one has me stumped.
> I want to basically match any line that has a period in it, but only if
> that period is not part of a salutation. Ideally I want to do this
> with a single regex.
>
> Thus:
> 'end of line. And we continue' # should match
> 'The incredible Mrs. Robner' # should not match
> 'Sammy Davis Jr. is an okay guy.' # should match, due to last .
>
> I tried doing something logical, like:
>
> /(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./
>
> but, alas, this does not work. Any ideas?

a = [
'end of line. And we continue',
'The incredible Mrs. Robner',
'Sammy Davis Jr. is an okay guy.'
]

# Strip the known salutations, then print the line if a period remains.
a.each {|s|
  puts s if s.gsub(/(?:Jr\.|Sr\.|Mr\.|Mrs\.)/,"") =~ /\./
}

(W. James) #4

William James wrote:

gga wrote:
> I am usually pretty good at regexes but this one has me stumped.
> I want to basically match any line that has a period in it, but only if
> that period is not part of a salutation. Ideally I want to do this
> with a single regex.
>
> Thus:
> 'end of line. And we continue' # should match
> 'The incredible Mrs. Robner' # should not match
> 'Sammy Davis Jr. is an okay guy.' # should match, due to last .
>
> I tried doing something logical, like:
>
> /(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./
>
> but, alas, this does not work. Any ideas?

a = [
'end of line. And we continue',
'The incredible Mrs. Robner',
'Sammy Davis Jr. is an okay guy.'
]

a.each {|s|
  puts s if s.gsub(/(?:Jr\.|Sr\.|Mr\.|Mrs\.)/,"") =~ /\./
}

This would be a lot easier if Ruby had look-behind.

[
'.start',
'-. HERE .-',
'Jr. is rotten',
'Mr. Smith is here',
'Mr-. Smith is here',
'Mr. Smith is here.',
'Mrs. Jones left',
'Meet Mr. Elihu Snark, Jr.',
'A good line.',
'A mystery guest, introduced by his father, Mr. Bob Eck, Sr.'
].each {|s|
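  if s =~ %r{ (?:
                   # dot within the first 3 chars, line not starting with a title
                   (?!Jr|Sr|Mr) ^ .{0,2} |
                   # else the 3 chars before the dot must not end in a title
                   (?!.Jr|.Sr|.Mr|Mrs) ...
               )
               \.
            }x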
    puts s
  end
}

(Gavin Kistner) #5

A negative look-behind would be the perfect, simple approach to this regex problem. Unfortunately, Ruby's current regexp handler does not have such a feature. Fortunately, the regexp handler of the next version of Ruby does. Even more fortunately, this future handler (Oniguruma) is available now.

So, you can write a more complex regexp/logic to detect your current case, or you can get Oniguruma working and use a negative look-behind.
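
A minimal sketch of the look-behind version, assuming a Ruby whose regexp engine supports negative look-behind (Oniguruma, the default engine from Ruby 1.9 on). The constant name and the list of titles are illustrative only. The alternatives sit directly at the top level of the look-behind, without a capturing group, which as far as I know is the form Oniguruma accepts when the alternatives differ in length.

```ruby
# A dot that is not immediately preceded by one of the listed titles.
# Assumes an engine with negative look-behind (Ruby 1.9 or later).
NOT_A_TITLE_DOT = /(?<!Jr|Sr|Mr|Mrs)\./

lines = [
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
]

# Keep only the lines that contain at least one such dot.
matches = lines.select { |s| s =~ NOT_A_TITLE_DOT }
puts matches
```

With the three sample lines from the original question, only the first and the third are selected. A word that merely ends in one of these letter pairs would also be skipped, so a word-boundary check may be worth adding.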

On Aug 21, 2005, at 1:49 AM, Thorsten Haude wrote:

> I also wonder why you use a look-ahead, I would rather use a
> look-behind.

(Robert) #6

I'd probably use something like /(\w+)\./ and do a programmatic check (or use a second regex) that the word before the dot is not one of the words that should not count as a match.
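
Roughly what I have in mind, as a sketch only. The names TITLES and sentence_period? are made up for this example, and the word list is just the one from the original question.

```ruby
# Words whose trailing dot should not count as a match (illustrative list).
TITLES = %w[Jr Sr Mr Mrs Miss].freeze

# True if the line contains at least one "word." whose word is not a title.
# Note: a dot with no word character directly before it is ignored here.
def sentence_period?(line)
  line.scan(/(\w+)\./).flatten.any? { |word| !TITLES.include?(word) }
end

[
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
].each { |s| puts s if sentence_period?(s) }
```

This trades the single regex for a small method, but the list of titles is then easy to extend without touching the pattern.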

Kind regards

    robert

Gavin Kistner <gavin@refinery.com> wrote:

> On Aug 21, 2005, at 1:49 AM, Thorsten Haude wrote:
>
> > I also wonder why you use a look-ahead, I would rather use a
> > look-behind.
>
> A negative look-behind would be the perfect, simple approach to this
> regex problem. Unfortunately, Ruby's current regexp handler does not
> have such a feature. Fortunately, the regexp handler of the next
> version of Ruby does. Even more fortunately, this future handler
> (Oniguruma) is available now.
>
> So, you can write a more complex regexp/logic to detect your current
> case, or you can get Oniguruma working and use a negative look-behind.

(Thorsten Haude) #7

Hi,

* Gavin Kistner wrote (2005-08-21 16:41):

> On Aug 21, 2005, at 1:49 AM, Thorsten Haude wrote:
>
> > I also wonder why you use a look-ahead, I would rather use a
> > look-behind.
>
> A negative look-behind would be the perfect, simple approach to this
> regex problem. Unfortunately, Ruby's current regexp handler does not
> have such a feature.

Sorry if I added to the confusion. I'm pretty new to Ruby and wasn't
aware of that limitation.

Thorsten
--
A: Top posters
Q: What's the most annoying thing about email these days?