Regex help

Using ruby 1.7.3 (2002-10-12) [i386-mswin32] on Win XP (Home)

Trying to create a regex which will match a single-quoted string.
My first attempt 'almost' works: /'.*?'/

But then I also want it to match single-quoted strings which have embedded
single quotes.
To embed, you can escape a single quote by putting another single quote just
before it.
For example :

m = /'.*?'/.match(%Q{This is a 'single ''quoted'' string' with only one match})
#                                      ^^      ^^
# These are two single quotes and not a double quote
puts m[0]

I want this to print: 'single ''quoted'' string' .
But I am getting: 'single '

Please help.
TIA

– shanko

You’re doing a minimal match. Use:

/'.*'/ instead of /'.*?'/

-austin
– Austin Ziegler, austin@halostatue.ca on 2002.11.14 at 01.58.50
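
To make the greedy versus minimal difference concrete, here is a small sketch (the second sample string is made up for illustration, it is not from the thread). The greedy form fixes the example in the question, but it also runs to the last quote in the string, so it may over-match when there are several quoted strings:

```ruby
# Sample text from the original question.
text = %q{This is a 'single ''quoted'' string' with only one match}

lazy   = text[/'.*?'/]   # minimal: stops at the first closing quote
greedy = text[/'.*'/]    # greedy: runs to the last quote in the string

puts lazy     # 'single '
puts greedy   # 'single ''quoted'' string'

# Hypothetical second sample with two separate quoted strings.
two = %q{'a' and 'b'}
puts two[/'.*'/]   # 'a' and 'b'  (one match spanning both strings)
```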

···

On Thu, 14 Nov 2002 15:39:09 +0900, Shashank Date wrote:

[snip]

quote1_re = /'((?:''|[^'])+)'/

quote1_re.match( "Here I have 'two ''quoted'' matches' and 'here''s the second'." ).to_a[1]
==>"two ''quoted'' matches"

"Here I have 'two ''quoted'' matches' and 'here''s the second'.".scan( quote1_re )
==>[["two ''quoted'' matches"], ["here''s the second"]]

_why
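
For readers who want the pattern spelled out, here is a hedged sketch of the same regex in extended mode, followed by one possible way to turn the doubled quotes back into single ones. The constant name QUOTED and the unescaping step are illustrative additions, not part of the post above:

```ruby
# The pattern from the post, written with /x so whitespace and comments are ignored.
QUOTED = /
  '             # opening quote
  (             # capture the body only
    (?: ''      #   a doubled quote, which stands for an escaped one
      | [^']    #   or any character other than a quote
    )+
  )
  '             # closing quote
/x

text = %q{Here I have 'two ''quoted'' matches' and 'here''s the second'.}

bodies    = text.scan(QUOTED).flatten
unescaped = bodies.map { |s| s.gsub("''", "'") }  # assumption: '' means one literal '

p bodies      # ["two ''quoted'' matches", "here''s the second"]
p unescaped   # ["two 'quoted' matches", "here's the second"]
```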

···

Shashank Date (sdate@kc.rr.com) wrote:

[snip]

Shashank Date wrote

I want this to print: 'single ''quoted'' string' .
But I am getting: 'single '

Just remove the question mark from the regexp, and it will work (at least
for me); the question mark makes the .* non-greedy…

Idan.

Answering my own post … should have tried harder b4 posting.

m = /'.*''.*?'/.match(%Q{This is a 'single ''quoted'' string' with
'two' matches})
puts m[0]

Sorry for the noise !

“Shashank Date” sdate@kc.rr.com wrote in message
news:qmHA9.17372$jj5.440511@twister.rdc-kc.rr.com

[snip]

Since this type of regex question comes up regularly, I wonder how difficult
it would be to provide a regex extension which matches pairs of quoted characters;
either identical chars like ' or " or character pairs () or {}, etc…

···

On Thu, Nov 14, 2002 at 03:39:09PM +0900, Shashank Date wrote:

[snip]


Alan Chen
Digikata LLC
http://digikata.com

Like this solution the most ! Thanks _why !!

“why the lucky stiff” ruby-talk@whytheluckystiff.net wrote in message
news:20021114071035.GA4625@rysa.inetz.com

"Here I have 'two ''quoted'' matches' and 'here''s the second' ".scan( quote1_re )
==>[ ["two ''quoted'' matches"], ["here''s the second"]]

Now I would like to modify it to generate the following:

"Here I have 'two ''quoted'' matches' and 'here''s the second' one.".scan( ??? )
==> [["Here"],["I"],["have"],["two ''quoted'' matches"], ["and"],["here''s the second"],["one."]]

“Shashank Date” sdate@kc.rr.com wrote in message
news:PTHA9.17397$jj5.446958@twister.rdc-kc.rr.com

Answering my own post … should have tried harder b4 posting.

Again !

m = /'.*''.*?'/.match(%Q{This is a 'single ''quoted'' string' with
'two' matches})
puts m[0]

This solution is flawed (time to take some sleep) … please see solution by
“why the lucky stiff”
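
A short sketch of where that attempt goes wrong, reading the pattern as /'.*''.*?'/ (an assumption about what was posted) and using two made-up sample strings. It requires a doubled quote somewhere in the match, and the greedy .* can span several quoted strings:

```ruby
attempt = /'.*''.*?'/   # assumed reading of the self-answer above

# It works on the sample from the post...
s1 = %q{This is a 'single ''quoted'' string' with 'two' matches}
p s1[attempt]   # "'single ''quoted'' string'"

# ...but finds nothing when no quote is doubled (hypothetical sample),
s2 = %q{a 'plain' string}
p s2[attempt]   # nil

# ...and swallows two quoted strings when the doubled quote comes later.
s3 = %q{'one' and 'two ''x'' y'}
p s3[attempt]   # "'one' and 'two ''x'' y'"
```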

[snip]

quote1_re = /((?:''|[^'])+)/
[snip]

TESTING REGEX ((?:''|[^'])+)

    '        did not match        -
    ''       matched           ('')
    '''      matched           ('')
    ''''     matched         ('''')
    'a       matched            (a)
    a'       matched            (a)
    ''a      matched          (''a)
    a''      matched          (a'')
    '''a     matched           ('')
    a'''     matched          (a'')
    ''''a    matched        (''''a)
    a''''    matched        (a'''')

TESTING REGEX (?:^|[^'])('(?:[^']|'')*')(?:[^']|$)

    '        did not match        -
    ''       matched           ('')
    '''      did not match        -
    ''''     matched         ('''')
    'a       did not match        -
    a'       did not match        -
    ''a      matched           ('')
    a''      matched           ('')
    '''a     did not match        -
    a'''     did not match        -
    ''''a    matched         ('''')
    a''''    matched         ('''')

THE PROGRAM

#!/usr/bin/env ruby

# the two candidate patterns under test
regexs = [
  %r{((?:''|[^'])+)},
  %r{(?:^|[^'])('(?:[^']|'')*')(?:[^']|$)}
]
# %w splits on whitespace, so this yields twelve short test strings
strings = %w(
  ' '' ''' '''' 'a a'
  ''a a'' '''a a''' ''''a a''''
)
regexs.each do |r|
  puts "\nTESTING REGEX #{r.source}\n\n"
  strings.each do |s|
    m = r.match s
    format = "\t%-8.8s %-13.13s %8.8s\n"
    if m
      printf format, s, 'matched', "(#{m[1]})"
    else
      printf format, s, 'did not match', '-'
    end
  end
end

(?:^|[^'])('(?:[^']|'')*')(?:[^']|$)

  • beginning with the start of string or not a '
  • followed by a '
  • followed by zero or more of not ' or ''
  • followed by a '
  • ending with the end of string or not a '
  • the quoted string is captured, nothing else is

-ara
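
One thing worth noting about the pattern above: the boundary groups consume the character on each side of the quoted string, so with scan a second quoted string that follows after a single separator can be missed. As a hedged alternative (not something proposed in the thread, and it assumes a regex engine with lookbehind, which Ruby 1.9 and later provide), the same two boundary conditions can be written as zero-width assertions. The constant names are made up for illustration:

```ruby
# The pattern from the post, and a lookaround variant of it.
CONSUMING = /(?:^|[^'])('(?:[^']|'')*')(?:[^']|$)/
STRICT    = /(?<!')'(?:[^']|'')*'(?!')/

p "say 'it''s' now"[STRICT]   # "'it''s'"
p "'''"[STRICT]               # nil
p "a''"[STRICT]               # "''"

# Two quoted strings separated by one space (hypothetical sample).
pair = "'a' 'b'"
p pair.scan(CONSUMING)   # [["'a'"]]       the space was consumed by the first match
p pair.scan(STRICT)      # ["'a'", "'b'"]
```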

···

On Thu, 14 Nov 2002, why the lucky stiff wrote:

====================================

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ahoward@fsl.noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
====================================

Since this type of regex question comes up regularly, I wonder how difficult
it would be to provide a regex extension which matches pairs of quoted characters;
either identical chars like ' or " or character pairs () or {}, etc…

Those aren’t (mathematically) “regular” then, are they? (Granted,
existing ‘regular’ expressions aren’t either with backreferencing.)
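
As a small illustration of the backreference point (a sketch with a made-up sample string, not a proposal from the thread), \1 lets one pattern close a string with whichever quote character opened it. Backreferences in general go beyond regular languages, although with a fixed set of quote characters, as here, the same thing could also be spelled out as a plain alternation:

```ruby
# Group 1 remembers the opening quote; \1 must match the same character again.
SAME_QUOTE = /(["'])(.*?)\1/

text   = %q{say "it's" or 'x "y" z'}
bodies = text.scan(SAME_QUOTE).map { |_quote, body| body }

p bodies   # ["it's", "x \"y\" z"]
```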

···

=====

Yahoo IM: michael_s_campbell



A simpler regexp to gather characters around the quoted string:

quote1_re = /((?:''|[^'])+)/
str = "Here I have 'two ''quoted'' matches' and 'here''s the second'."

str.scan( quote1_re )
==>[["Here I have "], ["two ''quoted'' matches"], [" and "],
["here''s the second"], ["."]]

For the outcome you’d like:

quote1_re = /(?:'((?:''|[^'])+)'|(\S+))/

str.scan( quote1_re ).flatten.compact
==>["Here", "I", "have", "two ''quoted'' matches", "and",
"here''s the second", "."]

Definitely a lot slower. You could probably reduce it further, but it’s
enough to get you started. Good luck.

_why
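
To see why flatten.compact is needed: each scan hit carries two capture groups and only one of them is set, so the other is nil. A minimal sketch, using the sample string from the question that ends in "one."; the names TOKEN and tokenize are made up for illustration:

```ruby
# Either a quoted string (group 1) or a run of non-whitespace (group 2).
TOKEN = /(?:'((?:''|[^'])+)'|(\S+))/

# Hypothetical helper: drop the nil left by whichever group did not match.
def tokenize(text)
  text.scan(TOKEN).flatten.compact
end

str = %q{Here I have 'two ''quoted'' matches' and 'here''s the second' one.}

p str.scan(TOKEN).first   # [nil, "Here"]
p tokenize(str)
# ["Here", "I", "have", "two ''quoted'' matches", "and", "here''s the second", "one."]
```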

···

Shashank Date (sdate@kc.rr.com) wrote:

[snip]

in fact they are regular because the matching char is always known: it is
either the same, or a match to a known mate. check out the O'Reilly regex book,
it’s a really good read and has many examples of using dfa and nfa regexes to
match, for example, comment pairs (/* */), which is exactly like this
problem

now, it would be non-regular if you tried to make a single regular expression
which would match ALL occurrences mentioned since backreferencing would be
required - maybe that’s what you mean? i do agree that this type of method
should not belong in a Regex class since a general method would appear to be
non-regular, even if the collection of regexes were individually regular
i think a String#delimscan method would be good, something like :

class String
  # opening delimiter => closing delimiter
  @@delimscan_chars =
    {
      '\'' => '\'',
      '"' => '"',
      '(' => ')',
      '[' => ']',
      '{' => '}',
      '<' => '>',
    }
  def delimscan delim='"', escape='\\'
    o = delim[0]                    # opening char
    c = @@delimscan_chars[delim]
    c = c ? c[0] : o                # closing char, defaults to the opener
    e = escape[0]                   # escape char

    w = []
    b = 0
    a = 0
    n = 0

    while ((b = self[n])) do
      case b
        when e
          # skip the escape and the char it escapes
          n += 2 and next
        when o
          a = n
          n += 1
          # scan forward to the matching closer
          while ((b = self[n])) do
            case b
              when e
                n += 2 and next
              when c
                w << self[a..n]
                break
            end
            n += 1
          end
      end
      n += 1
    end

    return w
  end
end

if $0 == __FILE__
  strings = [
    %q("one"),
    %q("one" "two"),
    %q("o\"ne" "t\"wo"),
    %q("),
    %q(""),
    %q("""),
    %q(),
  ].each do |s|
    puts "TESTING #{s}"
    puts s.delimscan.inspect
  end

  strings = [
    %q(),
    %q(<o! <t!>wo>),
    %q(<),
    %q(<>),
    %q(<>>),
    %q(),
  ].each do |s|
    puts "TESTING #{s}"
    puts s.delimscan('<', '!').inspect
  end
end

-a
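
For comparison, here is a hedged sketch of the same scanning idea as a standalone method that leaves String untouched. The names delim_scan and CLOSERS are hypothetical, and this is a simplified variant written for current Ruby, not the code posted above:

```ruby
# Hypothetical standalone variant of the delimscan idea; names are illustrative.
CLOSERS = { "(" => ")", "[" => "]", "{" => "}", "<" => ">" }.freeze

def delim_scan(text, open = '"', escape = "\\")
  close = CLOSERS.fetch(open, open)   # identical delimiters close themselves
  found = []
  start = nil                         # index of the currently open delimiter
  i = 0
  while i < text.length
    ch = text[i]
    if ch == escape
      i += 2                          # skip the escape and the escaped character
      next
    elsif start.nil? && ch == open
      start = i
    elsif start && ch == close
      found << text[start..i]
      start = nil
    end
    i += 1
  end
  found
end

p delim_scan(%q("one" "t\"wo"))         # ["\"one\"", "\"t\\\"wo\""]
p delim_scan("<a> <b!>c>", "<", "!")    # ["<a>", "<b!>c>"]
```

An unterminated delimiter simply yields nothing, and nesting is not tracked.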

···

On Fri, 15 Nov 2002, Michael Campbell wrote:

Those aren’t (mathematically) “regular” then, are they? (Granted,
existing ‘regular’ expressions aren’t either with backreferencing.)


Nice !

Learnt a lot from your examples …
Thanks !

“ahoward” ahoward@fsl.noaa.gov wrote in message

[snip]

“why the lucky stiff” ruby-talk@whytheluckystiff.net wrote in message
news:20021114155331.GA7742@rysa.inetz.com

For the outcome you’d like:

quote1_re = /(?:'((?:''|[^'])+)'|(\S+))/

str.scan( quote1_re ).flatten.compact
==>["Here", "I", "have", "two ''quoted'' matches", "and",
"here''s the second", "."]

Great !

Definitely a lot slower.

Speed is not an issue (yet).

You could probably reduce it further, but it’s enough to get you started.

You bet !
Thanks a lot _why …