Regex help

Shashank_Date3 · 14 November 2002 06:39

Using ruby 1.7.3 (2002-10-12) [i386-mswin32] on Win XP (Home)

Trying to create a regex which will match a single-quoted string.
My first attempt 'almost' works: /'.*?'/

But then I also want it to match single-quoted strings which have embedded
single quotes.
To embed, you can escape a single quote by putting another single quote just
before it.
For example :

# Each '' below is two single quotes, not a double quote.
m = /'.*?'/.match(%Q{This is a 'single ''quoted'' string' with only one match})
puts m[0]

I want this to print: 'single ''quoted'' string'
But I am getting: 'single '

Please help.
TIA

– shanko

Austin_Ziegler2 · 14 November 2002 06:59

You’re doing a minimal match. Use:

/'.*'/ instead of /'.*?'/

-austin
– Austin Ziegler, austin@halostatue.ca on 2002.11.14 at 01.58.50


why_the_lucky_stiff1 · 14 November 2002 07:00

quote1_re = /'((?:''|[^'])+)'/

quote1_re.match( "Here I have 'two ''quoted'' matches' and 'here''s the second'." ).to_a[1]
==>"two ''quoted'' matches"

"Here I have 'two ''quoted'' matches' and 'here''s the second'.".scan( quote1_re )
==>[["two ''quoted'' matches"], ["here''s the second"]]

_why
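
Once the quoted strings are captured, the doubled quotes are still in their escaped form. A minimal sketch of collapsing them back, assuming the doubling convention from the original question; the variable names `raw` and `unescaped` are illustrative and not part of the post above:

```ruby
quote1_re = /'((?:''|[^'])+)'/
text = "Here I have 'two ''quoted'' matches' and 'here''s the second'."

# scan returns one single-element array per match (one capture group),
# so flatten gives a plain list of captured strings.
raw = text.scan(quote1_re).flatten

# Collapse each doubled quote back into the single quote it stands for.
unescaped = raw.map { |s| s.gsub("''", "'") }

p raw        # ["two ''quoted'' matches", "here''s the second"]
p unescaped  # ["two 'quoted' matches", "here's the second"]
```

Because the group uses `+`, an empty quoted string (`''` on its own) is not matched by this pattern.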


Idan_Sofer · 14 November 2002 07:04

Shashank Date wrote

I want this to print: 'single ''quoted'' string'
But I am getting: 'single '

Just remove the question mark from the regexp and it will work (at least
for me); the question mark makes the .* non-greedy.

Idan.
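
A minimal sketch of the greedy versus non-greedy difference discussed in the replies so far. The sample sentence with a second quoted string is an assumption chosen to show where the greedy form overshoots:

```ruby
text = %q{This is a 'single ''quoted'' string' with 'two' matches}

lazy   = text[/'.*?'/]  # non-greedy: stops at the first quote that can close the match
greedy = text[/'.*'/]   # greedy: runs on to the last quote in the string

puts lazy    # 'single '
puts greedy  # 'single ''quoted'' string' with 'two'
```

So dropping the question mark handles a string with exactly one quoted section, but it swallows everything between the first and last quote once there are two.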

Shashank_Date3 · 14 November 2002 07:19

Answering my own post … should have tried harder b4 posting.

m = /'.*''.*?'/.match(%Q{This is a 'single ''quoted'' string' with 'two' matches})
puts m[0]

Sorry for the noise !


Alan_Chen2 · 14 November 2002 17:49

Since this type of regex question comes up regularly, I wonder how difficult
it would be to provide a regex extension which matches pairs of quoted characters;
either identical chars like ' or " or character pairs () or {}, etc…


–
Alan Chen
Digikata LLC
http://digikata.com

Shashank_Date3 · 14 November 2002 07:19

Like this solution the most ! Thanks _why !!

“why the lucky stiff” ruby-talk@whytheluckystiff.net wrote in message
news:20021114071035.GA4625@rysa.inetz.com…

"Here I have 'two ''quoted'' matches' and 'here''s the second' ".scan( quote1_re )
==>[["two ''quoted'' matches"], ["here''s the second"]]

Now I would like to modify it to generate the following:

"Here I have 'two ''quoted'' matches' and 'here''s the second' one.".scan( ??? )
==> [["Here"],["I"],["have"],["two ''quoted'' matches"],["and"],["here''s the second"],["one."]]

Shashank_Date3 · 14 November 2002 07:19

“Shashank Date” sdate@kc.rr.com wrote in message
news:PTHA9.17397$jj5.446958@twister.rdc-kc.rr.com…

Answering my own post … should have tried harder b4 posting.

Again !

m = /'.*''.*?'/.match(%Q{This is a 'single ''quoted'' string' with 'two' matches})
puts m[0]

This solution is flawed (time to get some sleep) … please see the solution by
“why the lucky stiff”

Ara.T.Howard1 · 14 November 2002 17:20

[snip]

quote1_re = /((?:''|[^'])+)/
[snip]

TESTING REGEX ((?:''|[^'])+)

    '        did not match        -
    ''       matched           ('')
    '''      matched           ('')
    ''''     matched         ('''')
    'a       matched            (a)
    a'       matched            (a)
    ''a      matched          (''a)
    a''      matched          (a'')
    '''a     matched           ('')
    a'''     matched          (a'')
    ''''a    matched        (''''a)
    a''''    matched        (a'''')

TESTING REGEX (?:^|[^'])('(?:[^']|'')*')(?:[^']|$)

    '        did not match        -
    ''       matched           ('')
    '''      did not match        -
    ''''     matched         ('''')
    'a       did not match        -
    a'       did not match        -
    ''a      matched           ('')
    a''      matched           ('')
    '''a     did not match        -
    a'''     did not match        -
    ''''a    matched         ('''')
    a''''    matched         ('''')

THE PROGRAM

#!/usr/bin/env ruby

regexs = [
  %r{((?:''|[^'])+)},
  %r{(?:^|[^'])('(?:[^']|'')*')(?:[^']|$)}
]
strings = %w(
  ' '' ''' '''' 'a a'
  ''a a'' '''a a''' ''''a a''''
)
regexs.each do |r|
  puts "\nTESTING REGEX #{r.source}\n\n"
  strings.each do |s|
    m = r.match s
    format = "\t%-8.8s %-13.13s %8.8s\n"
    if m
      printf format, s, 'matched', "(#{m[1]})"
    else
      printf format, s, 'did not match', '-'
    end
  end
end

(?:^|[^'])('(?:[^']|'')*')(?:[^']|$)

beginning with the start of string or not a '
followed by a '
followed by zero or more of not ' or ''
followed by a '
ending with the end of string or not a '
the quoted string is captured, nothing else is

-ara
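
The same breakdown can be written into the pattern itself with Ruby's extended (`x`) mode. This is only a sketch of the regex above with comments attached; the name `quoted_re` and the sample sentence are assumptions. Note that `^` and `$` in Ruby anchor at line boundaries, so `\A` and `\z` would be needed for strict start and end of string.

```ruby
quoted_re = %r{
  (?:^|[^'])        # start of line, or any character that is not a quote
  (                 # capture group 1: the quoted string, quotes included
    '               #   opening quote
    (?:[^']|'')*    #   zero or more non-quotes or doubled quotes
    '               #   closing quote
  )
  (?:[^']|$)        # any character that is not a quote, or end of line
}x

m = quoted_re.match("say 'it''s' now")
puts m[1]  # 'it''s'
```

One consequence of consuming the neighbouring characters: with `scan`, two quoted strings separated by a single character (as in `'a' 'b'`) yield only the first, because the separator is already used up by the first match.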

···

On Thu, 14 Nov 2002, why the lucky stiff wrote:

–

====================================

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ahoward@fsl.noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
====================================

Michael_Campbell1 · 14 November 2002 18:02

Since this type of regex question comes up regularly, I wonder how difficult
it would be to provide a regex extension which matches pairs of quoted characters;
either identical chars like ' or " or character pairs () or {}, etc…

Those aren’t (mathematically) “regular” then, are they? (Granted,
existing ‘regular’ expressions aren’t either with backreferencing.)

···

=====

Yahoo IM: michael_s_campbell


why_the_lucky_stiff1 · 14 November 2002 15:43

A simpler regexp to gather characters around the quoted string:

quote1_re = /((?:''|[^'])+)/
str = "Here I have 'two ''quoted'' matches' and 'here''s the second'."

str.scan( quote1_re )
==>[["Here I have "], ["two ''quoted'' matches"], [" and "],
["here''s the second"], ["."]]

For the outcome you’d like:

quote1_re = /(?:'((?:''|[^'])+)'|(\S+))/

str.scan( quote1_re ).flatten.compact
==>["Here", "I", "have", "two ''quoted'' matches", "and",
"here''s the second", "."]

Definitely a lot slower. You could probably reduce it further, but it’s
enough to get you started. Good luck.

_why
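
A short sketch of why `flatten.compact` is needed with the two-group pattern: each match fills only one of the two capture groups, so `scan` returns pairs with a `nil` in the unused slot. The trailing " one." in the sample string is taken from the earlier request; the names `pairs` and `tokens` are illustrative.

```ruby
quote1_re = /(?:'((?:''|[^'])+)'|(\S+))/
str = "Here I have 'two ''quoted'' matches' and 'here''s the second' one."

# One [quoted, bare_word] pair per match; the group that did not take part is nil.
pairs = str.scan(quote1_re)
p pairs.first(4)
# [[nil, "Here"], [nil, "I"], [nil, "have"], ["two ''quoted'' matches", nil]]

# Flatten the pairs and drop the nils to get a plain token list.
tokens = pairs.flatten.compact
p tokens
# ["Here", "I", "have", "two ''quoted'' matches", "and", "here''s the second", "one."]
```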


Ara.T.Howard1 · 14 November 2002 20:20

in fact they are regular because the matching char is always known: it is
either the same, or a match to a known mate. check out the O'Reilly regex book,
it’s a really good read and has many examples of using dfa and nfa regexes to
match, for example, comment pairs (/* */), which is exactly like this
problem
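
For the comment-pair case, a commonly cited pattern uses only character classes and repetition, with no lazy quantifiers or backreferences. The sketch below is an illustration under that assumption and is not quoted from the book; `comment_re` and the sample source text are made-up names.

```ruby
# Opening /*, then runs of non-stars and stars, ending at the first star
# that is directly followed by a slash.
comment_re = %r{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}

source = "a = 1; /* one */ b = 2; /* two\nlines */"
comments = source.scan(comment_re)
p comments  # ["/* one */", "/* two\nlines */"]
```

The negated character classes also match newlines, so multi-line comments are found without the `m` flag.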

now, it would be non-regular if you tried to make a single regular expression
which would match ALL occurrences mentioned since backreferencing would be
required - maybe that’s what you mean? i do agree that this type of method
should not belong in a Regex class since a general method would appear to be
non-regular, even if the collection of regexes were individually regular
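
As a rough illustration of the backreferencing mentioned here, a single pattern can accept either quote character and require the same one to close. This is a sketch only: `any_quote_re` and the sample text are assumptions, and the pattern does not handle doubled or escaped quotes.

```ruby
# Group 1 remembers which quote opened the string; \1 requires the same one to close it.
any_quote_re = /(["'])(.*?)\1/

text = %q{say "it's" or 'a "b" c'}
found = text.scan(any_quote_re)
p found  # [["\"", "it's"], ["'", "a \"b\" c"]]
```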

i think a String#delimscan method would be good, something like :

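class String
  # opening delimiter => closing delimiter
  @@delimscan_chars =
    {
      "'" => "'",
      '"' => '"',
      '(' => ')',
      '[' => ']',
      '{' => '}',
      '<' => '>',
    }

  def delimscan delim='"', escape='\\'
    o = delim[0]                  # opening delimiter
    c = @@delimscan_chars[delim]
    c = c ? c[0] : o              # closing delimiter (defaults to the opener)
    e = escape[0]                 # escape character

    w = []   # delimited substrings found so far
    b = 0    # current character
    a = 0    # index of the current opening delimiter
    n = 0    # scan position

    while ((b = self[n])) do
      case b
      when e
        n += 2 and next           # skip the escaped character
      when o
        a = n
        n += 1
        while ((b = self[n])) do
          case b
          when e
            n += 2 and next
          when c
            w << self[a..n]
            break
          end
          n += 1
        end
      else;
      end
      n += 1
    end

    return w
  end
end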

if $0 == __FILE__
  strings = [
    %q("one"),
    %q("one" "two"),
    %q("o\"ne" "t\"wo"),
    %q("),
    %q(""),
    %q("""),
    %q(),
  ].each do |s|
    puts "TESTING #{s}"
    puts s.delimscan.inspect
  end

  strings = [
    %q(),
    %q(<o! <t!>wo>),
    %q(<),
    %q(<>),
    %q(<>>),
    %q(),
  ].each do |s|
    puts "TESTING #{s}"
    puts s.delimscan('<', '!').inspect
  end
end

-a
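
For the default case (double quotes with a backslash escape), a similar result can be sketched with a plain regex. This is an alternative illustration, not the method above: `dq_re` is a made-up name, and the behaviour is only assumed to agree with the scanner on well-formed input such as this sample.

```ruby
# A double quote, then any mix of escaped characters and ordinary
# characters (neither quote nor backslash), then the closing quote.
dq_re = /"(?:\\.|[^"\\])*"/

text = %q("o\"ne" "t\"wo")   # %q keeps the backslashes literally
parts = text.scan(dq_re)
puts parts
# "o\"ne"
# "t\"wo"
```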


Shashank_Date3 · 15 November 2002 03:22

Nice !

Learnt a lot from your examples …
Thanks !


Shashank_Date3 · 15 November 2002 03:22

“why the lucky stiff” ruby-talk@whytheluckystiff.net wrote in message
news:20021114155331.GA7742@rysa.inetz.com…

For the outcome you’d like:

quote1_re = /(?:'((?:''|[^'])+)'|(\S+))/

str.scan( quote1_re ).flatten.compact
==>["Here", "I", "have", "two ''quoted'' matches", "and",
"here''s the second", "."]

Great !

Definitely a lot slower.

Speed is not an issue (yet).

You could probably reduce it further, but it’s enough to get you started.

You bet !
Thanks a lot _why …
