RegExp outermost ()

Chris_Morris · 22 July 2003 18:39

This may be a case where RegExp ain’t the way to go, but I want to scan
a string with nested paren groups and extract each outermost group. Is
this best done in an RegExp?

s = ‘a group ( stuff ( other )) (( hey ( you ) (there) ) stuff )’

magic_re = # ?
s.scan(magic_re) { |match| puts match }

output:
( stuff ( other ))
(( hey ( you ) (there) ) stuff )

···

–
Chris
http://clabs.org/blogki

Ben_Giddings · 22 July 2003 18:54

What?? You’re trying to make Lisp readable??

···

On Tue July 22 2003 2:39 pm, Chris Morris wrote:

output:
( stuff ( other ))
(( hey ( you ) (there) ) stuff )

Rudolf_Polzer2 · 22 July 2003 19:37

Scripsit ille »Chris Morris« chrismo@clabs.org:

This may be a case where RegExp ain’t the way to go, but I want to scan
a string with nested paren groups and extract each outermost group. Is
this best done in an RegExp?

Impossible. REs can’t do that, ask computer science people.

BTW, where is that proof available?

Hm… aren’t REs equivalent to finite automata? If yes, the proof looks
easy… count how many different states you need, n, and then look what
happens if I feed it with (n+1) opening parens and then (n+1) closing
braces. Especially look how many different states you need then.

But wait… Ruby REs can check “w consists exactly of a composite number
of ones”: /^(11+)\1+$/. Can finite automata do that?

perlre says it is at least possible using recursive REs:

The following pattern matches a parenthesized group:

 $re = qr{
            \(
            (?:
               (?> [^()]+ )    # Non-parens without backtracking
             >
               (??{ $re })     # Group with matching parens
            )*
            \)
         }x;

No idea if Ruby has a similar feature. BTW, where is (?:something)
documented? It just works like in Perl, but where does it stand?

···

–
Nochn Hinweis: Ein Fragezeichen pro Satz reicht, mehr wirkt leicht
albern. Und davor bitte kein Leerzeichen machen. Hab ich “Warum” gehört
? Hier hast du die Antwort.
[Volker Gringmuth in de.newusers.questions]

Warren_Brown3 · 22 July 2003 21:14

Chris,

This may be a case where RegExp ain’t the way
to go, but I want to scan a string with nested
paren groups and extract each outermost group.
Is this best done in an RegExp?

No.  In fact, this is a well-known limitation of regular expressions -

they can’t handle infinitely recursive patterns (without special recursive
extensions like Perl has recently added). Note that if you want to limit
the nesting of parenthesis to one or two levels, you could do it with
regular expressions, but they quickly get ugly as you add levels.

If all you want is to find the matching parenthesis, you could use a

function like:

def find_matching_paren(str,startindex = 0)
level = 0
(startindex…str.length).each do |i|
if str[i,1] == ‘(’ then level += 1 end
if str[i,1] == ‘)’
level -= 1
return i if level == 0
raise “Too many closing parentheses at #{i}.” if level < 0
end
end
nil
end

irb(main):002:0> find_matching_paren(‘abc(d(e(f)g(h)i)j)klm’)
=> 17
irb(main):003:0> find_matching_paren(‘abc(d(e(f)g(h)i)j)klm’,4)
=> 15

If you want to do more sophisticated parsing, you could split the string

on the parenthesis, then construct a structured array of the results:

def parse_parens(str)
raise “Mismatched parentheses” unless str.count(‘(’) == str.count(‘)’)
parts = str.split(/([()])/)
retval = parse_parens_sub(parts)
raise “Improperly nested parentheses” if parts.length > 0
retval
end

def parse_parens_sub(parts)
retval =
while val = parts.shift
next if val == ‘’
return retval if val == ‘)’
retval << if val == ‘(’ then parse_parens_sub(parts) else val end
end
retval
end

irb(main):004:0> parse_parens(‘abc(d(e(f)g(h)i)j)klm’)
=> [“abc”, [“d”, [“e”, [“f”], “g”, [“h”], “i”], “j”], “klm”]

I hope this helps!

- Warren Brown

alex_f · 22 July 2003 22:58

Chris Morris wrote:

This may be a case where RegExp ain’t the way to go, but I want to scan
a string with nested paren groups and extract each outermost group. Is
this best done in an RegExp?

Hi

No, for the reasons previous posters have pointed out. You might be
interested in the Nested Paren Reader library on the RAA.

http://raa.ruby-lang.org/list.rhtml?name=npreader

I didn’t write it, but it’s worked for me in the past.

cheers
alex

···

__

Zachary_P_Landau · 23 July 2003 19:47

s = ‘a group ( stuff ( other )) (( hey ( you ) (there) ) stuff )’

magic_re = # ?
s.scan(magic_re) { |match| puts match }

output:
( stuff ( other ))
(( hey ( you ) (there) ) stuff )

I think the easiest way to do this is with a stack. You push when you
see a ( and pop when you see a ). The first entry on the stack is the
start of a parens. Keep pushing and popping (kinky) until the stack is
empty again. That would be the end of one group. Rinse and repeat.

Kurt_M_Dresner · 22 July 2003 19:45

Impossible. REs can’t do that, ask computer science people.

Yes, but RE can’t match a^iba^i either, but Ruby REs can.

Still might be impossible though. :o)

-Kurt “computer science person”

Chris_Morris · 23 July 2003 12:54

Warren Brown wrote:

If all you want is to find the matching parenthesis, you could use a
function like:

Thx - that’s more or less what I ended up doing.

raise “Mismatched parentheses” unless str.count(‘(’) == str.count(‘)’)

… and this is a handy tidbit I didn’t think of. Thx.

···

–

Chris
http://clabs.org/blogki

Michael_Campbell1 · 22 July 2003 20:07

Impossible. REs can’t do that, ask computer science people.

Yes, but RE can’t match a^iba^i either, but Ruby REs can.

Ruby’s RE’s aren’t R though. Neither are perls.

Kurt_M_Dresner · 22 July 2003 20:15

My point exactly.

···

On Wed, Jul 23, 2003 at 05:07:24AM +0900, Michael Campbell wrote:

Impossible. REs can’t do that, ask computer science people.

Yes, but RE can’t match a^iba^i either, but Ruby REs can.

Ruby’s RE’s aren’t R though. Neither are perls.

======= End of Original Message =======<

Topic		Replies	Views
Recursive brace matching with Ruby regexp ruby-talk	18	116	6 November 2004
Recursive regular expressions in Ruby? ruby-talk	4	139	1 February 2011
Regexp, matching only the content within parentheses ruby-talk	11	152	26 March 2011
Regexp question - look for parentheses then remove them ruby-talk	11	141	9 October 2007
Match/scan does not return multiple matches ruby-talk	11	165	9 February 2010

RegExp outermost ()

Related topics