RegExp outermost ()

This may be a case where RegExp ain’t the way to go, but I want to scan
a string with nested paren groups and extract each outermost group. Is
this best done in an RegExp?

s = ‘a group ( stuff ( other )) (( hey ( you ) (there) ) stuff )’

magic_re = # ?
s.scan(magic_re) { |match| puts match }

output:
( stuff ( other ))
(( hey ( you ) (there) ) stuff )

···


Chris
http://clabs.org/blogki

What?? You’re trying to make Lisp readable??

:wink:

···

On Tue July 22 2003 2:39 pm, Chris Morris wrote:

output:
( stuff ( other ))
(( hey ( you ) (there) ) stuff )

Scripsit ille »Chris Morris« chrismo@clabs.org:

This may be a case where RegExp ain’t the way to go, but I want to scan
a string with nested paren groups and extract each outermost group. Is
this best done in an RegExp?

Impossible. REs can’t do that, ask computer science people.

BTW, where is that proof available?

Hm… aren’t REs equivalent to finite automata? If yes, the proof looks
easy… count how many different states you need, n, and then look what
happens if I feed it with (n+1) opening parens and then (n+1) closing
braces. Especially look how many different states you need then.

But wait… Ruby REs can check “w consists exactly of a composite number
of ones”: /^(11+)\1+$/. Can finite automata do that?

perlre says it is at least possible using recursive REs:

The following pattern matches a parenthesized group:

 $re = qr{
            \(
            (?:
               (?> [^()]+ )    # Non-parens without backtracking
             >
               (??{ $re })     # Group with matching parens
            )*
            \)
         }x;

No idea if Ruby has a similar feature. BTW, where is (?:something)
documented? It just works like in Perl, but where does it stand?

···


Nochn Hinweis: Ein Fragezeichen pro Satz reicht, mehr wirkt leicht
albern. Und davor bitte kein Leerzeichen machen. Hab ich “Warum” gehört
? Hier hast du die Antwort.
[Volker Gringmuth in de.newusers.questions]

Chris,

This may be a case where RegExp ain’t the way
to go, but I want to scan a string with nested
paren groups and extract each outermost group.
Is this best done in an RegExp?

No.  In fact, this is a well-known limitation of regular expressions -

they can’t handle infinitely recursive patterns (without special recursive
extensions like Perl has recently added). Note that if you want to limit
the nesting of parenthesis to one or two levels, you could do it with
regular expressions, but they quickly get ugly as you add levels.

If all you want is to find the matching parenthesis, you could use a

function like:

def find_matching_paren(str,startindex = 0)
level = 0
(startindex…str.length).each do |i|
if str[i,1] == ‘(’ then level += 1 end
if str[i,1] == ‘)’
level -= 1
return i if level == 0
raise “Too many closing parentheses at #{i}.” if level < 0
end
end
nil
end

irb(main):002:0> find_matching_paren(‘abc(d(e(f)g(h)i)j)klm’)
=> 17
irb(main):003:0> find_matching_paren(‘abc(d(e(f)g(h)i)j)klm’,4)
=> 15

If you want to do more sophisticated parsing, you could split the string

on the parenthesis, then construct a structured array of the results:

def parse_parens(str)
raise “Mismatched parentheses” unless str.count(‘(’) == str.count(‘)’)
parts = str.split(/([()])/)
retval = parse_parens_sub(parts)
raise “Improperly nested parentheses” if parts.length > 0
retval
end

def parse_parens_sub(parts)
retval =
while val = parts.shift
next if val == ‘’
return retval if val == ‘)’
retval << if val == ‘(’ then parse_parens_sub(parts) else val end
end
retval
end

irb(main):004:0> parse_parens(‘abc(d(e(f)g(h)i)j)klm’)
=> [“abc”, [“d”, [“e”, [“f”], “g”, [“h”], “i”], “j”], “klm”]

I hope this helps!

- Warren Brown

Chris Morris wrote:

This may be a case where RegExp ain’t the way to go, but I want to scan
a string with nested paren groups and extract each outermost group. Is
this best done in an RegExp?

Hi

No, for the reasons previous posters have pointed out. You might be
interested in the Nested Paren Reader library on the RAA.

http://raa.ruby-lang.org/list.rhtml?name=npreader

I didn’t write it, but it’s worked for me in the past.

cheers
alex

···

__

s = ‘a group ( stuff ( other )) (( hey ( you ) (there) ) stuff )’

magic_re = # ?
s.scan(magic_re) { |match| puts match }

output:
( stuff ( other ))
(( hey ( you ) (there) ) stuff )

I think the easiest way to do this is with a stack. You push when you
see a ( and pop when you see a ). The first entry on the stack is the
start of a parens. Keep pushing and popping (kinky) until the stack is
empty again. That would be the end of one group. Rinse and repeat.

Impossible. REs can’t do that, ask computer science people.

Yes, but RE can’t match a^iba^i either, but Ruby REs can.

Still might be impossible though. :o)

-Kurt “computer science person”

Warren Brown wrote:

If all you want is to find the matching parenthesis, you could use a
function like:

Thx - that’s more or less what I ended up doing.

raise “Mismatched parentheses” unless str.count(‘(’) == str.count(‘)’)

… and this is a handy tidbit I didn’t think of. Thx.

···

Chris
http://clabs.org/blogki

Impossible. REs can’t do that, ask computer science people.

Yes, but RE can’t match a^iba^i either, but Ruby REs can.

Ruby’s RE’s aren’t R though. Neither are perls.

My point exactly.

···

On Wed, Jul 23, 2003 at 05:07:24AM +0900, Michael Campbell wrote:

Impossible. REs can’t do that, ask computer science people.

Yes, but RE can’t match a^iba^i either, but Ruby REs can.

Ruby’s RE’s aren’t R though. Neither are perls.

======= End of Original Message =======<