Chris,
> This may be a case where RegExp ain’t the way
> to go, but I want to scan a string with nested
> paren groups and extract each outermost group.
> Is this best done with a RegExp?
No. In fact, this is a well-known limitation of regular expressions:
they can’t match arbitrarily deep nesting (without special recursive
extensions like the ones Perl has recently added). If you are willing to
limit the nesting of parentheses to one or two levels, you could do it with
regular expressions, but the patterns quickly get ugly as you add levels.
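To make the bounded-depth point concrete, here is a sketch of hand-written Ruby patterns (the names TWO_LEVELS and THREE_LEVELS are only illustrative). Each extra level wraps the previous pattern in another \((?:[^()]|...)*\) layer. With String#scan they return the outermost groups, but only while the input stays within the depth the pattern was written for; one level too deep and the result is silently wrong rather than an error.

```ruby
# Hand-built patterns for a bounded nesting depth (illustrative names).
# TWO_LEVELS:   a group that may contain groups with no parens inside.
# THREE_LEVELS: one more layer of the same construction.
TWO_LEVELS   = /\((?:[^()]|\([^()]*\))*\)/
THREE_LEVELS = /\((?:[^()]|\((?:[^()]|\([^()]*\))*\))*\)/

p 'x(a(b)c)y(d)'.scan(TWO_LEVELS)            # => ["(a(b)c)", "(d)"]
p 'abc(d(e(f)g(h)i)j)klm'.scan(THREE_LEVELS) # => ["(d(e(f)g(h)i)j)"]

# Too shallow for this input: the outer group can't match, so scan
# quietly returns an inner group instead of the outermost one.
p 'abc(d(e(f)g(h)i)j)klm'.scan(TWO_LEVELS)   # => ["(e(f)g(h)i)"]
```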
If all you want is to find the matching parenthesis, you could use a
function like:
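# Returns the index of the ')' that closes the first '(' found at or
# after startindex, or nil if that group is never closed.
def find_matching_paren(str, startindex = 0)
  level = 0
  (startindex...str.length).each do |i|
    if str[i, 1] == '('
      level += 1
    elsif str[i, 1] == ')'
      level -= 1
      # Back at depth zero: this ')' closes the first '(' we saw.
      return i if level == 0
      raise "Too many closing parentheses at #{i}." if level < 0
    end
  end
  nil
end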
irb(main):002:0> find_matching_paren('abc(d(e(f)g(h)i)j)klm')
=> 17
irb(main):003:0> find_matching_paren('abc(d(e(f)g(h)i)j)klm',4)
=> 15
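Since the original question asks for each outermost group, here is one way the same level-counting idea could be stretched to collect all of them in a single pass. The helper name outermost_groups is hypothetical; it walks the string once with each_char.with_index instead of calling find_matching_paren repeatedly, and it raises on unbalanced input in the same spirit as the function above.

```ruby
# Hypothetical helper: collect every outermost parenthesized group by
# tracking the nesting level, as find_matching_paren does.
def outermost_groups(str)
  groups = []
  level = 0
  start = nil
  str.each_char.with_index do |ch, i|
    case ch
    when '('
      start = i if level == 0          # an outermost group opens here
      level += 1
    when ')'
      level -= 1
      raise "Too many closing parentheses at #{i}." if level < 0
      groups << str[start..i] if level == 0  # outermost group just closed
    end
  end
  raise "Unclosed parenthesis at #{start}." if level > 0
  groups
end

p outermost_groups('abc(d(e(f)g(h)i)j)klm(n)')
# => ["(d(e(f)g(h)i)j)", "(n)"]
```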
If you want to do more sophisticated parsing, you could split the string
on the parentheses, then build a nested array from the pieces:
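def parse_parens(str)
  raise "Mismatched parentheses" unless str.count('(') == str.count(')')
  # The capture group makes split keep the '(' and ')' delimiters.
  parts = str.split(/([()])/)
  retval = parse_parens_sub(parts)
  # Leftover tokens mean a ')' closed the top level too early.
  raise "Improperly nested parentheses" if parts.length > 0
  retval
end

# Consumes tokens from parts up to the matching ')' (or the end) and
# returns the pieces at this nesting level as an array.
def parse_parens_sub(parts)
  retval = []
  while (val = parts.shift)
    next if val == ''  # split yields empty strings, e.g. between "))"
    return retval if val == ')'
    retval << (val == '(' ? parse_parens_sub(parts) : val)
  end
  retval
end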
irb(main):004:0> parse_parens('abc(d(e(f)g(h)i)j)klm')
=> ["abc", ["d", ["e", ["f"], "g", ["h"], "i"], "j"], "klm"]
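With the parse_parens approach, the outermost groups are simply the Array elements at the top level of its result. A minimal sketch, assuming you want them back as strings: the hypothetical unparse helper re-inserts the parentheses that split consumed. The tree literal is copied from the irb output above so the snippet stands on its own.

```ruby
# Assumption: tree is what parse_parens returned for
# 'abc(d(e(f)g(h)i)j)klm' in the irb session above.
tree = ["abc", ["d", ["e", ["f"], "g", ["h"], "i"], "j"], "klm"]

# Hypothetical helper: rebuild a group's source text, wrapping every
# nested Array back in parentheses.
def unparse(node)
  return node if node.is_a?(String)
  '(' + node.map { |child| unparse(child) }.join + ')'
end

# grep(Array) keeps only the top-level groups, skipping plain text.
outermost = tree.grep(Array).map { |group| unparse(group) }
p outermost  # => ["(d(e(f)g(h)i)j)"]
```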
I hope this helps!
- Warren Brown