I'm not sure you can match more complex examples using a regular expression
- you may be able to pull something off with lookaheads, but I think it'd be
easier to just parse the string manually and count opened brackets:
$ cat para.rb
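def para(str)
  open = 0        # current nesting depth
  matches = []
  current = ""
  # split(//) yields every character, whitespace included
  str.split(//).each do |char|
    if char == ")"
      open -= 1
      if open == 0
        # outermost group closed: store its contents
        matches << current
        current = ""
      else
        current << char
      end
    elsif char == "("
      open += 1
      # keep nested "(" but drop the outermost one
      if open > 1
        current << char
      end
    elsif open > 0
      current << char
    end
  end
  matches
end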
-----Original Message-----
From: Jesús Gabriel y Galán [mailto:jgabrielygalan@gmail.com]
Sent: Monday, October 08, 2007 5:25 AM
To: ruby-talk ML
Subject: Re: regexp question - look for parentheses then remove them
On 10/8/07, Max Williams <toastkid.williams@gmail.com> wrote:
> I'm struggling with a regular expression problem, can anyone help?
>
> I want to take a string, look for anything in parentheses, and if i find
> anything, put it into an array, minus the parentheses.
>
> currently i'm doing this:
>
> parentheses = /\(.*\)/
> array = string.scan(parentheses)
>
> This gives me eg
>
> "3 * (1 + 2)" => ["(1 + 2)"]
>
> - but is there an easy way to strip the parentheses off before putting
> it into the array?
>
> eg
> "3 * (1 + 2)" => ["1 + 2"]
>
> In addition, if i have nested parentheses inside the outer parentheses,
> i want to keep them, eg
>
> "3 * (1 + (4 / 2))" => ["1 + (4 / 2)"]
>
> can anyone show me how to do this?
Not exactly, because MatchData#to_a returns the whole matched string as
the first element of the array, followed by the captured groups from
x[1] onwards. MatchData#captures contains only the captures. See
the difference:
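The difference can be sketched with the nested string from the question. This is only an illustration: the variable names are made up here, and the greedy pattern handles a single outer pair of parentheses, not several groups in one string.

```ruby
str = "3 * (1 + (4 / 2))"

# One capture group around everything between the outermost parentheses.
m = str.match(/\((.*)\)/)

whole_and_groups = m.to_a      # whole match first, then each capture
only_groups      = m.captures  # captures only, without the whole match

# String#scan returns one array of captures per match, so flatten it.
scanned = str.scan(/\((.*)\)/).flatten

p whole_and_groups  # ["(1 + (4 / 2))", "1 + (4 / 2)"]
p only_groups       # ["1 + (4 / 2)"]
p scanned           # ["1 + (4 / 2)"]
```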
I agree that for complex cases a regexp is not the solution. A
solution like yours counting parens (or with a stack) should be
the preferred way.
Cheers,
Jesus.
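For comparison, the stack variant mentioned above might look roughly like the following. It is a sketch, not code from the thread: top_level_groups is a hypothetical name, and it assumes a stray ")" with no matching "(" should simply be ignored.

```ruby
# Hypothetical helper: returns the contents of each top-level
# parenthesized group, keeping any nested parentheses intact.
def top_level_groups(str)
  stack  = []  # indices of "(" that have not been closed yet
  groups = []
  str.each_char.with_index do |char, i|
    if char == "("
      stack.push(i)
    elsif char == ")" && !stack.empty?
      start = stack.pop
      # An empty stack means the outermost group has just closed.
      groups << str[(start + 1)...i] if stack.empty?
    end
  end
  groups
end

p top_level_groups("3 * (1 + (4 / 2))")  # ["1 + (4 / 2)"]
p top_level_groups("(a) * (b (c))")      # ["a", "b (c)"]
```

The stack keeps the position of every open parenthesis, so it could also report where each group starts; a plain counter is enough if only the contents are needed.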
On 10/9/07, Michael Bevilacqua-Linn <michael.bevilacqualinn@gmail.com> wrote:
[snip]
>
> I agree that for complex cases a regexp is not the solution. A
> solution like yours counting parens (or with a stack) should be
> the preferred way.
Yep, parsing arbitrarily nested parentheses is the classic example of
something that can't be done with a regex. (Well, assuming you
actually care about the nested parens.)
I've read that the .NET regex engine has some constructs for recognizing
balanced delimiters such as parens...
It's possible in Ruby 1.9, or in Ruby 1.8 with the Oniguruma library, too:
module Matchelements
  def bal(lpar='(', rpar=')')
    raise RegexpError,
      "wrong length of left bracket '#{lpar}' in bal" unless lpar.length == 1
    raise RegexpError,
      "wrong length of right bracket '#{rpar}' in bal" unless rpar.length == 1
    raise RegexpError,
      "identical left and right bracket '#{lpar}' in bal" if lpar.eql?(rpar)
    lclass, rclass = lpar, rpar
    lclass = '\\' + lclass if lclass.match(/[\-\[\]]/)
    rclass = '\\' + rclass if rclass.match(/[\-\[\]]/)
    return "(?<bal>" +
      "[^#{lclass}#{rclass}]*?" +
      "(?:\\#{lpar}\\g<bal>\\#{rpar}" +
      "[^#{lclass}#{rclass}]*?" +
      ")*?" +
      ")"
  end
end
include Matchelements
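The same recursive idea can also be written directly as a literal pattern, which may be easier to experiment with. The sketch below assumes Ruby 1.9 or later, where a named group can call itself with \g<name>. BALANCED and balanced_groups are hypothetical names and not part of the module above.

```ruby
# A "(" followed by any mix of non-paren characters and nested
# balanced groups, followed by ")". \g<paren> re-enters the group.
BALANCED = /(?<paren>\((?:[^()]|\g<paren>)*\))/

# Hypothetical helper: collects each outermost balanced group
# and strips its surrounding parentheses.
def balanced_groups(str)
  groups = []
  str.scan(BALANCED) do
    whole = Regexp.last_match(0)  # full text of the current match
    groups << whole[1..-2]        # drop the outer "(" and ")"
  end
  groups
end

p balanced_groups("3 * (1 + (4 / 2))")  # ["1 + (4 / 2)"]
p balanced_groups("(a) * (b (c))")      # ["a", "b (c)"]
```

Reading the whole match rather than the named capture avoids relying on which value a recursive group holds after the match finishes.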