Regexp question - look for parentheses then remove them

I'm struggling with a regular expression problem, can anyone help?

I want to take a string, look for anything in parentheses, and if I find
anything, put it into an array, minus the parentheses.

Currently I'm doing this:

parentheses = /\(.*\)/
array = string.scan(parentheses)

This gives me eg

"3 * (1 + 2)" => ["(1 + 2)"]

- but is there an easy way to strip the parentheses off before putting
it into the array?

e.g.
"3 * (1 + 2)" => ["1 + 2"]

In addition, if I have nested parentheses inside the outer parentheses,
I want to keep them, e.g.

"3 * (1 + (4 / 2))" => ["1 + (4 / 2)"]

can anyone show me how to do this?

thanks
max

···

--
Posted via http://www.ruby-forum.com/.

x = "3 * (1 + 2)".match(/\((.*)\)/)
x.captures
=> ["1 + 2"]
x = "3 * (2 + (1 + 3))".match(/\((.*)\)/)
x.captures
=> ["2 + (1 + 3)"]

Hope this helps,

Jesus.

Jesús Gabriel y Galán wrote:

···

On 10/8/07, Max Williams <toastkid.williams@gmail.com> wrote:

This gives me eg
i want to keep them, eg

"3 * (1 + (4 / 2))" => ["1 + (4 / 2)"]

can anyone show me how to do this?

x = "3 * (1 + 2)".match(/\((.*)\)/)
x.captures
=> ["1 + 2"]
x = "3 * (2 + (1 + 3))".match(/\((.*)\)/)
x.captures
=> ["2 + (1 + 3)"]

Hope this helps,

Jesus.

ah, "captures" - that's the same as MatchData#to_a, right? Perfect,
thanks!
--
Posted via http://www.ruby-forum.com/.

That can fail if you have more than one bracket pair on the lowest level:

irb(main):002:0> "3 * (2 + (1 + 3)) + (1 * 4)".match(/\((.*)\)/).to_a
=> ["(2 + (1 + 3)) + (1 * 4)", "2 + (1 + 3)) + (1 * 4"]

I'm not sure you can match more complex examples using a regular expression
- you may be able to pull something off with lookaheads, but I think it'd be
easier to just parse the string manually and count opened brackets:

$ cat para.rb
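def para(str)
  open = 0       # current nesting depth
  matches = []   # contents of each top-level bracket pair
  current = ""
  # split(/\s*/) yields one character at a time and drops whitespace
  str.split(/\s*/).each do |char|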
    if char == ")"
      open -= 1
      if open == 0
        matches << current
        current = ""
      else
        current << char
      end
    elsif char == "("
      open += 1
      if open > 1
        current << char
      end
    elsif open > 0
      current << char
    end
  end
  matches
end

$ irb
irb(main):001:0> require 'para'
=> true
irb(main):002:0> para("1+2")
=> []
irb(main):003:0> para("(1+2)")
=> ["1+2"]
irb(main):004:0> para("(1+2)*3")
=> ["1+2"]
irb(main):005:0> para("((1+2)*3)")
=> ["(1+2)*3"]
irb(main):006:0> para("((1+2)*3)+(5*6)")
=> ["(1+2)*3", "5*6"]
irb(main):007:0> para("((1+2)*3)+(5*6*(1-3*(1-4)))")
=> ["(1+2)*3", "5*6*(1-3*(1-4))"]
irb(main):008:0>

There are probably far more elegant ways.

HTH,

Felix

···

-----Original Message-----
From: Jesús Gabriel y Galán [mailto:jgabrielygalan@gmail.com]
Sent: Monday, October 08, 2007 5:25 AM
To: ruby-talk ML
Subject: Re: regexp question - look for parentheses then remove them

On 10/8/07, Max Williams <toastkid.williams@gmail.com> wrote:
> I'm struggling with a regular expression problem, can anyone help?
>
> I want to take a string, look for anything in parentheses, and if i find
> anything, put it into an array, minus the parentheses.
>
> currently i'm doing this:
>
> parentheses = /\(.*\)/
> array = string.scan(parentheses)
>
> This gives me eg
>
> "3 * (1 + 2)" => ["(1 + 2)"]
>
> - but is there an easy way to strip the parentheses off before putting
> it into the array?
>
> eg
> "3 * (1 + 2)" => ["1 + 2"]
>
> In addition, if i have nested parentheses inside the outer parentheses,
> i want to keep them, eg
>
> "3 * (1 + (4 / 2))" => ["1 + (4 / 2)"]
>
> can anyone show me how to do this?

x = "3 * (1 + 2)".match(/\((.*)\)/)
x.captures
=> ["1 + 2"]
x = "3 * (2 + (1 + 3))".match(/\((.*)\)/)
x.captures
=> ["2 + (1 + 3)"]

Hope this helps,

Jesus.

Not exactly, because the MatchData#to_a returns as the first position
of the array the string that matched, and then starting from x[1] the
captured groups. MatchData#captures only contains the captures. See
the difference:

irb(main):001:0> a = "123456".match(/(.)(.)\d\d/)
=> #<MatchData:0xb7c97a04>
irb(main):002:0> a.to_a
=> ["1234", "1", "2"]
irb(main):003:0> a.captures
=> ["1", "2"]

Jesus.
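
To make the difference concrete on the string from the original question, here is a small sketch. The variable names (`m`, `whole`, `groups`, `combined`) are only for illustration; the calls are the ones discussed above.

```ruby
m = "3 * (1 + 2)".match(/\((.*)\)/)

whole    = m[0]        # the entire matched text, parentheses included
groups   = m.captures  # only the captured groups
combined = m.to_a      # whole match first, then the groups

p whole     # => "(1 + 2)"
p groups    # => ["1 + 2"]
p combined  # => ["(1 + 2)", "1 + 2"]
p combined == [whole] + groups  # => true
```

So for "give me what is inside the parentheses", `captures` (or `m[1]`) is the one to reach for.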

True, what would be the expected result for this?

["2 + (1 + 3)", "1 * 4"] ???

I agree that for complex cases a regexp is not the solution. A
solution like yours, counting parens (or using a stack), should be
the preferred way.

Cheers,

Jesus.
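
For what it's worth, the depth-counting idea can also be sketched with string indices instead of building up each group character by character. This is only an illustration under my own assumptions: the method name `top_level_groups` is made up, stray closing parentheses are simply ignored, and an unclosed group is dropped. Unlike the `para` version, it keeps the whitespace inside each group.

```ruby
# Collect the contents of every top-level parenthesised group.
# (top_level_groups is a hypothetical name, not from the thread.)
def top_level_groups(str)
  groups = []
  depth  = 0     # how many "(" are currently open
  start  = nil   # index just after the "(" that opened the current group

  str.each_char.with_index do |ch, i|
    case ch
    when "("
      start = i + 1 if depth.zero?
      depth += 1
    when ")"
      next if depth.zero?   # assumption: ignore a stray ")"
      depth -= 1
      groups << str[start...i] if depth.zero?
    end
  end

  groups
end

p top_level_groups("3 * (2 + (1 + 3)) + (1 * 4)")
# => ["2 + (1 + 3)", "1 * 4"]
```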

···

On 10/8/07, Felix Windt <fwmailinglists@gmail.com> wrote:

That can fail if you have more than one bracket pair on the lowest level:

irb(main):002:0> "3 * (2 + (1 + 3)) + (1 * 4)".match(/\((.*)\)/).to_a
=> ["(2 + (1 + 3)) + (1 * 4)", "2 + (1 + 3)) + (1 * 4"]

ah, "captures"

You can access the match data right away:

x = /\((.*)\)/.match("3 * (1 + 2)")
x[1]
or $1

I'd also make the * non-greedy -> *?

/\((.*?)\)/.match("3 * (1 + 2) * (3 + 4)")[1]
=> "1 + 2"

but:
/\((.*)\)/.match("3 * (1 + 2) * (3 + 4)")[1]
=> "1 + 2) * (3 + 4"
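
Combining the non-greedy group with `scan` from the original question gives the array directly. A short sketch (the names `pattern`, `flat` and `nested` are just for illustration); note that it still does not cope with nested parentheses:

```ruby
pattern = /\((.*?)\)/

# scan returns one sub-array per match when the pattern has a group,
# so flatten turns [["1 + 2"], ["3 + 4"]] into a flat list.
flat = "3 * (1 + 2) * (3 + 4)".scan(pattern).flatten
p flat    # => ["1 + 2", "3 + 4"]

# With nesting, the lazy match stops at the first ")" it meets.
nested = "3 * (1 + (4 / 2))".scan(pattern).flatten
p nested  # => ["1 + (4 / 2"]
```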

[snip]

??

I agree that for complex cases a regexp is not the solution. A
solution like yours counting parens (or with a stack) should be
preferred way.

Cheers,

Jesus.

Yep, parsing something with arbitrarily nested parentheses is the
classic example of something that can't be done with a regex. (Well,
assuming you actually care about the nested parens.)

tho_mica_l wrote:

I'd also make the * non-greedy -> *?

/\((.*?)\)/.match("3 * (1 + 2) * (3 + 4)")[1]
=> "1 + 2"

but:
/\((.*)\)/.match("3 * (1 + 2) * (3 + 4)")[1]
=> "1 + 2) * (3 + 4"

Excellent tip, cheers!

···

--
Posted via http://www.ruby-forum.com/.

I've read that the .NET regex engine has some constructs to recognize
balanced constructs like parens:

http://puzzleware.net/blogs/archive/2005/08/13/22.aspx

Interesting !!

Jesus.

···

On 10/9/07, Michael Bevilacqua-Linn <michael.bevilacqualinn@gmail.com> wrote:

[snip]

??
>
> I agree that for complex cases a regexp is not the solution. A
> solution like yours counting parens (or with a stack) should be
> preferred way.

Yep, parsing something with an arbitrarily stacked parentheses is the
classic example of something that can't be done with a regex. (Well,
assuming you actually care about the nested parens.)

Jesús Gabriel y Galán wrote:

I've read that the .NET regex engine has some constructs to recognize
balanced constructs like parens...

It's possible in Ruby 1.9, or in Ruby 1.8 with the Oniguruma library, too:

module Matchelements
  def bal(lpar='(', rpar=')')
    raise RegexpError,
      "wrong length of left bracket '#{lpar}' in bal" unless lpar.length == 1
    raise RegexpError,
      "wrong length of right bracket '#{rpar}' in bal" unless rpar.length == 1
    raise RegexpError,
      "identical left and right bracket '#{lpar}' in bal" if lpar.eql?(rpar)
    lclass, rclass = lpar, rpar
    lclass = '\\' + lclass if lclass.match(/[\-\[\]]/)
    rclass = '\\' + rclass if rclass.match(/[\-\[\]]/)
    return "(?<bal>" +
              "[^#{lclass}#{rclass}]*?" +
              "(?:\\#{lpar}\\g<bal>\\#{rpar}" +
                "[^#{lclass}#{rclass}]*?" +
              ")*?" +
           ")"
  end
end
include Matchelements

result = "3 * (2 + (1 + 3)) + (1 * 4)".scan(/\(#{bal()}\)/)

p result # => [["2 + (1 + 3)"], ["1 * 4"]]

Wolfgang Nádasi-Donner
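
For the fixed case of round parentheses, the same recursive idea can be written as a single literal pattern. This is a rough sketch assuming a Ruby whose regex engine supports `\g<name>` subexpression calls (1.9 or later); the names `BALANCED` and `balanced_groups` are made up for the example. It takes the whole match and trims the outer pair, rather than relying on what the capture group holds after the recursion.

```ruby
# A "(" followed by any mix of non-paren characters and nested
# balanced groups, followed by the matching ")".
BALANCED = /(?<paren>\((?:[^()]|\g<paren>)*\))/

# balanced_groups is a hypothetical helper, not part of any library.
def balanced_groups(str)
  groups = []
  # Regexp.last_match(0) is the full text of the current match;
  # [1..-2] drops the outer "(" and ")".
  str.scan(BALANCED) { groups << Regexp.last_match(0)[1..-2] }
  groups
end

p balanced_groups("3 * (2 + (1 + 3)) + (1 * 4)")
# => ["2 + (1 + 3)", "1 * 4"]
```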

···

--
Posted via http://www.ruby-forum.com/.

That's neat! (Or a sacrilege, I guess, depending on how you look at it :-))

MBL
