Regexp/scan question

Hello,

I need to match a chunk of code like this:

....
#begin here
...}
......end
...}
......}
.....end
...

I need to match from "#begin here" up to the n-th closing token (i.e. '}' or 'end'). n can be arbitrary (let's assume it is meaningful, i.e. there are at least n '}' and 'end' tokens in total).

Example
match_stuff(2):

#begin here
...}
......end

match_stuff(4):

#begin here
...}
......end
...}
......}

etc.
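Here is a minimal sketch of the match_stuff(n) helper described above. It assumes a hypothetical two-argument signature match_stuff(text, n) and borrows the non-capturing, non-greedy pattern that Carlos arrives at later in the thread. It returns nil when fewer than n closing tokens follow #begin.

```ruby
# Hypothetical helper: return the text from "#begin" up to and including
# the n-th closing token ('}' or 'end'), or nil if there are fewer than n.
def match_stuff(text, n)
  # (?:...) groups are non-capturing, .*? is lazy so each repetition stops
  # at the nearest closing token, and /m lets . match newlines.
  text[/#begin(?:.*?(?:\}|end)){#{n}}/m]
end

sample = <<~TEXT
  ....
  #begin here
  ...}
  ......end
  ...}
  ......}
  .....end
  ...
TEXT

puts match_stuff(sample, 2)
puts match_stuff(sample, 4)
```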

What's the best way to accomplish this? I have been trying with scan(), but I haven't succeeded yet.

TIA,
Peter

__
http://www.rubyrailways.com

Peter Szinek wrote:

> I need to match from "#begin here" up to the n-th closing token (i.e. '}' or 'end').

n = 4
text =~ /#begin(.*(\}|end)){#{n}}/m

?

(not tested).
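To see what this greedy version actually matches, here is a small check. It assumes the sample string Peter posts later in the thread, since this message gives no test data. =~ returns the start index of the match (or nil), and $~[0] holds the whole matched text.

```ruby
# Sample string taken from Peter's irb session later in the thread.
text = '.... #begin aaaa end bbb } ccc end ddd'
n = 2

# =~ returns the index where the match starts (or nil); $~ is the MatchData.
pos = text =~ /#begin(.*(\}|end)){#{n}}/m
whole = pos && $~[0]

puts pos
# Greedy .* lets the repetitions reach as far right as possible, so the
# match ends at the last closing token instead of the second one.
puts whole
```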

Peter Szinek wrote:

> I need to match a chunk of code like this:
>
> ...
> ...
> #begin here
> ..}
> .....end
> ..}
> .....}
> ....end

This won't solve the entire problem, but it will give you an array of
indices to matching elements:

---------------------------------

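#!/usr/bin/ruby -w

data = File.read("testdata.txt")

# Character offsets of every '}' in the file.
match_indices = []

data.scan(/\}/) do
  # Inside the block, Regexp.last_match describes the current match.
  match_indices << Regexp.last_match.begin(0)
end

puts match_indices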

---------------------------------

You could begin by scanning to your planned start mark, then scan for
matching elements using this code. Or you could segregate the block between
the start and end marks, then scan for matches in the substring using this
code.
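Below is a sketch of the first approach Paul describes: scan to the start mark, then count closing tokens in the remainder. The helper name nth_block is hypothetical. Unlike the script above, it treats both '}' and 'end' as closing tokens, which is an assumption taken from the original question.

```ruby
# Hypothetical helper sketching the two-step idea: find the start mark,
# then record where each closing token ends and cut at the n-th one.
def nth_block(data, n, start_mark = "#begin")
  start = data.index(start_mark)
  return nil if start.nil? || n < 1

  rest = data[start..-1]
  token_ends = []
  rest.scan(/\}|end/) do
    # Inside the block, Regexp.last_match is the current token's MatchData.
    token_ends << Regexp.last_match.end(0)
  end

  # Fewer than n closing tokens follow the start mark.
  return nil if token_ends.size < n

  rest[0, token_ends[n - 1]]
end

data = '.... #begin aaaa end bbb } ccc end ddd'
puts nth_block(data, 2)
puts nth_block(data, 3)
```

This avoids building a regexp from n at all, at the cost of scanning every token after the start mark.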

--
Paul Lutus
http://www.arachnoid.com

Carlos wrote:

> Peter Szinek wrote:
>
>> I need to match from "#begin here" up to the n-th closing token (i.e. '}' or 'end').

n = 4
text =~ /#begin(.*?(\}|end)){#{n}}/m

Better with the '?' after '.*', to make it non-greedy :).
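Here is a quick check of the non-greedy version. It uses the two-block sample string Carlos posts later in the thread. Note that =~ reports only the first match, so the second #begin block is never seen. That is why Peter asks for a scan-based solution.

```ruby
# Two-block sample string taken from Carlos's later reply.
text = "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
n = 2

# .*? is lazy, so each of the n repetitions stops at the nearest '}' or 'end'.
# =~ only looks for the FIRST match, so the second #begin block is never seen.
first = (text =~ /#begin(.*?(\}|end)){#{n}}/m) ? $~[0] : nil
puts first
```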

Carlos wrote:

> Peter Szinek wrote:
>
>> I need to match from "#begin here" up to the n-th closing token (i.e. '}' or 'end').
>
> n = 4
> text =~ /#begin(.*(\}|end)){#{n}}/m

Sorry, I need to 'scan' it. I have been playing around with similar regexps, but they did not work out, including yours:

irb(main):007:0> text = '.... #begin aaaa end bbb } ccc end ddd'
=> ".... #begin aaaa end bbb } ccc end ddd"
irb(main):008:0> n = 2
=> 2
irb(main):009:0> text.scan(/#begin(.*(\}|end)){#{n}}/m)
=> [[" ccc end", "end"]]

does not work with scan...

Cheers,
Peter

IMHO this does not work because of the greedy ".*". You could try the reluctant version instead, i.e. ".*?". Also, the grouping does not capture the whole sequence.

  robert
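Robert's point about grouping can be seen directly with MatchData. A group inside a repeated construct such as (...){2} is overwritten on every repetition, so afterwards it holds only what the last repetition matched. Because the pattern contains groups, String#scan returns those groups instead of the whole match. The sample string is the one from Peter's irb session.

```ruby
# Sample string from Peter's irb session.
text = '.... #begin aaaa end bbb } ccc end ddd'
re = /#begin(.*?(\}|end)){2}/m

m = text.match(re)
# m[0] is the whole match. Groups 1 and 2 are overwritten on each of the
# two repetitions, so they keep only what the LAST repetition matched.
puts m[0]
puts m[1]
puts m[2]

# Because the pattern contains groups, scan returns the groups per match,
# not the whole matched text.
p text.scan(re)
```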

> On 11.12.2006 10:37, Carlos wrote:
>
>> Peter Szinek wrote:
>>
>>> I need to match from "#begin here" up to the n-th closing token (i.e. '}' or 'end').
>>
>> n = 4
>> text =~ /#begin(.*(\}|end)){#{n}}/m
>>
>> ?
>>
>> (not tested).
>
> IMHO this does not work because of the greedy ".*". You could try the reluctant version instead, i.e. ".*?". Also, the grouping does not capture the whole sequence.

Yeah, I tried to correct these problems but I am still not quite there...

Carlos' regexp, vol 2 (with the non-greedy ?)

irb(main):007:0> text = '.... #begin aaaa end bbb } ccc end ddd'
=> ".... #begin aaaa end bbb } ccc end ddd"
irb(main):008:0> n = 2
=> 2
irb(main):009:0> text.scan(/#begin(.*?(\}|end)){#{n}}/m)
=> [[" bbb }", "}"]]

And I would like to get

[["#begin aaaa end bbb }"]]

OK, I know I did not specify the problem correctly the first time; maybe now it is clearer...
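If the nested-array shape shown above is really what you want, one option is to wrap the whole block in a single capturing group and make the inner groups non-capturing. This is a sketch, not from the thread. With that change, scan yields exactly one capture per block.

```ruby
text = '.... #begin aaaa end bbb } ccc end ddd'
n = 2

# A single capturing group around the whole block; the inner groups are
# non-capturing (?:...), so each scan result holds exactly one capture.
result = text.scan(/(#begin(?:.*?(?:\}|end)){#{n}})/m)
p result
```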

Cheers,
Peter

Peter Szinek wrote:

> irb(main):007:0> text = '.... #begin aaaa end bbb } ccc end ddd'
> => ".... #begin aaaa end bbb } ccc end ddd"
> irb(main):008:0> n = 2
> => 2
> irb(main):009:0> text.scan(/#begin(.*(\}|end)){#{n}}/m)
> => [[" ccc end", "end"]]
>
> does not work with scan...

To make it work with scan just make the parens non-capturing:

irb(main):001:0> text = "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
=> "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
irb(main):002:0> text.scan(/#begin(?:.*?(?:\}|end)){2}/m)
=> ["#begin aaa end bbb }", "#begin ddd end eee end"]

Good luck.
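Carlos's pattern counts any occurrence of "end" as a closing token, including one inside an identifier such as append. Here is a hypothetical variant, begin_blocks (not from the thread). It adds \b word boundaries around end and interpolates n as in the original question. What should count as a closing token is an assumption on my part.

```ruby
# Hypothetical variant: \b word boundaries around "end" so that words such as
# "append" or "endless" are not counted as closing tokens.
def begin_blocks(text, n)
  text.scan(/#begin(?:.*?(?:\}|\bend\b)){#{n}}/m)
end

p begin_blocks("#begin aaa end bbb } ccc } #begin ddd end eee end fff", 2)
p begin_blocks("#begin append x } y end", 2)
```

Without the \b anchors, the second call would stop at "#begin append x }", because the "end" inside "append" is taken as the first closing token.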

> To make it work with scan just make the parens non-capturing:
>
> irb(main):001:0> text = "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
> => "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
> irb(main):002:0> text.scan(/#begin(?:.*?(?:\}|end)){2}/m)
> => ["#begin aaa end bbb }", "#begin ddd end eee end"]

Ha! That was the trick I have been looking for! Muchas Gracias, Carlos.

Cheers,
Peter
