Hello,
I need to match a chunk of code like this:
.... #begin here
...}
......end
...}
......}
.....end
...
I need to match from "#begin here" up to the n-th closing token (i.e. '}' or 'end'). n can be arbitrary (let's assume that it is meaningful, i.e. n is never larger than the total number of '}' and 'end' tokens).
Example
match_stuff(2):
#begin here
...}
......end
match_stuff(4):
#begin here
...}
......end
...}
......}
etc.
What's the best way to accomplish this? I have been trying with scan(), but I have not succeeded yet.
This won't solve the entire problem, but it will give you an array of
indices to matching elements:
---------------------------------
#!/usr/bin/ruby -w
data = File.read("testdata.txt")
# Collect the start offset of every '}' in the file.
match_indices = []
data.scan(/\}/) do
  match_indices << Regexp.last_match.begin(0)
end
puts match_indices
---------------------------------
You could begin by scanning to your planned start mark, then scan for
matching elements using this code. Or you could segregate the block between
the start and end marks, then scan for matches in the substring using this
code.
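For what it's worth, here is a minimal sketch of that idea: find the start mark, collect the end offsets of the closing tokens that follow it, and slice up to the n-th one. The helper name `slice_to_nth_token` and its parameters are made up for illustration, and the plain `end` alternative is assumed to be good enough for the sample data (it would also match inside longer words such as "append").

```ruby
# Hypothetical helper, not from the thread: returns the text from the start
# mark up to and including the n-th closing token, or nil if there is no
# start mark or fewer than n closing tokens after it.
def slice_to_nth_token(text, n, start_mark = "#begin here")
  start = text.index(start_mark)
  return nil if start.nil?

  # Record where each closing token ends, ignoring anything that starts
  # before the end of the start mark.
  # Assumption: a bare /end/ is acceptable; it also matches inside words.
  token_ends = []
  text.scan(/\}|end/) do
    m = Regexp.last_match
    token_ends << m.end(0) if m.begin(0) >= start + start_mark.length
  end

  return nil if n < 1 || token_ends.length < n
  text[start...token_ends[n - 1]]
end

sample = ".... #begin here\n...}\n......end\n...}\n......}\n.....end\n...\n"
puts slice_to_nth_token(sample, 2)
```

This avoids building a regexp from n, at the cost of a full scan of the text for every call.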
n = 4
text =~ /#begin(.*(\}|end)){#{n}}/m
Sorry, I need to 'scan' it. I have been playing around with similar regexps, but they did not work out. E.g. also yours:
irb(main):007:0> text = '.... #begin aaaa end bbb } ccc end ddd'
=> ".... #begin aaaa end bbb } ccc end ddd"
irb(main):008:0> n = 2
=> 2
irb(main):009:0> text.scan(/#begin(.*(\}|end)){#{n}}/m)
=> [[" ccc end", "end"]]
IMHO this does not work because of the greedy ".*". You could try the reluctant form, i.e. ".*?". Also, the grouping does not capture the whole sequence.
robert
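To illustrate both points on the string from the irb session above, a small sketch (the variable names `greedy` and `reluctant` are only for illustration). It compares the whole match of the greedy and reluctant patterns, and shows that scan hands back only the captures when the pattern contains capturing groups.

```ruby
text = '.... #begin aaaa end bbb } ccc end ddd'
n = 2

greedy    = /#begin(.*(\}|end)){#{n}}/m
reluctant = /#begin(.*?(\}|end)){#{n}}/m

# String#[] with a regexp returns the whole match, ignoring the groups.
p text[greedy]      # runs on to the last closing token it can reach
p text[reluctant]   # stops at the n-th closing token

# With capturing groups, scan returns the groups of each match
# (here: the last repetition only), not the whole matched text.
p text.scan(reluctant)
```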
On 11.12.2006 10:37, Carlos wrote:
Peter Szinek wrote:
Hello,
I need to match a chunk of code like this:
.... #begin here
...}
......end
...}
......}
.....end
...
I need to match from "the #begin here" up to the n-th closing token (i.e. '}' or 'end'). n can be arbitrary (let's consider that it is meaningful, i.e. there are no more '}' + 'end's than n.
does not work with scan...
To make it work with scan just make the parens non-capturing:
irb(main):001:0> text = "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
=> "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
irb(main):002:0> text.scan(/#begin(?:.*?(?:\}|end)){2}/m)
=> ["#begin aaa end bbb }", "#begin ddd end eee end"]
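For an arbitrary n, the same pattern could be wrapped in a small method. This is only a sketch: `match_stuff` is the name used in the original question, but taking the text as a first argument is an assumption, and the pattern keeps the bare `#begin` start mark from the irb session.

```ruby
# Sketch: returns every chunk that runs from "#begin" up to and including
# the n-th following closing token ('}' or 'end').
def match_stuff(text, n)
  # Integer(n) raises on non-numeric input before it reaches the regexp.
  # Non-capturing groups (?:...) make scan return the whole match.
  text.scan(/#begin(?:.*?(?:\}|end)){#{Integer(n)}}/m)
end

text = "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
p match_stuff(text, 1)
p match_stuff(text, 2)
```

Note that a chunk with fewer than n closing tokens after its "#begin" is simply not returned.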
Ha! That was the trick I have been looking for! Muchas Gracias, Carlos.