If you have millions of matches then you might consider stream processing,
which allows you to make use of laziness.
An example of doing this might be:
a = <<ETX
doesn't start with http
http...x1x
http.......x2x
http.....yxy
http........x3x
http...x4x
ETX
def nth_match(enumerable, n, pattern)
  enumerable
    .map { |l| pattern.match(l) }
    .reject(&:nil?)
    .map { |m|
      puts "retrieving #{m[1].inspect}” # just here to show when it is called
      m[1]
    }
    .drop(n - 1)
    .first
end
puts "got #{nth_match(a.lines, 3, /http.*?(x.x)/)}"
puts "got #{nth_match(a.lines.lazy, 3, /http.*?(x.x)/)}”
__END__
If you run that you should see that the version with .lazy does not
retrieve x4x.
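That code isn't even syntactically correct: the string literals on two of
the puts lines are closed with a typographic quote (”) instead of a
straight double quote (").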
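For reference, here is a version that should parse and run. It is only a sketch of what was presumably intended: the sole change is that the typographic quotes are replaced by straight ones, and the trailing comments on the last two lines are my own annotations.

```ruby
a = <<ETX
doesn't start with http
http...x1x
http.......x2x
http.....yxy
http........x3x
http...x4x
ETX

# Returns the first capture group of the nth line that matches pattern.
def nth_match(enumerable, n, pattern)
  enumerable
    .map { |l| pattern.match(l) }
    .reject(&:nil?)
    .map { |m|
      puts "retrieving #{m[1].inspect}" # just here to show when it is called
      m[1]
    }
    .drop(n - 1)
    .first
end

puts "got #{nth_match(a.lines, 3, /http.*?(x.x)/)}"      # eager: also retrieves "x4x"
puts "got #{nth_match(a.lines.lazy, 3, /http.*?(x.x)/)}" # lazy: stops after "x3x"
```

Both calls return "x3x". The eager call runs every stage over all lines before drop and first are applied, while the lazy call pulls lines through the chain one at a time and stops as soon as first has its element.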
You should try benchmarking various approaches to see what works best for
you in terms of time and memory use for real representative data. The lazy
enumerators are not free in terms of performance, but have the advantage of
working well on long (or infinite) streams.
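A rough way to compare timings could look like the sketch below. The sample data, the nth_capture helper and the seconds helper are all made up for illustration, and only wall-clock time is measured, not memory; real representative data will likely behave differently.

```ruby
# Hypothetical sample data: 200_000 matching lines (an assumption, not real data).
lines = Array.new(200_000) { |i| "http...x#{i % 10}x\n" }
pattern = /http.*?(x.x)/

# First capture group of the nth matching line, for any enumerable of lines.
def nth_capture(enumerable, n, pattern)
  enumerable
    .map { |l| pattern.match(l) }
    .reject(&:nil?)
    .map { |m| m[1] }
    .drop(n - 1)
    .first
end

# Wall-clock seconds taken by the block, measured with the monotonic clock.
def seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Compare an early match (n = 3) with the very last one (n = lines.size).
[3, lines.size].each do |n|
  eager = seconds { nth_capture(lines, n, pattern) }
  lazy  = seconds { nth_capture(lines.lazy, n, pattern) }
  puts format("n=%-7d eager %.4fs  lazy %.4fs", n, eager, lazy)
end
```

The two values of n are chosen to show both ends: a small n lets the lazy chain stop early, while n equal to the number of lines forces it to walk the whole input and pay the per-element overhead.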
We do not even know where the data comes from. So far we have only seen a
single String instance as the source. (By the way, \n in single quotes
does not work as probably intended: it stays a literal backslash followed
by n.)
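To illustrate the remark about quoting, a small sketch (the variable names are just for this example):

```ruby
single = 'a\nb'  # backslash followed by "n": four characters, one line
double = "a\nb"  # a real newline: three characters, two lines

p single.length      # 4
p double.length      # 3
p single.lines.size  # 1
p double.lines.size  # 2
```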
For a String, a lazy approach to finding the nth occurrence of a match would be:
def lazy_match(input, rx, n)
  raise ArgumentError, "Invalid n: #{n.inspect}" unless n > 0
  input.scan rx do |m|
    n -= 1
    return m if n == 0
  end
  nil
end
irb(main):020:0> lazy_match a, /http.*?(x.x)/, 3
=> ["x3x"]
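If the data turned out to come from a file or socket rather than a String, the same early-return idea could be applied line by line. The following is only a sketch under that assumption: nth_line_match is a made-up name, and a StringIO stands in for the real source.

```ruby
require "stringio"

# Returns the captures of the nth line in io that matches rx, or nil
# if there are fewer than n matching lines.
def nth_line_match(io, rx, n)
  raise ArgumentError, "Invalid n: #{n.inspect}" unless n > 0
  io.each_line do |line|
    m = rx.match(line)
    next unless m
    n -= 1
    return m.captures if n == 0
  end
  nil
end

# StringIO is a stand-in for a File or socket here.
io = StringIO.new("no match\nhttp...x1x\nhttp..x2x\nhttp.x3x\nhttp.x4x\n")
p nth_line_match(io, /http.*?(x.x)/, 3)  # ["x3x"]
p io.gets                                # "http.x4x\n", the rest was never read
```

Unlike scan on a String, this matches each line separately, so a pattern can never span a line break.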
Kind regards
robert
On Sun, Jan 3, 2016 at 5:18 PM, Mike Stok <mike@stok.ca> wrote:
--
[guy, jim, charlie].each {|him| remember.him do |as, often| as.you_can -
without end}
http://blog.rubybestpractices.com/