this one never reads more than pagesize bytes into memory at a time. it deals with the fact that a needle could straddle two pages by searching the current page plus the first few bytes of the page read before it (at most needle.size bytes are needed).
the extra code just shows that it does, in fact, find its target.
percent is the fraction of the file read per page (default 0.10): you can up it for speed or crank it down to save on memory.
cfp:~ > cat a.rb
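def tail_search io, needle, options = {}
  io = open io unless io.respond_to?(:read)
  percent = Float(options['percent']||options[:percent]||0.10)
  buf = ''
  size = io.stat.size
  # at least one byte per page, otherwise pos would never move
  pagesize = [Integer(size * percent), 1].max
  pos = 0
  loop do
    # stop once the scan has reached the start of the file
    break if pos.abs >= size
    # the page nearest the start of the file may be shorter than pagesize
    len = [pagesize, size + pos].min
    pos -= len
    io.seek(pos, IO::SEEK_END)
    # current page plus the head of the previous buffer, so a needle
    # straddling the page boundary is still found
    buf = io.read(len) + buf[0, needle.size]
    relative_index = buf.index(needle)
    if relative_index
      absolute_index = size + pos + relative_index
      return absolute_index
    end
  end
  return nil
ensure
  io.close rescue nil
end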
needle = 'key=val'
index = tail_search(__FILE__, needle, :percent => 0.02)
if index
  open(__FILE__) do |fd|
    fd.seek index
    puts fd.read(needle.size)
  end
end
needle = 'io.close rescue nil'
index = tail_search(__FILE__, needle, :percent => 0.02)
if index
  open(__FILE__) do |fd|
    fd.seek index
    puts fd.read(needle.size)
  end
end
__END__
key=val
cfp:~ > ruby a.rb
key=val
io.close rescue nil
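
to see why the overlap matters, here is a tiny sketch of the straddling case on its own. the data string and the page size of 10 are made up for illustration; they are chosen so the needle crosses the page boundary.

```ruby
# hypothetical data and page size, chosen so the needle crosses a page boundary
data     = 'aaaaaakey=valbbbbbbb'   # 20 bytes, needle occupies bytes 6..12
needle   = 'key=val'
pagesize = 10

# the two pages, in the order a backwards scan would read them
last_page  = data[10, pagesize]   # "valbbbbbbb"
first_page = data[0, pagesize]    # "aaaaaakey="

# neither page contains the needle on its own
alone = [last_page.index(needle), first_page.index(needle)]

# carrying over the head of the page read before makes the needle whole again
buf = first_page + last_page[0, needle.size]   # "aaaaaakey=valbbbb"
straddle_index = buf.index(needle)

p alone            # [nil, nil]
p straddle_index   # 6
```

the carried-over piece only has to be as long as the needle, which is why the buffer never grows much past pagesize.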
a @ http://codeforpeople.com/
On Nov 6, 2008, at 10:18 AM, bwv549 wrote:
Here's roughly what I'm using right now (sort of an amalgamation of
everyone's thoughts):
File.open(filename) do |handle|
  halfway = handle.stat.size / 2
  handle.seek halfway
  last_half = handle.read
  params_start_index = last_half.rindex(my_substring) + halfway
end
using rindex this takes .17 sec on an 85MB file
using index it takes .45 sec on the same file
Those numbers argue that rindex indeed seeks from the end.
Thanks for everyone's help and I will check back to see if there are
any more thoughts.
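
a minimal sketch of that half-file approach with one guard added, assuming the substring may be absent: rindex returns nil in that case, and nil + halfway would raise. the helper name last_half_rindex and the temp file contents are made up for illustration.

```ruby
require 'tempfile'

# hypothetical helper: search only the second half of the file, as in the
# snippet above, but return nil instead of raising when the substring is absent
def last_half_rindex filename, substring
  File.open(filename, 'rb') do |handle|
    halfway = handle.stat.size / 2
    handle.seek halfway
    relative = handle.read.rindex(substring)
    # the block's value is returned, so the index is usable by the caller
    relative && relative + halfway
  end
end

# made-up test file: padding followed by the marker near the end
Tempfile.create('tail') do |tmp|
  tmp.write('x' * 1000 + 'key=val' + "\n")
  tmp.flush
  p last_half_rindex(tmp.path, 'key=val')   # 1000
  p last_half_rindex(tmp.path, 'missing')   # nil
end
```

note that this still reads half the file into memory, and a substring that starts before the halfway point will be missed.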
--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama