If you want to tail beginning at an arbitrary position in the file,
that will work, but many will probably want to specify the # of lines
from the end.
Yes, of course. I was referring to my implementation in file/tail:
def last(n = 0, bufsize = 4096)
  if n <= 0
    seek(0, File::SEEK_END)
    return
  end
  size = stat.size
  begin
    if bufsize < size
      seek(0, File::SEEK_END)
      while n > 0 and tell > 0 do
        start = tell
        seek(-bufsize, File::SEEK_CUR)
        buffer = read(bufsize)
        n -= buffer.count("\n")
        seek(-bufsize, File::SEEK_CUR)
      end
    else
      seek(0, File::SEEK_SET)
      buffer = read(size)
      n -= buffer.count("\n")
      seek(0, File::SEEK_SET)
    end
  rescue Errno::EINVAL
    size = tell
    retry
  end
  pos = -1
  while pos and n < 0 # forward if we are too far back
    pos = buffer.index("\n", pos + 1)
    n += 1
  end
  seek(pos + 1, File::SEEK_CUR)
end
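To show how the method is meant to be used (it positions the file handle just before the last n lines rather than returning them), here is a self-contained sketch. The module name FileTail and the sample data are my own inventions; the body of last is the one quoted above, which the real file/tail library organizes differently.

```ruby
require "tempfile"

# Hypothetical wrapper module so the quoted method can be mixed into an
# ordinary File object; the method body is unchanged from the post above.
module FileTail
  def last(n = 0, bufsize = 4096)
    if n <= 0
      seek(0, File::SEEK_END)
      return
    end
    size = stat.size
    begin
      if bufsize < size
        seek(0, File::SEEK_END)
        while n > 0 and tell > 0 do
          start = tell
          seek(-bufsize, File::SEEK_CUR)
          buffer = read(bufsize)
          n -= buffer.count("\n")
          seek(-bufsize, File::SEEK_CUR)
        end
      else
        seek(0, File::SEEK_SET)
        buffer = read(size)
        n -= buffer.count("\n")
        seek(0, File::SEEK_SET)
      end
    rescue Errno::EINVAL
      size = tell
      retry
    end
    pos = -1
    while pos and n < 0 # forward if we are too far back
      pos = buffer.index("\n", pos + 1)
      n += 1
    end
    seek(pos + 1, File::SEEK_CUR)
  end
end

tail_output = Tempfile.create("tail-demo") do |tmp|
  tmp.write((1..5).map { |i| "line #{i}\n" }.join)
  tmp.flush
  File.open(tmp.path) do |file|
    file.extend(FileTail)
    file.last(2)   # seek to just before the last 2 lines
    file.read
  end
end
puts tail_output   # prints "line 4" and "line 5"
```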
I’m using buffer.count("\n") to count all the newlines in a buffer. I
didn’t want to reverse the string first, because that wouldn’t perform
well either. Instead, if I have gone too far back in the file, the last
while-loop searches forward in the buffer to find the right newline.
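That count-then-scan-forward trick can be seen in isolation on a plain string (toy data, my own):

```ruby
# Toy demonstration of the counting trick described above: subtract the
# total newline count in one call, then walk forward past the surplus
# newlines to find where the wanted tail begins.
buffer = "one\ntwo\nthree\nfour\n"
n = 2                       # lines wanted from the end
n -= buffer.count("\n")     # n == -2: two newlines too many in the buffer
pos = -1
while pos && n < 0          # forward if we are too far back
  pos = buffer.index("\n", pos + 1)
  n += 1
end
tail = buffer[pos + 1..-1]  # the last two lines
puts tail                   # prints "three" and "four"
```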
You could seek to the end, then seek backwards in chunks, read in each
chunk, scan backwards through the chunk counting newlines while keeping
track of the file position, and once you hit the # of lines you want,
seek to that position and read from there. This would cut down on the #
of seeks and reads in my method above, probably resulting in much
better performance.
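A sketch of that approach might look like the following (all names and the chunk size are my own choices, not code from the thread): seek backwards in chunks, count newlines from the end of each chunk while tracking the absolute file position, and return the offset at which the wanted tail starts.

```ruby
require "tempfile"

# Sketch of the chunked backward scan described above. Returns the file
# offset at which the last n lines begin (0 if the file has fewer).
def tail_position(io, n, chunksize = 8192)
  size = io.stat.size
  seen = 0
  pos = size
  while pos > 0
    chunk_start = [pos - chunksize, 0].max
    io.seek(chunk_start, IO::SEEK_SET)
    chunk = io.read(pos - chunk_start)
    # Walk the chunk backwards, counting newlines; a newline at the very
    # end of the file only terminates the last line and is not counted.
    (chunk.length - 1).downto(0) do |i|
      next unless chunk[i] == "\n"
      abs = chunk_start + i
      seen += 1 unless abs == size - 1
      return abs + 1 if seen == n
    end
    pos = chunk_start
  end
  0
end

last_two = Tempfile.create("tail-pos") do |tmp|
  tmp.write("alpha\nbeta\ngamma\n")
  tmp.flush
  tmp.seek(tail_position(tmp, 2), IO::SEEK_SET)
  tmp.read
end
puts last_two   # prints "beta" and "gamma"
```

Scanning character by character keeps the logic easy to follow but runs at the interpreter level; a faster variant would call String#rindex once per newline inside each chunk, in the spirit of the C-level argument that follows.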
Yes, this is pretty similar to my implementation above. I think one
bottleneck in scripting languages is copying lots of data between the
scripting level and the C level. Doing most of the work at the C level
and only copying the results back at the end is usually much faster.
That’s why (and to spare a lot of method calls) I used count on the
whole buffer instead of scanning it in Ruby. Perhaps I should waste a
few rindex calls to search the buffer backwards, because it probably
wouldn’t make much of a difference in practice.
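For what it’s worth, a backward search over a buffer with rindex might look like this (my own sketch, not code from file/tail); since String#rindex does its scanning at the C level, the Ruby-level loop only runs once per newline found:

```ruby
# Find the offset of the n-th newline counted from the end of buffer,
# or nil if there are fewer than n newlines.
def nth_newline_from_end(buffer, n)
  pos = buffer.length
  n.times do
    return nil if pos.zero?   # a negative index would wrap around
    pos = buffer.rindex("\n", pos - 1)
    return nil if pos.nil?
  end
  pos
end

buffer = "one\ntwo\nthree\nfour\n"
# The trailing newline only ends the last line, so the last 2 lines
# start one character after the 3rd newline from the end:
tail = buffer[nth_newline_from_end(buffer, 3) + 1..-1]
puts tail   # prints "three" and "four"
```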
···
On Fri, 2002-08-02 at 16:46, James F.Hranicky wrote: