A very basic tail -f implementation

From: Paul Brannan [mailto:pbrannan@atdesk.com]
Sent: Tuesday, July 30, 2002 8:35 AM
To: ruby-talk@ruby-lang.org
Subject: Re: A very basic tail -f implementation

Ctrl-C to exit

while true
    puts t.read
end
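
For comparison, here is a minimal sketch of the same polling idea that sleeps when nothing new has arrived instead of spinning. The names read_new_data and follow, and the default interval, are made up for illustration; this is not the code from either implementation discussed in this thread.

```ruby
# Hypothetical helper: return whatever has been appended to +io+ since
# the last read. IO#read without a length returns "" at end of file.
def read_new_data(io)
  io.read || ""
end

# Hypothetical tail -f loop: start at the end of the file and yield each
# newly appended chunk, sleeping +interval+ seconds when nothing is new.
# Runs until interrupted (Ctrl-C).
def follow(path, interval = 0.5)
  File.open(path, "r") do |t|
    t.seek(0, IO::SEEK_END)
    loop do
      data = read_new_data(t)
      if data.empty?
        sleep interval
      else
        yield data
      end
    end
  end
end
```

It could be called as follow("some.log") { |chunk| print chunk }, where the file name is only a placeholder.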

I have an implementation of tail -f that looks similar, except it
doesn’t seek to the end of file at start, and it’s called like this:

Well, it looks like Florian Frank went and released his own version of tail
today.

Coincidence? Ich denke nicht (I think not).

I’ll try it out I guess. No documentation…sigh.

Regards,

Dan

···

On Tue, Jul 30, 2002 at 05:15:25AM +0900, Daniel Berger wrote:

Well, it looks like Florian Frank went and released his own version of
tail today.

Coincidence? Ich denke nicht (I think not).

This is funny: I don’t know whether it is a coincidence. I released it
because someone (rubyhacker was his nick, IIRC) asked on the openproject
IRC channel for an implementation of a File::Tail module similar to the
Perl module. I was AFK, so I couldn’t answer him immediately, and he left
the channel before I was back. If you are rubyhacker, then it’s no
coincidence at all. If you aren’t rubyhacker, it’s coincidental.

I searched for a File::Tail module a while ago, when I tried to upgrade
my old ipchains logfile prettifier (written in Perl) to iptables
logfiles, because I wanted to port it to Ruby at the same time. I
found nothing, so I implemented my own module. Since then it has been
lying around on my hard disk. I was obviously too lazy to bring it into a
releasable form for quite a long time. :-)

I’ll try it out I guess. No documentation…sigh.

I could try to document a little more in the best English I’m capable
of. That is, it will probably be quite awful. ;-)

···

On 2002-07-31 06:15:08 +0900, Berger, Daniel wrote:


It is of course always best to be led by god, and have him personally whisper
into your ear. Only, when it is the devil talking he will tell you he is god,
for the devil is a crafty liar. So you never know who is talking to you.
– Franz Bibfeldt

A couple of suggestions:

- Check for rotation by checking for changes in the inode number

- Allow tailing from arbitrary points in the file by lines, e.g.

	log.wind(10)    # skip 10 lines from the beginning, print
			# the rest of the file, then tail -f
	log.rewind(-10) # print last 10 lines, then tail -f

Something like this:

def wind(lines)

    seek(0, IO::SEEK_SET)

    numlines = 0
    0.upto(stat.size) { |filepos|
        seek(filepos, IO::SEEK_SET)
        return if (numlines == lines)
        i = getc
        c = sprintf("%c", i)
        numlines += 1 if (c == "\n")
    }

end

def rewind(lines)

    seek(0, IO::SEEK_END)

    lines = lines.abs
    numlines = 0
    size = stat.size - 1

    size.downto(0) { |filepos|
        next if (size == filepos)
        seek(filepos, IO::SEEK_SET)
        i = getc
        c = sprintf("%c", i)
        numlines += 1 if (c == "\n")
        return if (numlines == lines)
    }
end
···

On Wed, 31 Jul 2002 21:02:04 +0900 Florian Frank flori@eavesdrop.ping.de wrote:

I could try to document a little more in the best English I’m capable
of. That is, it will probably be quite awful. ;-)


Jim Hranicky, Senior SysAdmin UF/CISE Department |
E314D CSE Building Phone (352) 392-1499 |
jfh@cise.ufl.edu http://www.cise.ufl.edu/~jfh |


“Given a choice between a complex, difficult-to-understand, disconcerting
explanation and a simplistic, comforting one, many prefer simplistic
comfort if it’s remotely plausible, especially if it involves blaming
someone else for their problems.”
– Bob Lewis, Infoworld

I could try to document a little more in the best English I’m capable
of. That is, it will probably be quite awful. ;-)

A couple of suggestions:

  • Check for rotation by checking for changes in the inode number

Good idea. This should handle the rotation-by-moving case much faster.
I have implemented this.

BTW: If the filesize suddenly shrinks, copy and truncate could have
happened. I’m not sure what has to be done in this case. Rewinding to
the top of the file would perhaps be reasonable, because it doesn’t make
much sense for a logfile to be truncated to any other filesize but 0.
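
The two checks discussed above (a changed inode for rotation-by-moving, a shrunken size for copy-and-truncate) could be sketched as follows. This is only an illustration under stated assumptions, not the code in file/tail: the helper name rotation_state and its return symbols are invented here, and the inode comparison is only meaningful on Unix-like systems.

```ruby
# Hypothetical helper, not part of file/tail: classify what happened to the
# log at +path+ since +io+ was opened and read up to byte offset +last_pos+.
def rotation_state(io, path, last_pos)
  on_disk = File.stat(path)   # whatever currently lives at the path
  opened  = io.stat           # the file we actually hold open
  if on_disk.dev != opened.dev || on_disk.ino != opened.ino
    :rotated                  # the path now names a different file: reopen it
  elsif opened.size < last_pos
    :truncated                # the file shrank: seek back to offset 0
  else
    :unchanged
  end
rescue Errno::ENOENT
  :missing                    # moved away and not recreated yet: retry later
end
```

A tailing loop would call this between reads and reopen the path on :rotated, or seek to the start on :truncated.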

  • Allow tailing from arbitrary points in the file by lines, e.g.

    log.wind(10)    # skip 10 lines from the beginning, print
                    # the rest of the file, then tail -f
    log.rewind(-10) # print last 10 lines, then tail -f

In my implementation the latter is done by log.last(10).

Something like this:

def wind(lines)

    seek(0, IO::SEEK_SET)

    numlines = 0
    0.upto(stat.size) { |filepos|
        seek(filepos, IO::SEEK_SET)
        return if (numlines == lines)
        i = getc
        c = sprintf("%c", i)
        numlines += 1 if (c == "\n")
    }

end

Maybe I am missing something, but couldn’t this be done much more simply,
like this:

def wind(lines)

    @fileh.seek(0, IO::SEEK_SET) # just to be sure

    until @fileh.eof? or lines <= 0
        @fileh.readline
        lines -= 1
    end

end

def rewind(lines)

    seek(0, IO::SEEK_END)

    lines = lines.abs
    numlines = 0
    size = stat.size - 1

    size.downto(0) { |filepos|
        next if (size == filepos)
        seek(filepos, IO::SEEK_SET)
        i = getc
        c = sprintf("%c", i)
        numlines += 1 if (c == "\n")
        return if (numlines == lines)
    }
end

My approach is much more complicated because I use a buffer of an
arbitrary size, to spare some seek-calls and to have fewer
explicit iterations. Perhaps it doesn’t make much of a difference and I
could use this simpler method. I should probably benchmark both methods
to find this out.

···

On Wed, 2002-07-31 at 22:50, James F.Hranicky wrote:

On Wed, 31 Jul 2002 21:02:04 +0900 Florian Frank flori@eavesdrop.ping.de wrote:

def rewind(lines)

    seek(0, IO::SEEK_END)

    lines = lines.abs
    numlines = 0
    size = stat.size - 1

    size.downto(0) { |filepos|
        next if (size == filepos)
        seek(filepos, IO::SEEK_SET)
        i = getc
        c = sprintf("%c", i)
        numlines += 1 if (c == "\n")

Could this be replaced with:
numlines += 1 if getc() == ?\n

        return if (numlines == lines)
    }
end

Paul

···

On Thu, Aug 01, 2002 at 05:50:17AM +0900, James F.Hranicky wrote:

One more suggestion: checkpoint:

log.next { |line|
    process(line)
    log.checkpoint	# write out inode and filepos to a file
			# or something
}

Something like

def checkpoint
    open_checkpoint_file
    write_inode_and_pos
    sync_checkpoint_file
    close_checkpoint_file
end

So, after a crash or reboot, you could start back up where you
left off, unless the inode has changed, and then you’d start at
the beginning of the new file.
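
A minimal sketch of this checkpoint idea, assuming a plain text file holding "inode position" is an acceptable format. The method names write_checkpoint and resume_position and the file layout are made up for illustration; neither is part of any library mentioned in this thread.

```ruby
# Hypothetical helper: record the inode and read position of +io+.
# Writing to a temporary file and renaming it keeps the previous
# checkpoint intact if the process dies halfway through.
def write_checkpoint(io, checkpoint_path)
  tmp = "#{checkpoint_path}.tmp"
  File.open(tmp, "w") do |f|
    f.puts "#{io.stat.ino} #{io.pos}"
    f.fsync                       # push the data to disk before renaming
  end
  File.rename(tmp, checkpoint_path)
end

# Hypothetical helper: byte offset at which to resume reading +path+.
# Returns 0 (start of file) when there is no usable checkpoint or the
# inode has changed since the checkpoint was written.
def resume_position(path, checkpoint_path)
  ino, pos = File.read(checkpoint_path).split.map { |s| Integer(s) }
  return 0 unless ino && pos
  stat = File.stat(path)
  stat.ino == ino && pos <= stat.size ? pos : 0
rescue Errno::ENOENT, ArgumentError
  0
end
```

After a restart, the reader would open the log and seek to resume_position(log_path, checkpoint_path) before tailing.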

Jim

···

On Thu, 1 Aug 2002 05:50:17 +0900 James F.Hranicky jfh@cise.ufl.edu wrote:

On Wed, 31 Jul 2002 21:02:04 +0900 Florian Frank flori@eavesdrop.ping.de wrote:

I could try to document a little more in the best English I’m capable
of. That is, it will probably be quite awful. ;-)

A couple of suggestions:

BTW: If the filesize suddenly shrinks, copy and truncate could have
happened. I’m not sure what has to be done in this case. Rewinding to
the top of the file would perhaps be reasonable, because it doesn’t make
much sense for a logfile to be truncated to any other filesize but 0.

This makes sense to me.

def wind(lines)

[ … ]

Maybe I am missing something, but couldn’t this be done much more simply,
like this:

def wind(lines)

[use readlines]

end
end

No, that’s much better. I was thinking in terms of doing the opposite of
what I did for rewind, missing the easier solution.

def rewind(lines)

[ … ]

end

My approach is much more complicated because I use a buffer of an
arbitrary size, to spare some seek-calls and to have fewer
explicit iterations. Perhaps it doesn’t make much of a difference and I
could use this simpler method. I should probably benchmark both methods
to find this out.

If you want to tail beginning at an arbitrary position in the file,
that will work, but many will probably want to specify the # of lines
from the end.

You could seek to the end, then seek backwards in chunks, read in each
chunk, then count backwards through the chunk counting newlines and
keeping track of filepos, and once you hit the # lines you want, seek to
that position and then read from there. This would cut down on the #
of seeks and reads in my method above, probably resulting in much
better performance.

Jim

···

On Fri, 2 Aug 2002 07:43:57 +0900 Florian Frank flori@eavesdrop.ping.de wrote:

Hmmm…I haven’t run across “?\n” before – what does that do?

Jim

···

On Fri, 2 Aug 2002 22:55:02 +0900 Paul Brannan pbrannan@atdesk.com wrote:

On Thu, Aug 01, 2002 at 05:50:17AM +0900, James F.Hranicky wrote:

        i = getc
        c = sprintf("%c", i)
        numlines += 1 if (c == "\n")

Could this be replaced with:
numlines += 1 if getc() == ?\n

“James F.Hranicky” jfh@cise.ufl.edu writes:

Hmmm…I haven’t run across “?\n” before – what does that do?

It returns the character code for the following character:

irb(main):005:0> ?A
65
irb(main):006:0> ?B
66
irb(main):007:0> ?C
67

···


Josh Huber

James F.Hranicky wrote:

Hmmm…I haven’t run across “?\n” before – what does that do?

The “?” operator returns the ASCII code of the following character:

?\n --> 10
?a --> 97
?A --> 65
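
The results shown above are what Ruby printed in 2002. On Ruby 1.9 and later, ?A evaluates to the one-character string "A" rather than an integer, and getc returns a one-character string as well, so getc == ?\n still works as a newline test. A small sketch (char_code is a made-up helper name) that behaves the same on either side of that change:

```ruby
# Hypothetical helper: normalise a character literal or getc result to its
# integer code, whether the interpreter hands back an Integer or a String.
def char_code(c)
  c.is_a?(Integer) ? c : c.ord
end

puts char_code(?A)    # 65
puts char_code(?\n)   # 10
puts ?\n == "\n"      # true on Ruby 1.9 and later
```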

HTH,

Lyle

If you want to tail beginning at an arbitrary position in the file,
that will work, but many will probably want to specify the # of lines
from the end.

Yes, of course. I was referring to my implementation in file/tail:

    def last(n = 0, bufsize = 4096)
        if n <= 0
            seek(0, File::SEEK_END)
            return
        end
        size = stat.size
        begin
            if bufsize < size
                seek(0, File::SEEK_END)
                while n > 0 and tell > 0 do
                    start = tell
                    seek(-bufsize, File::SEEK_CUR)
                    buffer = read(bufsize)
                    n -= buffer.count("\n")
                    seek(-bufsize, File::SEEK_CUR)
                end
            else
                seek(0, File::SEEK_SET)
                buffer = read(size)
                n -= buffer.count("\n")
                seek(0, File::SEEK_SET)
            end
        rescue Errno::EINVAL
            size = tell
            retry
        end 
        pos = -1
        while pos and n < 0 # forward if we are too far back
            pos = buffer.index("\n", pos + 1)
            n += 1
        end
        seek(pos + 1, File::SEEK_CUR)
    end

I’m using buffer.count("\n") to count all the newlines in a buffer. I
didn’t want to reverse the string first, because that would not perform
very well either. So, if I end up too far back in the file, I search
forward in the buffer for the right newline in the last while-loop.

You could seek to the end, then seek backwards in chunks, read in each
chunk, then count backwards through the chunk counting newlines and
keeping track of filepos, and once you hit the # lines you want, seek to
that position and then read from there. This would cut down on the #
of seeks and reads in my method above, probably resulting in much
better performance.

Yes. This is pretty similar to my implementation above. I think one
bottleneck in scripting languages appears when you copy lots of data
between the scripting level and the C level. Doing most of the work at
the C level and copying only the results back at the end is usually much
faster. That (and sparing a lot of method calls) is why I used count
instead of buffer. Perhaps I should spend a few rindex calls to
search the buffer backwards, because it probably doesn’t make much of a
difference in practice.
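
The backward, chunk-by-chunk scan with rindex mentioned above might look roughly like this. It is a sketch written for this discussion, not the code in file/tail; the method name offset_of_last_lines is invented, and a newline that terminates the very last line of the file is deliberately not counted.

```ruby
# Hypothetical helper: byte offset at which the last +n+ lines of +io+
# begin. Reads the file backwards in chunks of +bufsize+ bytes and
# searches each chunk from its end with String#rindex.
def offset_of_last_lines(io, n, bufsize = 4096)
  size = io.stat.size
  return size if n <= 0

  pos = size      # everything at or after pos has already been scanned
  found = 0       # newlines seen so far, excluding a final terminator
  while pos > 0
    chunk_start = [pos - bufsize, 0].max
    io.seek(chunk_start, IO::SEEK_SET)
    chunk = io.read(pos - chunk_start)   # binary string: index == byte offset
    idx = chunk.bytesize
    while idx > 0 && (idx = chunk.rindex("\n", idx - 1))
      abs = chunk_start + idx
      next if abs == size - 1            # newline ending the last line
      found += 1
      return abs + 1 if found == n       # the wanted lines start right after it
    end
    pos = chunk_start
  end
  0               # fewer than n lines in the file: start at the beginning
end
```

A caller would then do io.seek(offset_of_last_lines(io, 10)) and read from there. Whether this is measurably faster or slower than counting with count and stepping forward again would need the benchmark mentioned earlier in the thread.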

···

On Fri, 2002-08-02 at 16:46, James F.Hranicky wrote:

Then it’s much better :->

Jim

···

On Fri, 2 Aug 2002 23:56:44 +0900 Josh Huber huber+dated+1028732200.6ccc24@alum.wpi.edu wrote:

Hmmm…I haven’t run across “?\n” before – what does that do?

It returns the character code for the following character: