File read progress

Hopefully this is an easy one...

I'm processing a very large file into a hash and would like to have a status
message telling me how far along it is (e.g. "30% processed..."). What I
would like to do is:

1) find out how many times a keyword is found in this file (since this is a
big file I don't want to have to iterate twice, once for total keywords and
once for storing the records associated with them), or
2) query the filehandle to tell me where I am at with respect to the total
file size.

Any ideas?
JG

As long as your file read routine is written to take a hash, then you
should have no problem. I recommend using progressbar.rb on RAA to
make your progress bar, though.

Note: I'm pretty sure that 0.8.0 isn't the ideal version, because I
think it added some Windows incompatibilities; I need to check and
send a patch to the maintainer.

-austin

On Mon, 30 Aug 2004 03:55:28 +0900, JG <perfyct@yahoo.com> wrote:

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

JG wrote:

> 2) query the filehandle to tell me where I am at with respect to the total
> file size.

See the pos method in IO.
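
To make the suggestion concrete, here is a minimal sketch of how IO#pos can be combined with File.size to get a percentage. The helper name `percent_read` and the sample data are made up for illustration; only `IO#pos`, `File.size` and `Tempfile.create` come from Ruby itself.

```ruby
require 'tempfile'

# Hypothetical helper: percentage of the file consumed so far,
# based on the byte offset reported by IO#pos.
def percent_read(io, total_bytes)
  return 100.0 if total_bytes.zero?  # avoid dividing by zero on empty files
  io.pos * 100.0 / total_bytes
end

# Demo on a small temporary file (assumed sample data).
Tempfile.create('progress_demo') do |tmp|
  tmp.write("keyword one\nother line\nkeyword two\n")
  tmp.flush

  total = File.size(tmp.path)
  File.open(tmp.path) do |io|
    while io.gets
      printf("%5.1f%% read\n", percent_read(io, total))
    end
  end
end
```

Because both numbers are byte counts, this should not need the number of lines or keywords in advance, so a second pass over the file is avoided.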

Austin Ziegler wrote:

> As long as your file read routine is written to take a hash, then you
> should have no problem. I recommend using progressbar.rb on RAA to
> make your progress bar, though.
>
> Note: I'm pretty sure that 0.8.0 isn't the ideal version, because I
> think it added some Windows incompatibilities; I need to check and
> send a patch to the maintainer.

I had a little more luck with libpbar-ruby (also on RAA). The two are
fairly similar, but progressbar.rb would sometimes crash with a "stack
level too deep" error. I didn't have time to debug it, though.

--
Zachary P. Landau <kapheine@hypa.net>
GPG: gpg --recv-key 0x24E5AD99 | http://kapheine.hypa.net/kapheine.asc

"Tim Hunter" <cyclists@nc.rr.com> wrote in message
news:PXqYc.548$MO3.87112@twister.southeast.rr.com...

JG wrote:

> 2) query the filehandle to tell me where I am at with respect to the total
> file size.

See the pos method in IO.

#pos on its own only gives you a byte offset, so it helps only if you
also know the total to compare it against. The number of lines is
typically not known beforehand, but the file size is. How about this:

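def read_records(file, rx, report = $stderr)
  size = File.size(file)     # total size in bytes

  File.open(file) do |io|
    rec = []                 # lines that match rx
    sum = 0                  # bytes read so far
    threshold = 10           # next percentage at which to report

    while (line = io.gets)
      sum += line.bytesize   # count bytes, because File.size is in bytes
      line.chomp!

      perc = sum * 100.0 / size

      if perc >= threshold
        report.printf "Read %.1f%%\n", perc
        threshold += 10
      end

      rec << line if rx =~ line
    end

    rec                      # value of the block, returned by File.open
  end
end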

Then you can do

require 'stringio'  # only needed for the second call

records = read_records("foo.bar", /keyword/)
# eat the status info:
records = read_records("foo.bar", /keyword/, StringIO.new)
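
Since the original question was about filling a hash, the same byte-counting idea can be sketched with the records grouped by keyword and the progress handed to a block. This is only an illustration under assumptions: `index_records`, the `key=value` sample data and the regexp are invented here and are not taken from anyone's actual file format.

```ruby
require 'stringio'

# Hypothetical helper: groups matching lines by the text captured in the
# first group of rx, and yields the percentage of bytes consumed so far.
def index_records(io, total_bytes, rx)
  records = Hash.new { |hash, key| hash[key] = [] }
  seen = 0  # bytes read so far

  io.each_line do |line|
    seen += line.bytesize
    yield(seen * 100.0 / total_bytes) if block_given?

    match = rx.match(line)
    records[match[1]] << line.chomp if match
  end

  records
end

# Assumed sample data; a real file would be opened with File.open and
# its size taken from File.size.
data = "alpha=1\nbeta=2\nalpha=3\n"
index = index_records(StringIO.new(data), data.bytesize, /\A(\w+)=/) do |pct|
  $stderr.printf("Read %.1f%%\n", pct)
end
p index
```

Passing the reporting step as a block leaves it to the caller whether to print every line, print every 10%, or drive a progress bar library.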

Kind regards

    robert