Searching backward in files

Dear Ruby hackers,

I wonder whether there is a simple way to search backward in
files. I want to read a PDF file, so I have to do the following:

  1. go to the end of the file (pdffile.seek(0,IO::SEEK_END))
  2. search backward for the string “%%EOF” within the last 1024 bytes
  3. search backward for “startxref”
  4. read the next bytes,…

Do I have to write my own backward-search routine, or is there some
built-in functionality for it?

Best regards,

Patrick

Ruby’s native regexp engine cannot search backwards, which is why I
am working on my own regexp engine. However, I don’t have a
FileInputStream class yet (but I can easily add one).

The text is spelled backwards, because we want to search backwards.

Pseudocode:

  iterator = file.create_iterator_end.reverse
  re = NewRegexp.new('(?xm) .{0,1024} FOE%% .{0,4000} ferxtrats')
  matchdata = re.match(iterator)

The engine is written entirely in Ruby, so speed isn’t impressive.
It’s fairly compatible with Ruby 1.8’s built-in GNU regexp, and it’s
carefully tested (more than 2000 tests).
http://raa.ruby-lang.org/list.rhtml?name=regexp
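
The "spell it backwards" idea can also be tried with Ruby's built-in Regexp when the tail of the file already fits in memory: reverse the string, match a reversed pattern, then translate the match position back. This is only a sketch of that idea, not the custom engine described above. The helper name find_backward and the sample data are made up for illustration.

```ruby
# Sketch: search "backward" with the built-in Regexp by reversing the data.
# `find_backward` is a hypothetical helper, not part of any library.
def find_backward(data, reversed_pattern)
  reversed = data.reverse
  m = reversed_pattern.match(reversed)
  return nil unless m
  # The match ends (in the reversed string) where the original text begins,
  # so convert that end position into an offset in the original string.
  data.length - m.end(0)
end

# Sample data shaped like the end of a PDF file (not a valid PDF).
tail = "xref\n0 1\ntrailer\n<< >>\nstartxref\n1234\n%%EOF\n"

# Reversed text: "FOE%%" first, then lazily up to 4000 characters,
# then "ferxtrats" ("startxref" spelled backwards).
pattern = /FOE%%.{0,4000}?ferxtrats/m

offset = find_backward(tail, pattern)
puts tail[offset, 9] # the nine characters starting at the found offset
```

Because the leftmost match in the reversed string is the last one in the original, this finds the final "%%EOF" and the nearest "startxref" before it.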



Simon Strandgaard

[Patrick Gundlach clr1.10.randomuser@spamgourmet.com, 2004-05-20 13.03 CEST]

Dear Ruby hackers,

I wonder whether there is a simple way to search backward in
files. I want to read a PDF file, so I have to do the following:

  1. go to the end of the file (pdffile.seek(0,IO::SEEK_END))
  2. search backward for the string “%%EOF” within the last 1024 bytes
  3. search backward for “startxref”
  4. read the next bytes,…

Do I have to write my own backward-search routine, or is there some
built-in functionality for it?

You can load the last 1024 characters into a string and use String#rindex for
(2). Then use String#rindex again to find (3). If you don’t find it, load the
preceding 1024 characters, use String#rindex again, and so on.
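
A minimal sketch of that chunked approach follows, assuming the file is read as raw bytes. The helper name rindex_in_io and its chunk_size parameter are hypothetical. One detail the description above leaves out is that a marker can straddle two chunks, so the sketch carries a few bytes over from the later chunk into the next search window.

```ruby
require "stringio"

# Hypothetical helper: returns the byte offset of the last occurrence of
# `needle` in `io`, or nil if it does not occur. Reads chunks from the end.
def rindex_in_io(io, needle, chunk_size = 1024)
  needle = needle.b                 # compare as raw bytes
  io.seek(0, IO::SEEK_END)
  pos = io.pos                      # everything from pos onward was searched
  carry = "".b                      # leading bytes of the later chunk
  while pos > 0
    start = [pos - chunk_size, 0].max
    io.seek(start, IO::SEEK_SET)
    chunk = io.read(pos - start)
    # Append the carried bytes so a needle split across the chunk
    # boundary is still found.
    window = chunk + carry
    hit = window.rindex(needle)
    return start + hit if hit
    carry = window[0, needle.bytesize - 1]
    pos = start
  end
  nil
end

# Usage with an in-memory stand-in for a file:
io = StringIO.new("a" * 3000 + "startxref\n42\n" + "%%EOF\n")
p rindex_in_io(io, "startxref")     # prints 3000
```

The same method works on a File opened with mode "rb"; StringIO is used here only to keep the example self-contained.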

Good luck.


If you are on Linux, you might simply use

tac = IO.popen "tac #{ path }"

and process the file in the ‘normal’ fashion, only backwards…

man 1 tac
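
A slightly fuller sketch of the tac idea, assuming GNU coreutils' tac is installed and on the PATH. Note that tac reverses the order of lines, not bytes, so each line still reads left to right. The helper name each_line_reversed is made up for this example.

```ruby
require "tempfile"

# Hypothetical helper: yields the lines of the file at `path`, last line
# first, by piping the file through the external `tac` command.
def each_line_reversed(path)
  # The array form runs tac directly, so the path is not interpreted
  # by a shell.
  IO.popen(["tac", path]) do |tac|
    tac.each_line { |line| yield line }
  end
end

# Usage with a throwaway file:
Tempfile.create("tac-demo") do |f|
  f.write("first\nsecond\nthird\n")
  f.flush
  each_line_reversed(f.path) { |line| puts line }
end
```

With the three-line file above, the lines come out as third, second, first.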

cheers.

-a

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
URL :: Solar-Terrestrial Physics Data | NCEI
“640K ought to be enough for anybody.” - Bill Gates, 1981
===============================================================================

Hi,

  2. search backward for the string “%%EOF” within the last 1024 bytes
  3. search backward for “startxref”

Carlos angus@quovadis.com.ar writes:

You can load the last 1024 characters into a string and use String#rindex for
(2). Then use String#rindex again to find (3). If you don’t find it, load the
preceding 1024 characters, use String#rindex again, and so on.

Yes, this is cool. It works like a charm. Thanks for pointing this out.
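
For the PDF case specifically, the four steps from the original question might look roughly like this. It is a sketch under two assumptions: that "startxref" and "%%EOF" both fall inside the last 1024 bytes, and that the bytes after "startxref" are a decimal offset. The helper name pdf_startxref is hypothetical.

```ruby
require "stringio"

# Hypothetical helper: returns the integer that follows the last "startxref"
# before the final "%%EOF", or nil if either marker is missing from the tail.
def pdf_startxref(io, tail_size = 1024)
  # Step 1: go to the end of the file and read the last tail_size bytes.
  io.seek(0, IO::SEEK_END)
  size = io.pos
  start = [size - tail_size, 0].max
  io.seek(start, IO::SEEK_SET)
  tail = io.read(size - start).to_s

  # Step 2: search backward for "%%EOF".
  eof = tail.rindex("%%EOF")
  return nil unless eof

  # Step 3: search backward from there for "startxref".
  sx = tail.rindex("startxref", eof)
  return nil unless sx

  # Step 4: read the digits between the two markers.
  digits = tail[(sx + "startxref".length)...eof][/\d+/]
  digits && Integer(digits, 10)
end

# Usage with sample data shaped like a PDF trailer (not a valid PDF):
pdf = StringIO.new("%PDF-1.4\n1 0 obj\nendobj\nstartxref\n1234\n%%EOF\n")
p pdf_startxref(pdf) # prints 1234
```

For a real file, open it in binary mode, for example File.open(path, "rb") { |f| pdf_startxref(f) }.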

Patrick

Hi,

Your solution would probably be the cleanest one, but it would require an
additional library. I think I’ll go for IO->String, String#rindex.

Thank you for your answer,

Patrick

“Ara.T.Howard” ahoward@noaa.gov writes:

If you are on Linux, you might simply use

tac = IO.popen "tac #{ path }"

and process the file in the ‘normal’ fashion, only backwards…

Oh, right, I forgot about tac. But this could be slow on large files
(and it is not portable).

Thank you for your answer,

Patrick

Good luck.

BTW: I have just made an experimental File iterator.


Patrick Gundlach clr1.10.randomuser@spamgourmet.com wrote:

Your solution would probably be the cleanest one, but it would require an
additional library. I think I’ll go for IO->String, String#rindex.


Simon Strandgaard