How do I quickly search the end of a huge text file?

I am trying to create a Ruby script that will search a Maya ASCII file
for specific text. The problem I'm running into is that it's running too
slow for the system at work. I know that all the information I need is
in the last 5% of the text file - but I haven't been able to figure out
a way to either jump to near the end and then start searching through
lines, or even better, iterate backwards through the file till I find
what I'm looking for... here is the code I'm currently using - which
works but slowly - any suggestions for how to speed this up would be
greatly appreciated! :smiley:

require "FileUtils"
require "ftools"

def FindRenderLayers(root)
  layersFile = []
  dirLocation = root.gsub(/(\\)$/, '')
  list = Dir.entries(dirLocation)

  list.each do |file|
    if file =~ /\.ma$/
      fileName = root + file
      layersFile.push file
      # block variable renamed to f so it no longer shadows the outer file
      File.open(fileName) do |f|
        while line = f.gets
          if line =~ /connectAttr (\"renderLayerManager.rlmi\[[0-9]\]\")/
            if $1 != "defaultRenderLayer"
              editedLine = "-" + $1
              layersFile.push editedLine
            end
          end
        end
      end
    end
  end
  return layersFile
end

root = "C:\\Users\\Brian\\Documents\\Ruby\\"
puts FindRenderLayers(root)

--
Posted via http://www.ruby-forum.com/.

IO::SEEK_END at Ruby-Doc may be the ticket...
http://www.ruby-doc.org/core/classes/IO.html#M002305

On Thu, Sep 4, 2008 at 8:51 PM, Brian Green <gallagherjb@gmail.com> wrote:


# I am trying to create a ruby script that will search a maya ascii file
# for specific text. The problem I'm running into is that it's
# running too slow for the system at work.

why do you say it is slow? what is your comparison? where is your benchmark?
how many files do you have? how large are the files?
how much disk space do you have?
how much memory do you have?
how fast is your cpu?
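
One way to answer the benchmark question is to time the full scan with Ruby's standard Benchmark module. This is only a minimal sketch: `count_matching_lines` and `time_full_scan` are hypothetical helper names, and the pattern passed in is whatever you are searching for.

```ruby
require "benchmark"

# Hypothetical helper: counts the lines in path that match pattern,
# reading the file line by line from the start.
def count_matching_lines(path, pattern)
  count = 0
  File.foreach(path) { |line| count += 1 if line =~ pattern }
  count
end

# Hypothetical helper: returns the wall-clock seconds a full scan takes.
def time_full_scan(path, pattern)
  Benchmark.realtime { count_matching_lines(path, pattern) }
end
```

Calling `time_full_scan` on one of the real files before and after any change gives a number to compare, instead of a feeling that it is slow.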

# I know that all the information I need is
# in the last 5% of the text file - but I haven't been able to

are you sure of the 5% ?
where is your proof?

# figure out a way to either jump to near the end, and then
# start search through lines

low level, use IO::SEEK_END

# or even better iterating backwards through the file till I find
# what I'm looking for...

arggh. but your comparison will be forward. otherwise, you'll have to reverse your search/regex pattern. implement a reverse readline/gets.
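
A reverse readline/gets along those lines could be sketched as below. This is an illustration only: `each_line_reversed` and `block_size` are hypothetical names (Ruby's IO has no built-in reverse reader), and the sketch assumes "\n" line endings, so lines from a file with Windows line endings would keep a trailing "\r". Each line is yielded whole and in its normal left-to-right form, so the regex does not need to be reversed.

```ruby
# Yields the lines of path from last to first by reading fixed-size
# blocks backwards from the end of the file. Lines are yielded without
# their trailing "\n" and as binary strings (the file is opened in "rb").
def each_line_reversed(path, block_size = 64 * 1024)
  File.open(path, "rb") do |f|
    pos = f.size
    return if pos.zero?            # empty file: nothing to yield
    tail = ""                      # incomplete line carried over from the later block
    at_end = true
    while pos > 0
      step = [block_size, pos].min
      pos -= step
      f.seek(pos, IO::SEEK_SET)
      chunk = f.read(step) + tail
      chunk = chunk.chomp("\n") if at_end   # ignore the file's final newline
      at_end = false
      pieces = chunk.split("\n", -1)
      tail = pieces.shift || ""    # first piece may continue in the previous block
      pieces.reverse_each { |line| yield line }
    end
    yield tail                     # whatever is left is the first line of the file
  end
end
```

Because the caller's block can `break` as soon as it has what it needs, only the blocks near the end of the file are ever read.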

# here is the code I'm currently using - which
# works

are you sure it works? see my comment below, inline of your code.

# but slowly - any suggestions for how to speed this up would be
# greatly appreciated! :smiley:

From: Brian Green [mailto:gallagherjb@gmail.com]
#
# require "FileUtils"
# require "ftools"
#
# def FindRenderLayers (root)
# layersFile = []
# dirLocation = root.gsub(/(\\)$/, '')
# list = Dir.entries(dirLocation)
#
# list.each do |file|
# if file =~ /\.ma$/
# fileName = root + file
# layersFile.push file
# File.open(fileName) do |file|
# while line = file.gets
# if line =~ /connectAttr (\"renderLayerManager.rlmi\[[0-9]\]\")/
# if $1 != "defaultRenderLayer"

pls forgive me at this point because i am at a loss

1. how could $1, which is patterned after \"renderLayerManager.rlmi\[[0-9]\]\", ever be equal to "defaultRenderLayer" ??

2. and besides, why compare again if you can ask it straight from your regex comparison?
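
Point 2 could look something like the sketch below, which lets the regex itself capture the layer name and skip the default layer. It rests on an assumption the thread does not confirm: that the lines of interest have the shape `connectAttr "renderLayerManager.rlmi[1]" "layer1.rlid";`. `LAYER_LINE` and `render_layer_name` are hypothetical names.

```ruby
# Assumed line shape (not confirmed in this thread):
#   connectAttr "renderLayerManager.rlmi[1]" "layer1.rlid";
# The negative lookahead rejects defaultRenderLayer, so no second
# comparison against $1 is needed.
LAYER_LINE = /connectAttr "renderLayerManager\.rlmi\[\d+\]" "(?!defaultRenderLayer\.)([^".]+)\.rlid"/

# Returns the layer name for a matching line, or nil otherwise.
def render_layer_name(line)
  match = LAYER_LINE.match(line)
  match && match[1]
end
```

Note that `\d+` also accepts indices of 10 and above, which the original `[0-9]` would miss.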

# editedLine = "-" + $1
# layersFile.push editedLine
# end
# end
# end
# end
# end
# end
# return layersFile
# end
#
# root = "C:\\Users\\Brian\\Documents\\Ruby\\"
# puts FindRenderLayers(root)

kind regards -botp

Victor Goff wrote:

IO::SEEK_END at Ruby-Doc may be the ticket...
http://www.ruby-doc.org/core/classes/IO.html#M002305

Thanks for your input... I actually tried using SEEK_END - couldn't get
it to work right...


Peña, Botp wrote:

From: Brian Green [mailto:gallagherjb@gmail.com]
# I am trying to create a ruby script that will search a maya ascii file
# for specific text. The problem I'm running into is that it's
# running too slow for the system at work.

why do you say it is slow? what is your comparison? where is your
benchmark?
how many files do you have? how large are the files?
how much disk space do you have?
how much memory do you have?
how fast is your cpu?

It's slow because the script is going to be integrated into the company's
online asset management software - and I was told by the IT guys that if
it's slower than a certain speed it will time out - it currently is too
slow.

As far as how many files it ranges between 3-5 (usually), the sizes of
the files vary from about 5MB-50MB

Disk space is not an issue - there's tons of it. As far as memory goes -
the IT guys said it can't load the whole file into memory.

CPU is fairly fast - but again this isn't the problem - since it will be
running from a server...

# I know that all the information I need is
# in the last 5% of the text file - but I haven't been able to

are you sure of the 5% ?
where is your proof?

I've gone through many files and manually located where the text I'm
looking for appears - it appears no further out than 5% from the end...

# figure out a way to either jump to near the end, and then
# start search through lines

low level, use IO::SEEK_END

I'm not sure how to use the SEEK_END properly and it's hard finding good
examples...

# or even better iterating backwards through the file till I find
# what I'm looking for...

arggh. but your comparison will be forward. otherwise, you'll have to
reverse your search/regex pattern. implement a reverse readline/gets.

That sounds good. How do I do that?

# here is the code I'm currently using - which
# works

are you sure it works? see my comment below, inline of your code.

# but slowly - any suggestions for how to speed this up would be
# greatly appreciated! :smiley:
#
# require "FileUtils"
# require "ftools"
#
# def FindRenderLayers (root)
# layersFile = []
# dirLocation = root.gsub(/(\\)$/, '')
# list = Dir.entries(dirLocation)
#
# list.each do |file|
# if file =~ /\.ma$/
# fileName = root + file
# layersFile.push file
# File.open(fileName) do |file|
# while line = file.gets
# if line =~ /connectAttr (\"renderLayerManager.rlmi\[[0-9]\]\")/
# if $1 != "defaultRenderLayer"

pls forgive me at this point because i am at a loss

1. how could $1, which is patterned after
\"renderLayerManager.rlmi\[[0-9]\]\", ever be equal to
"defaultRenderLayer" ??

Sorry - yeah that's not needed - had it a while ago and forgot to erase
it.

2. and besides why need to compare again, if you can ask it straight
from your regex comparison?

You're right...

# editedLine = "-" + $1
# layersFile.push editedLine
# end
# end
# end
# end
# end
# end
# return layersFile
# end
#
# root = "C:\\Users\\Brian\\Documents\\Ruby\\"
# puts FindRenderLayers(root)

kind regards -botp


# It's slow because the script is going to be integrated into the
# company's
# online asset management software - and I was told by the IT
# guys that if
# it's slower than a certain speed it will time out - it
# currently is too slow.

From: Brian Green [mailto:gallagherjb@gmail.com]
#
# As far as how many files it ranges between 3-5 (usually), the
# sizes of
# the files vary from about 5MB-50MB

max of 5 * 50MB
not so bad if you have lots of ram

# Disk space is not an issue - there's tons of it. As far as memory goes -
# the IT guys said it can't load the whole file into memory.
# CPU is fairly fast - but again this isn't the problem - since
# it will be running from a server...
#
# >
# > # I know that all the information I need is
# > # in the last 5% of the text file - but I haven't been able to
# >
# > are you sure of the 5% ?
# > where is your proof?
#
# I've gone through many files and manually located where the text I'm
# looking for appears - it appears no further out than 5% from
# the end...

ok no problem. we can adjust it anytime :wink:

# > # figure out a way to either jump to near the end, and then
# > # start search through lines
# >
# > low level, use IO::SEEK_END
#
# I'm not sure how to use the SEEK_END properly and it's hard
# finding good examples...

the examples are clear enough. try it first on one file, then post the code you tried here.

kind regards -botp

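Here is a usage example:

begin
  file = File.open(ARGV[0])
rescue
  puts "file does not exist or is not a file\n"
  exit 1   # stop here, otherwise file would be nil below
end

# jump to 25 bytes before the end of the file
# (raises Errno::EINVAL if the file is shorter than 25 bytes)
file.seek(-25,IO::SEEK_END)
puts file.readlines

The code will read the rest of the files from that location. Try it on
a file and see.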
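
Two things to watch for with this approach: a negative offset larger than the file raises an error, and the seek usually lands in the middle of a line. A minimal sketch that guards against both is below; `tail_lines` and `max_bytes` are hypothetical names used only for illustration.

```ruby
# Returns the complete lines found in the last max_bytes of path.
def tail_lines(path, max_bytes)
  File.open(path, "rb") do |f|
    offset = [max_bytes, f.size].min   # never seek before the start of the file
    f.seek(-offset, IO::SEEK_END)
    # If we landed somewhere inside the file, the first line read is
    # probably a fragment, so throw it away.
    f.gets if offset < f.size
    f.readlines
  end
end
```

The cost of discarding the first line is that a complete line is occasionally dropped when the seek happens to land exactly on a line boundary, so choose `max_bytes` with a little margin.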


Lex Williams wrote:

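Here is a usage example:

begin
  file = File.open(ARGV[0])
rescue
  puts "file does not exist or is not a file\n"
  exit 1   # stop here, otherwise file would be nil below
end

# jump to 25 bytes before the end of the file
# (raises Errno::EINVAL if the file is shorter than 25 bytes)
file.seek(-25,IO::SEEK_END)
puts file.readlines

The code will read the rest of the files from that location. Try it on
a file and see.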

I meant the rest of the lines. Sorry.


Thank you very much!! That's exactly what I was looking for!

I just added

file.seek(-2000,IO::SEEK_END)

right after the line

fileSize = File.size(fileName)

and it worked perfectly! It's running about 18x faster - which is a huge
improvement - I think the guys at work will be satisfied with its speed
now!

Thanks again Lex!! :smiley:


if i'm not mistaken, that would be

   fileSize = File.size(fileName)
   file.seek(-0.05*fileSize, IO::SEEK_END)

On Fri, Sep 5, 2008 at 7:32 PM, Brian Green <gallagherjb@gmail.com> wrote:

I just added
file.seek(-2000,IO::SEEK_END)
right after the line
fileSize = File.size(fileName)

Brian Green wrote:

Thank you very much!! That's exactly what I was looking for!

I just added

file.seek(-2000,IO::SEEK_END)

right after the line

fileSize = File.size(fileName)

50 megabytes = 52 428 800 bytes
5% = 52428800 * .05 = 2621440
2621440 != 2000

perhaps:
fileName = ARGV[0]
fileSize = File.size(fileName)
# negative byte offset covering the last 5% of the file
seeklen = -(0.05 * fileSize).to_i
file = File.open(fileName)
file.seek(seeklen, IO::SEEK_END)
puts file.readlines
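
Putting the pieces of this thread together, the original function might end up looking roughly like the sketch below. It is not the code Brian actually deployed. `find_render_layers`, `LAYER_PATTERN` and the `fraction` parameter are hypothetical names, and the regex assumes lines of the form `connectAttr "renderLayerManager.rlmi[1]" "layer1.rlid";`, which the thread does not confirm.

```ruby
# Assumed line shape (not confirmed in this thread):
#   connectAttr "renderLayerManager.rlmi[1]" "layer1.rlid";
LAYER_PATTERN = /connectAttr "renderLayerManager\.rlmi\[\d+\]" "([^".]+)\.rlid"/

# For every .ma file in root, returns the file name followed by
# "-<layer>" entries found in the last fraction of that file.
def find_render_layers(root, fraction = 0.05)
  results = []
  Dir.entries(root).grep(/\.ma\z/).sort.each do |name|
    results << name
    File.open(File.join(root, name), "rb") do |f|
      offset = (f.size * fraction).to_i   # bytes to scan, counted from the end
      f.seek(-offset, IO::SEEK_END)
      f.gets if offset < f.size           # discard the probably partial first line
      f.each_line do |line|
        layer = line[LAYER_PATTERN, 1]
        results << "-#{layer}" if layer && layer != "defaultRenderLayer"
      end
    end
  end
  results
end
```

Only one line is held in memory at a time, which fits the constraint that the whole file must not be loaded. If a layer ever turns up earlier than the last 5%, raising `fraction` is the only change needed.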
