How do I reduce the memory usage of a script?

Hi, all.

Please find attached a simple Ruby script that rummages through my
iTunes files, reads the first half megabyte or so, finds the encoder, and
then prints the encoder and filename. This lets me know which tracks
need re-ripping.

This script blows through half a gig of RAM while running, and I really
do not see why. It should only have perhaps a few megabytes at max in
RAM.

FWIW, the output looks like:
iTunes v4.9, QuickTime 7.0.1 /Users/work/Music/iTunes/iTunes Music/Yellowcard/Ocean Avenue Song1.m4a
iTunes v4.9, QuickTime 7.0.1 /Users/work/Music/iTunes/iTunes Music/Yellowcard/Ocean Avenue Song2.m4a
iTunes v4.9, QuickTime 7.0.1 /Users/work/Music/iTunes/iTunes Music/Yellowcard/Ocean Avenue Song3.m4a

Style and speed optimizations are accepted, but the runtime is under a
minute now for the 5500 files I have in my library, so memory usage is
my real problem.

Help?

#!/usr/bin/env ruby
require 'find'
def procpath(f)
   if File.file?(f) then
      if File.fnmatch("*.m4a",f) then
         found = false
         data = IO.read(f, 65536*8)
         re = /[[:alnum:]_., ]{9,}/
         data.scan(re) do |string|
            if (string =~ /QuickTime/) then
               filename = File.basename(f)
               dirname = File.dirname(f)
               # puts "#{string} #{dirname}"
               puts "#{string} #{dirname} #{filename}"
               found = true
               break
            end
         end
         if (!found) then
            puts "Unknown #{f}"
         end
      end
   elsif File.directory?(f) && !File.fnmatch(".", f) &&
         !File.fnmatch("..", f) then
      Dir.foreach(f) { |subf| procpath(subf) }
   end
end

Find.find("/Users/work/Music/iTunes/iTunes Music/") do |f|
   procpath(f)
end

Scott


--
scott@alodar.nospam.com
Java, Cocoa, and Database consulting for the life sciences

--
Scott Ellsworth
scott@alodar.nospam.com
Java and database consulting for the life sciences

[...]
  elsif File.directory?(f) && !File.fnmatch(".", f) &&
!File.fnmatch("..", f) then
     Dir.foreach(f) { |subf| procpath(subf) }

Why are you recursing here? Find.find does this stuff for you!

[...]

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

Carter's Clarification of Murphy's Law.

"Things only ever go right so that they may go more spectacularly wrong later."

From this principle, all of life and physics may be deduced.
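To illustrate John's point, here is a minimal sketch that records everything Find.find yields for a small throwaway tree. The temporary directory, the Artist/Album/Song1.m4a layout and the yielded_paths helper are made up for the example. It shows that Find.find already descends into nested directories and yields each directory as well as each file, so a procpath that also recurses on directories repeats work Find.find is already doing.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Collect every path Find.find yields under root, in the order yielded.
def yielded_paths(root)
  paths = []
  Find.find(root) { |path| paths << path }
  paths
end

# Hypothetical tree standing in for the iTunes Music folder.
Dir.mktmpdir do |root|
  song = File.join(root, "Artist", "Album", "Song1.m4a")
  FileUtils.mkdir_p(File.dirname(song))
  FileUtils.touch(song)

  # No hand-written recursion: Find.find walks the nested directories itself
  # and yields each directory as well as each file.
  yielded_paths(root).each do |path|
    kind = File.directory?(path) ? "dir " : "file"
    puts "#{kind} #{path.sub(root, '.')}"
  end
end
```

For this tree it prints four lines: the root, Artist, Artist/Album, and finally the .m4a file.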


On Thu, 14 Jul 2005, Scott Ellsworth wrote:

Well, mileage may vary and all that jazz, but on my box it took up about 30 MB virtual according to top and only about 1.5 to 2 MB physical. Have you tried explicitly invoking the GC?

On Jul 13, 2005, at 5:30 PM, Scott Ellsworth wrote:

[...]

Scott Ellsworth wrote:

Hi, all.

[...]

This script blows through half a gig of RAM while running, and I really
do not see why. It should only have perhaps a few megabytes at max in
RAM.

[...]

         if (!found) then
            puts "Unknown #{f}"
         else
            data = nil
            GC.start # garbage collect
         end

Any better with that addition?

daz

(Called away from keyboard)

Compare the snippet above with:

          if (!found) then
             puts "Unknown #{f}"
          end
          data = nil
          GC.start # garbage collect

... which will garbage collect more often: after every file, not only when the encoder was found.

Best,

daz
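The two placements differ only in when a collection is forced. In the first, GC.start runs only in the else branch, meaning only when the encoder was found. In the second, it runs after every file. The sketch below makes that count explicit. The results array of found/not-found flags, the 512 KB string standing in for each read, and the forced_collections name are all hypothetical.

```ruby
# Compare daz's two placements of GC.start. Each entry in `results` is a
# hypothetical stand-in for one file: true if the encoder was found.
def forced_collections(results, always:)
  forced = 0
  bytes_read = 0
  gc_before = GC.count
  results.each do |found|
    data = "x" * (512 * 1024)   # stand-in for IO.read(f, 65536*8)
    bytes_read += data.bytesize
    data = nil                  # drop the reference, as in daz's patch
    # First version: collect only in the else branch (encoder found).
    # Second version: collect after every file.
    if always || found
      GC.start
      forced += 1
    end
  end
  { forced: forced, gc_runs: GC.count - gc_before, bytes_read: bytes_read }
end

results = [true, false, true, false, false]
p forced_collections(results, always: false)[:forced]  # => 2
p forced_collections(results, always: true)[:forced]   # => 5
```

The counters only show how often a collection is forced. How much memory that actually returns depends on the Ruby version and the platform.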

In article <8HKdneMVh9MZDUjfSa8jmA@karoo.co.uk>,

"daz" <dooby@d10.karoo.co.uk> wrote:

          if (!found) then
             puts "Unknown #{f}"
          end
          data = nil
          GC.start # garbage collect

This did seem to drop the memory usage on my MacOS X 10.4.2 system.

I will investigate the Find.find command next to see if I can get rid of
some recursion. An array of 5500 paths should not be _that_ big, at
least in comparison with four or five levels of directory depth.

Scott


Scott Ellsworth wrote:

In article <8HKdneMVh9MZDUjfSa8jmA@karoo.co.uk>,

          if (!found) then
             puts "Unknown #{f}"
          end
          data = nil
          GC.start # garbage collect

This did seem to drop the memory usage on my MacOS X 10.4.2 system.

I will investigate the Find.find command next to see if I can get rid
of some recursion. An array of 5500 paths should not be _that_ big,
at least in comparison with four or five levels of directory depth.

The problem might be that the data is still around while you enter the
recursion. If you want to verify that this is the case, you can simply do
data = nil after processing. But: you definitely need to throw out the
recursion from procpath() - otherwise you'll be processing directories over
and over again (I smell something like O(n*n) here)!

Kind regards

    robert
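The cost Robert describes can be made visible by counting visits. As written, Dir.foreach(f) yields bare entry names (including "." and ".."), so procpath(subf) would look those names up relative to the current working directory. The sketch below assumes the hand-written walk is given full paths instead, which is what it appears to intend. The helpers walk_by_hand and visit_counts and the temporary tree are hypothetical. In this sketch, a file sitting below k directories (counting the root) is counted k + 1 times, so the wasted work grows with the depth of the tree.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: walk a directory by hand using full paths, counting
# every file it reaches.
def walk_by_hand(dir, counts)
  Dir.each_child(dir) do |name|
    path = File.join(dir, name)
    if File.directory?(path)
      walk_by_hand(path, counts)
    elsif File.file?(path)
      counts[path] += 1
    end
  end
end

# Same shape as the original procpath: Find.find visits every path, and each
# directory it yields is walked again by hand.
def visit_counts(root)
  counts = Hash.new(0)
  Find.find(root) do |path|
    if File.file?(path)
      counts[path] += 1
    elsif File.directory?(path)
      walk_by_hand(path, counts)
    end
  end
  counts
end

Dir.mktmpdir do |root|
  song = File.join(root, "Artist", "Album", "Song1.m4a")
  FileUtils.mkdir_p(File.dirname(song))
  FileUtils.touch(song)
  # Once directly from Find.find, plus once per walk of root, Artist, Album.
  p visit_counts(root)[song]   # => 4
end
```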

In article <3jmmgnFqqsq6U1@individual.net>,
 "Robert Klemme" <bob.news@gmx.net> wrote:

> The problem might be that the data is still around while you enter the
> recursion. If you want to verify that this is the case, you can simply do
> data = nil after processing. But: you definitely need to throw out the
> recursion from procpath() - otherwise you'll be processing directories over
> and over again (I smell something like O(n*n) here)!

I have removed the recursion - see below.

A question, though: is the String#scan method I used the best way to
scan this block of data? Every file is going to contain the string
'QuickTime' somewhere in the first few MB, and I want everything from
the last nonprintable character before it to the next nonprintable
character after it. I only need to read from disk until I find that
string, and once I find it, I need only the bytes before it, plus a
version number afterwards. I certainly do not need to manipulate more
than a few hundred characters around that magic string, and once I
have read them, I do not need to go back.

NB - printable here is defined as [[:alnum:]_., ]; everything else
counts as nonprintable.

work@boggle:Desktop$ time ./detectEncoding.rb > songs.txt

real 3m30.563s
user 0m26.229s
sys 0m23.746s

New code:

#!/usr/bin/env ruby
require 'find'
re = /[[:alnum:]_., ]{9,}/
Find.find("/Users/work/Music/iTunes/iTunes Music/") do |f|
   if File.file?(f) && File.fnmatch("*.m4a",f) then
      found = false
      data = IO.read(f, 65536*8)
      data.scan(re) do |string|
         if (string =~ /QuickTime/) then
            filename = File.basename(f)
            dirname = File.dirname(f)
            puts "#{string} #{dirname}"
# puts "#{string} #{dirname} #{filename}"
            found = true
            break
         end
      end
      if (!found) then
         puts "Unknown #{f}"
      end
      data = nil
      GC.start # garbage collect
   end
end

Scott
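One way to do what Scott describes, reading only until 'QuickTime' turns up while holding only a little data, is to read fixed-size chunks and carry the tail of the buffer forward, so a run split across two reads is still seen. This is a sketch under stated assumptions. The names find_encoder and ENCODER_RUN are hypothetical, chunk_size and keep are made-up defaults, the regexp simply puts QuickTime inside the printable class from the thread, and an encoder run is assumed to be shorter than keep bytes.

```ruby
# Printable run (the thread's character class) that contains "QuickTime".
ENCODER_RUN = /[[:alnum:]_., ]*QuickTime[[:alnum:]_., ]*/

# Read +path+ in chunks and return the first encoder run, or nil.
# Assumption: an encoder run is shorter than +keep+ bytes.
def find_encoder(path, chunk_size: 64 * 1024, keep: 256)
  File.open(path, "rb") do |io|
    buf = "".b
    while (chunk = io.read(chunk_size))
      buf << chunk
      m = ENCODER_RUN.match(buf)
      # Only trust a match that ends before the buffer does; otherwise the
      # version number may continue in the next chunk.
      return m[0] if m && m.end(0) < buf.bytesize
      # Keep just the tail, so at most chunk_size + keep bytes are held.
      buf = buf.byteslice(-keep, keep) if buf.bytesize > keep
    end
    # End of file: whatever is left in the buffer is complete.
    m = ENCODER_RUN.match(buf)
    m && m[0]
  end
end
```

Inside the Find.find loop, `puts find_encoder(f) || "Unknown #{f}"` could stand in for the IO.read and scan lines. Reading stops as soon as a complete run is seen.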


Scott Ellsworth wrote:

[...]

A question, though: is the String#scan method I used the best way to
scan this block of data? Every file is going to contain the string
'QuickTime' somewhere in the first few MB, and I want everything from
the last nonprintable character before it to the next nonprintable
character after it. I only need to read from disk until I find that
string, and once I find it, I need only the bytes before it, plus a
version number afterwards. I certainly do not need to manipulate
more than a few hundred characters around that magic string, and once
I have read them, I do not need to go back.

NB - printable here is defined as [[:alnum:]_., ]; everything else
counts as nonprintable.

The problem with your script is that it does not find "QuickTime" if your chunk reading cuts it in half (or "Q" and "uickTime" - whatever). It might be easier to just slurp in the complete file (depending on size - a few MB are no problem) and then do the scan on the single string. Also, I don't understand why you don't put QuickTime into your search RE.

Kind regards

    robert
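Robert's suggestion can be sketched like this: read the data as one binary string and put QuickTime into the regexp itself, so a single String#[] lookup replaces the scan loop and the found flag. The encoder_of name and its optional limit parameter are illustrative. With no limit the whole file is read, as Robert suggests, and passing 65536*8 reproduces the original 512 KB read. Because "QuickTime" alone is nine characters, the {9,} minimum from the original pattern is no longer needed.

```ruby
# Printable run (the thread's character class) that contains "QuickTime".
ENCODER = /[[:alnum:]_., ]*QuickTime[[:alnum:]_., ]*/

# Return the encoder string found in +path+, or nil if there is none.
# limit: hypothetical cap on bytes read; nil reads the whole file.
def encoder_of(path, limit = nil)
  data = limit ? File.binread(path, limit) : File.binread(path)
  (data || "")[ENCODER]   # guard in case nothing could be read
end
```

Unlike the chunked approach, this cannot split "QuickTime" across reads. The trade-off is that the whole buffer is in memory at once.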

Scott Ellsworth <scott@alodar.com> wrote:

"Robert Klemme" <bob.news@gmx.net> wrote:

"daz" <dooby@d10.karoo.co.uk> wrote:

[...]