Help me clean up this method

Hello guys,

I wrote this little method to return the size of a given directory, but
I think it's very ugly. Could anyone help me clean it up a bit?

def Dir.size(dirname)
  Dir.chdir(dirname)
  entries = Dir.entries(".").reject { |x| %w(. ..).include? x }
  entries.collect! { |filename| File.expand_path(filename) }
  size = 0
  entries.each do |filename|
    begin
      if File.file?(filename)
        size += File.size(filename) rescue 0
      else
        size += Dir.size(filename)
      end
    rescue
      next
    end
  end
  size
end

Thanks,
Vincent.

File.size(dirname) seems to be working on my system. Am I missing something obvious?

James Edward Gray II

···

On Sep 4, 2005, at 7:46 PM, Vincent Foley wrote:

Hello guys,

I wrote this little method to return the size of a given directory, but
I think it's very ugly. Could anyone help me clean it up a bit?

Hi guys,

I managed to get rid of the file names discovery by using Dir's globbing
facilities. The size calculation is then a matter of a single inject call:

def Dir.size(name)
Dir.chdir(name)
files = Dir["**/*"]
files.inject(0) do |total, name|
if File.file?(name)
total + File.size(name)
else
total
end
end
end

puts Dir.size(".")
puts Dir.size("D:/tmp/ruby")
puts Dir.size("C:/Windows")

I don't like the "if" statement inside the block that gets injected. Is
there a better, idiomatic way to express the same thing?

Hristo Deshev

···

On 9/5/05, Vincent Foley <vfoley@gmail.com> wrote:

Hello guys,

I wrote this little method to return the size of a given directory, but
I think it's very ugly. Could anyone help me clean it up a bit?

require 'find'

def Dir.size(dirname)
   size = 0
   Find.find dirname do |name|
     next unless File.file? name
     size += File.size name rescue 0
   end
   return size
end

···

On 04 Sep 2005, at 17:46, Vincent Foley wrote:

Hello guys,

I wrote this little method to return the size of a given directory, but
I think it's very ugly. Could anyone help me clean it up a bit?

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Vincent Foley wrote:

Hello guys,

I wrote this little method to return the size of a given directory, but
I think it's very ugly. Could anyone help me clean it up a bit?

def Dir.size(name)
  Dir[ name + "/**/*" ].select{|x| File.file?(x)}.inject(0){|sum,f|
    sum + File.size(f) }
end

just thought i'd point out that every solution posted thus far fails in a
variety of ways when links are considered - in the best cases linked files are
counted twice or linked dirs are not checked, in the worst case infinite loops
occur. the methods using 'Dir[glob]' v.s. 'Find::find' suffer from the link
issue but also will perfom badly on large file systems. unfortunately ruby's
built-in 'Find::find' cannot deal with links - for that you have to rely on
motoyuki kasahara's Find2 module, which you can get off of the raa. i have it
inlined in my personal library (alib - also on the raa) with few small bug
fixes and interface additions, to use you would do something like:

   require 'alib'

   def dir_size dir
     size = 0
     totalled = {}
     ALib::Util::find2(dir, 'follow' => true) do |path, stat|
       begin
         next if totalled[stat.ino]
         next unless stat.file?
         size += stat.size
       ensure
         totalled[stat.ino] = true
       end
     end
     size
   end

   p dir_size('.')

this handles huge directories, duplicate files (links) in a directory, linked
directories, and potential infinite loops. i think this is about as simply as
one can write this without introducing subtle, or not so subtle, bugs.

cheers.

-a

···

On Mon, 5 Sep 2005, Vincent Foley wrote:

Hello guys,

I wrote this little method to return the size of a given directory, but
I think it's very ugly. Could anyone help me clean it up a bit?

def Dir.size(dirname)
Dir.chdir(dirname)
entries = Dir.entries(".").reject { |x| %w(. ..).include? x }
entries.collect! { |filename| File.expand_path(filename) }
size = 0
entries.each do |filename|
   begin
     if File.file?(filename)
       size += File.size(filename) rescue 0
     else
       size += Dir.size(filename)
     end
   rescue
     next
   end
end
size
end

Thanks,
Vincent.

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
Your life dwells amoung the causes of death
Like a lamp standing in a strong breeze. --Nagarjuna

===============================================================================

File.size(dirname) only tells you how many bytes the directory's own
inode is using. It doesn't include the bytes for the directory's
files.

How about this:

def Dir.size(dname)
  Dir.new(dname).inject(File.size(dname)) {|total,name|
    begin
      exname = File.expand_path(name,dname)
      if File.file?(exname)
        total + File.size(exname)
      elsif File.directory?(exname) and name != '.' and name != '..'
        total + Dir.size(exname)
      else
        total
      end
    rescue
      total
    end
  }
end

-Ed

···

On Mon, Sep 05, 2005 at 10:52:18AM +0900, James Edward Gray II wrote:

On Sep 4, 2005, at 7:46 PM, Vincent Foley wrote:
> I wrote this little method to return the size of a given directory,
> but I think it's very ugly. Could anyone help me clean it up a bit?

File.size(dirname) seems to be working on my system. Am I missing
something obvious?

Yes,File.size(dirname) is working.but It only calc the current dir's
size,not include the subdir's

The mailing list critters ate my indentation! Maybe this one will get
through:

def Dir.size(name)
Dir.chdir(name)
files = Dir["**/*"]
files.inject(0) do |total, name|
if File.file?(name)
total + File.size(name)
else
total
end
end
end

Hristo Deshev

···

On 9/5/05, Hristo Deshev <hristo.deshev@gmail.com> wrote:

On 9/5/05, Vincent Foley <vfoley@gmail.com> wrote:
>
> Hello guys,
>
> I wrote this little method to return the size of a given directory, but
> I think it's very ugly. Could anyone help me clean it up a bit?

Hi guys,

I managed to get rid of the file names discovery by using Dir's globbing
facilities. The size calculation is then a matter of a single inject call:

def Dir.size(name)
Dir.chdir(name)
files = Dir["**/*"]
files.inject(0) do |total, name|
if File.file?(name)
total + File.size(name)
else
total
end
end
end

puts Dir.size(".")
puts Dir.size("D:/tmp/ruby")
puts Dir.size("C:/Windows")

I don't like the "if" statement inside the block that gets injected. Is
there a better, idiomatic way to express the same thing?

Hristo Deshev

One more safety addition
def Dir.size(name)
  Dir[ File.join(name, "**/*") ].select{ | f | File.file?(f)
}.inject(0){ | sum, f |
    sum + File.size(f)
  }
end

regards,

Brian

···

On 05/09/05, William James <w_a_x_man@yahoo.com> wrote:

Vincent Foley wrote:
> Hello guys,
>
> I wrote this little method to return the size of a given directory, but
> I think it's very ugly. Could anyone help me clean it up a bit?

def Dir.size(name)
  Dir[ name + "/**/*" ].select{|x| File.file?(x)}.inject(0){|sum,f|
    sum + File.size(f) }
end

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/

Ara.T.Howard wrote:

just thought i'd point out that every solution posted thus far fails in a
variety of ways when links are considered

If you have links, you also have du, which IMO is the Right Tool For This Job:

total = `du --max-depth=0|cut -f 1`.to_i

-dB

···

--
David Brady
ruby-talk@shinybit.com
C++ Guru. Ruby nuby. Apply salt as needed.

Ara.T.Howard wrote:

> Hello guys,
>
> I wrote this little method to return the size of a given directory, but
> I think it's very ugly. Could anyone help me clean it up a bit?
>

[snip]

just thought i'd point out that every solution posted thus far fails in a
variety of ways when links are considered - in the best cases linked files are
counted twice or linked dirs are not checked, in the worst case infinite loops
occur. the methods using 'Dir[glob]' v.s. 'Find::find' suffer from the link
issue but also will perfom badly on large file systems. unfortunately ruby's
built-in 'Find::find' cannot deal with links - for that you have to rely on
motoyuki kasahara's Find2 module, which you can get off of the raa. i have it
inlined in my personal library (alib - also on the raa) with few small bug
fixes and interface additions, to use you would do something like:

   require 'alib'

   def dir_size dir
     size = 0
     totalled = {}
     ALib::Util::find2(dir, 'follow' => true) do |path, stat|
       begin
         next if totalled[stat.ino]
         next unless stat.file?
         size += stat.size
       ensure
         totalled[stat.ino] = true
       end
     end
     size
   end

   p dir_size('.')

this handles huge directories, duplicate files (links) in a directory, linked
directories, and potential infinite loops. i think this is about as simply as
one can write this without introducing subtle, or not so subtle, bugs.

This solution fails under windoze by always returning 0, apparently
because
File.stat(path).ino always returns 0.

If you're stuck with windoze, use the previously posted

def Dir.size(name)
  Dir[File.join(name, "**/*")].select{|f|
File.file?(f)}.inject(0){|sum,f|
    sum + File.size(f)
  }
end

···

On Mon, 5 Sep 2005, Vincent Foley wrote:

Ara.T.Howard wrote:

<snip>

just thought i'd point out that every solution posted thus far fails in a
variety of ways when links are considered - in the best cases linked files are
counted twice or linked dirs are not checked, in the worst case infinite loops
occur.

For a good summary of the problems with calculating the size of a
directory (on Windows) see
http://blogs.msdn.com/oldnewthing/archive/2004/12/28/336219.aspx\.

Regards,

Dan

David Brady wrote:

Ara.T.Howard wrote:

just thought i'd point out that every solution posted thus far fails in a
variety of ways when links are considered

If you have links, you also have du, which IMO is the Right Tool For This Job:

total = `du --max-depth=0|cut -f 1`.to_i

sorry for stating the obvious, but: this isn't very portable.

cheers

Simon

windows has links - and systems that have du may not have one that supports
max-depth. and du reports on file system blocks - not directory sums. this is
typically close, but can diverge greatly depending on file system setup and the
number of directoriess - since du will report usage for directories too..

   harp:~ > mkdir foobar

   harp:~ > du foobar
   4 foobar

fyi.

-a

···

On Tue, 6 Sep 2005, David Brady wrote:

Ara.T.Howard wrote:

just thought i'd point out that every solution posted thus far fails in a
variety of ways when links are considered

If you have links, you also have du, which IMO is the Right Tool For This Job:

total = `du --max-depth=0|cut -f 1`.to_i

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
Your life dwells amoung the causes of death
Like a lamp standing in a strong breeze. --Nagarjuna

===============================================================================

works for me :

   Ara@JEN ~
   $ cat a.rb
     require 'alib'

     def dir_size dir
       size = 0
       totalled = {}
       ALib::Util::find2(dir, 'follow' => true) do |path, stat|
         begin
           next if totalled[stat.ino]
           next unless stat.file?
           size += stat.size
         ensure
           totalled[stat.ino] = true
         end
       end
       size
     end

     p dir_size('.')

   Ara@JEN ~
   $ ruby a.rb
   29432845

   Ara@JEN ~
   $ du -sb .
   29432903 .

   Ara@JEN ~
   $ ruby -r rbconfig -r yaml -e'y Config::CONFIG' |egrep -i win
   target: i686-pc-cygwin
   ac_ct_WINDRES: windres
   WINDRES: windres
   archdir: /usr/lib/ruby/1.8/i386-cygwin
   sitearch: i386-cygwin
   arch: i386-cygwin
   host_os: cygwin
   build: i686-pc-cygwin
   host: i686-pc-cygwin
   build_os: cygwin
   target_os: cygwin
   sitearchdir: /usr/lib/ruby/site_ruby/1.8/i386-cygwin

so it seems like your ruby may be broken - how'd you install it?

cheers.

-a

···

On Tue, 6 Sep 2005, William James wrote:

This solution fails under windoze by always returning 0, apparently because
File.stat(path).ino always returns 0.

If you're stuck with windoze, use the previously posted

def Dir.size(name)
Dir[File.join(name, "**/*")].select{|f|
File.file?(f)}.inject(0){|sum,f|
   sum + File.size(f)
}
end

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
Your life dwells amoung the causes of death
Like a lamp standing in a strong breeze. --Nagarjuna

===============================================================================

indeed.

we've got a script here (dirsum) that does essentially all the things outlined
- especially checking compressed files - for monitoring data volumes. the
problem is much trickier that one would assume.

cheers.

-a

···

On Tue, 6 Sep 2005, Daniel Berger wrote:

Ara.T.Howard wrote:

<snip>

just thought i'd point out that every solution posted thus far fails in a
variety of ways when links are considered - in the best cases linked files
are counted twice or linked dirs are not checked, in the worst case
infinite loops occur.

For a good summary of the problems with calculating the size of a directory
(on Windows) see
http://blogs.msdn.com/oldnewthing/archive/2004/12/28/336219.aspx\.

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
Your life dwells amoung the causes of death
Like a lamp standing in a strong breeze. --Nagarjuna

===============================================================================

Simon Kröger wrote:

David Brady wrote:

If you have links, you also have du, which IMO is the Right Tool For This Job:

total = `du --max-depth=0|cut -f 1`.to_i

sorry for stating the obvious, but: this isn't very portable.

Yup! :slight_smile:

<soapbox>
Portability isn't a good idea here.

(wait for scandalized gasps to quiet down)

The need here was to clean up the code. The solutions so far have bulkily danced around the fact that Find::find doesn't seem to work satisfactorily.

If the Standard Library is defective, the proper, *portable* solution should be to patch the .c files that are defective. Until then, if we're going to work around the Standard Library, we should be as quick and deadly as possible.

I recognize that it may be seen as cheating or possibly even inappropriate to suggest "don't use Ruby for this" on a Ruby mailing list, but I feel strongly that Ruby's very ability to take advantage of external tools is a great strength of the language. A superior wheel already exists; reinventing it poorly does not seem to me to leverage Ruby's power very well.

There seems to me to be a "right" way to do this, and it is to have the Standard Library work as desired. Until then, how much effort should be spent standardizing on a portable kludge?
</soapbox>

And of course, if the OP is on a different platform, the if statement that started my earlier post will become important. The else clause probably reads "then the issues with File::find link don't affect you. Use the Standard Library."

Just my $0.02. Note sig.

-dB

···

--
David Brady
ruby-talk@shinybit.com
C++ Guru. Ruby nuby. Apply salt as needed.

Ara.T.Howard wrote:

> This solution fails under windoze by always returning 0, apparently because
> File.stat(path).ino always returns 0.
>
> If you're stuck with windoze, use the previously posted
>
> def Dir.size(name)
> Dir[File.join(name, "**/*")].select{|f|
> File.file?(f)}.inject(0){|sum,f|
> sum + File.size(f)
> }
> end

works for me :

   Ara@JEN ~
   $ cat a.rb
     require 'alib'

     def dir_size dir
       size = 0
       totalled = {}
       ALib::Util::find2(dir, 'follow' => true) do |path, stat|
         begin
           next if totalled[stat.ino]
           next unless stat.file?
           size += stat.size
         ensure
           totalled[stat.ino] = true
         end
       end
       size
     end

     p dir_size('.')

   Ara@JEN ~
   $ ruby a.rb
   29432845

   Ara@JEN ~
   $ du -sb .
   29432903 .

   Ara@JEN ~
   $ ruby -r rbconfig -r yaml -e'y Config::CONFIG' |egrep -i win
   target: i686-pc-cygwin
   ac_ct_WINDRES: windres
   WINDRES: windres
   archdir: /usr/lib/ruby/1.8/i386-cygwin
   sitearch: i386-cygwin
   arch: i386-cygwin
   host_os: cygwin
   build: i686-pc-cygwin
   host: i686-pc-cygwin
   build_os: cygwin
   target_os: cygwin
   sitearchdir: /usr/lib/ruby/site_ruby/1.8/i386-cygwin

so it seems like your ruby may be broken - how'd you install it?

Mine:

build: i686-pc-mswin32
build_os: mswin32
host: i686-pc-mswin32
host_os: mswin32
target: i386-pc-mswin32
target_os: mswin32

Yours:

build: i686-pc-cygwin
build_os: cygwin
host: i686-pc-cygwin
host_os: cygwin
target: i686-pc-cygwin
target_os: cygwin

Try it without cygwin. On my system, stat.ino is always 0.

If anyone else is running plain windoze without cygwin, see if
File.stat(path).ino is always 0.

···

On Tue, 6 Sep 2005, William James wrote:

Ara.T.Howard wrote:

···

On Tue, 6 Sep 2005, Daniel Berger wrote:

> Ara.T.Howard wrote:
>
> <snip>
>
>> just thought i'd point out that every solution posted thus far fails in a
>> variety of ways when links are considered - in the best cases linked files
>> are counted twice or linked dirs are not checked, in the worst case
>> infinite loops occur.
>
> For a good summary of the problems with calculating the size of a directory
> (on Windows) see
> http://blogs.msdn.com/oldnewthing/archive/2004/12/28/336219.aspx\.

indeed.

we've got a script here (dirsum) that does essentially all the things outlined
- especially checking compressed files - for monitoring data volumes. the
problem is much trickier that one would assume.

By all means, please share. I'd be happy to add a Dir.size method to
win32-dir. :slight_smile:

Regards,

Dan