Getting a directory tree

Hi, all…

I have a little method here that gets a list of
all directories under a certain starting dir.

I considered using “find” but it didn’t jump
out as being any faster/better than doing it
myself. (E.g., extra method call of “prune”)

This first method works:

···

Return an array of directory names starting at the specified root

SEP = File::Separator

def getdirs(root)
dirs=Dir.entries(root) - [".", “…”]
dirs.collect! {|x| test(?d,x) ? root+SEP+x : nil }
dirs.compact!
list = [root] + dirs
dirs.each {|x| list += getdirs(x) }
list
end

but it’s slow on my platform/version.

I figure if I speed it up on Windows, it will
remain fast on other platforms. (Not necessarily,
of course.)

Here’s one new version. It DOES NOT work, and I’m
not sure why yet.

def getdirs1(root)
dirs=Dir.entries(root) - [".", “…”]
temp = []
dirs.each {|x| if test(?d,x) then temp << (root+SEP+x) end }
list = [root] + temp
dirs.each {|x| list += getdirs(x) }
list
end

So my two questions are:

  1. What’s wrong with getdirs1 that I’m not seeing?
  2. How would you write this in a reasonably
    efficient way?

Cheers,
Hal

Hi –

Here’s one new version. It DOES NOT work, and I’m
not sure why yet.

def getdirs1(root)
dirs=Dir.entries(root) - [“.”, “…”]
temp =
dirs.each {|x| if test(?d,x) then temp << (root+SEP+x) end }
list = [root] + temp
dirs.each {|x| list += getdirs(x) }
list
end

So my two questions are:

  1. What’s wrong with getdirs1 that I’m not seeing?
  2. How would you write this in a reasonably
    efficient way?

Sticking to question #1 for the moment… :slight_smile:

You’ve got a call to getdirs instead of a recursive call to getdirs1.
Also, you’re testing x, but you need to test root+SEP+x. (Otherwise
the test will be in the current directory.) And… you should do
temp.each, not dirs.each, I think.

Here’s a nice slow revised version, which looks better than it
performs :slight_smile:

def getdirs_d1(root)
dirs=Dir.entries(root) - [“.”, “…”]
temp = dirs.map {|d| root+SEP+d}.select {|d| test(?d,d)}
temp + temp.map {|x| getdirs1(x) }
end

Maybe something could be done with it or an amalgam of them.

David

···

On Mon, 23 Sep 2002, Hal E. Fulton wrote:


David Alan Black | Register for RubyConf 2002!
home: dblack@candle.superlink.net | November 1-3
work: blackdav@shu.edu | Seattle, WA, USA
Web: http://pirate.shu.edu/~blackdav | http://www.rubyconf.com

Hello Hal,

Monday, September 23, 2002, 12:59:46 AM, you wrote:

def getdirs1(root)
dirs=Dir.entries(root) - [“.”, “…”]
temp =
dirs.each {|x| if test(?d,x) then temp << (root+SEP+x) end }
list = [root] + temp
dirs.each {|x| list += getdirs(x) }
list
end

So my two questions are:

  1. What’s wrong with getdirs1 that I’m not seeing?
  2. How would you write this in a reasonably
    efficient way?

of course, use closure to emulate function-in-function behavior:

def getdirs(x)
list = Array.new
recursive_getdir = proc {|root|
Dir.foreach(root) {|dir|
unless dir==‘.’ || dir==‘…’
filename = root+‘/’+dir
if File.stat(filename).directory?
recursive_getdir.call(filename)
else
list << filename
end
end
}
}
recursive_getdir.call(x)
list
end

only way to speed up this (on my 1.6.6/cygwin and 1.7.2/visual-c++
platforms) is to add to Dir.entries capability of retrieving filestat
information. File.stat() call consumes 80% (!!!) of this script
execution time. without this call, this function will work 1.5 times
faster than my dir/s!

and, matz, how about gc() method on all objects to release excess of
memory for such objects as arrays?

···


Best regards,
Bulat mailto:bulatz@integ.ru

You’ve got a call to getdirs instead of a recursive call to getdirs1.

Dumb.

Also, you’re testing x, but you need to test root+SEP+x. (Otherwise
the test will be in the current directory.)

Dumber.

So that’s why I had that “unnecessary” chdir and getwd
in the other version… :slight_smile:

And… you should do
temp.each, not dirs.each, I think.

Dumbest.

Why do I post on days like this?

Here’s a nice slow revised version, which looks better than it
performs :slight_smile:

def getdirs_d1(root)
dirs=Dir.entries(root) - [“.”, “…”]
temp = dirs.map {|d| root+SEP+d}.select {|d| test(?d,d)}
temp + temp.map {|x| getdirs1(x) }
end

I’ll probably go with this because it looks nice.

In connection with my being brain-dead today, I did
discover a way to double the speed of this search.

I call this new technique “Don’t Invoke the Method Twice
When You Don’t Really Need To.” :slight_smile:

Hal

···

----- Original Message -----
From: dblack@candle.superlink.net
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Sunday, September 22, 2002 4:34 PM
Subject: Re: Getting a directory tree

Here’s one new version. It DOES NOT work, and I’m
not sure why yet.

def getdirs1(root)
dirs=Dir.entries(root) - [“.”, “…”]
temp =
dirs.each {|x| if test(?d,x) then temp << (root+SEP+x) end }
list = [root] + temp
dirs.each {|x| list += getdirs(x) }
list
end

So my two questions are:

  1. What’s wrong with getdirs1 that I’m not seeing?

The same problem David has below. The routine doesn’t handle recursive
directories well-- it follows the link back up the tree. You know that odd
case where some malcontent has done:

ln -s …/… link_up

In fact, I was surprised my test with David’s routine below ever returned,
but believe me… it was the ugliest nested list you ever saw.

  1. How would you write this in a reasonably
    efficient way?

Add a check for symlinks.

def getdirs_d1(root)
return if File.lstat(root).symlink?

dirs=Dir.entries(root) - [".", ".."]
temp = dirs.map {|d| root+SEP+d}.select {|d| test(?d,d)}
temp + temp.map {|x| getdirs1(x) }

end

-michael

···

On Sunday 22 September 2002 16:34, dblack@candle.superlink.net wrote:

On Mon, 23 Sep 2002, Hal E. Fulton wrote:

++++++++++++++++++++++++++++++++++++++++++
Michael C. Libby x@ichimunki.com
public key: http://www.ichimunki.com/public_key.txt
web site: http://www.ichimunki.com
++++++++++++++++++++++++++++++++++++++++++

The same problem David has below. The routine doesn’t handle recursive
directories well-- it follows the link back up the tree. You know that odd
case where some malcontent has done:

ln -s …/… link_up

In fact, I was surprised my test with David’s routine below ever returned,
but believe me… it was the ugliest nested list you ever saw.

I fixed David’s problem just by doing a flatten…
the reason it returned a nested list was that he
was doing
temp + temp.map{…}
to which I added a flatten.

The case below I think is unrelated. No, I’m not handling
symlinks (though I should).

Haven’t tested this app on UNIX lately. And no symlinks
on Windows. :slight_smile:

  1. How would you write this in a reasonably
    efficient way?

Add a check for symlinks.

def getdirs_d1(root)
return if File.lstat(root).symlink?

dirs=Dir.entries(root) - [".", ".."]
temp = dirs.map {|d| root+SEP+d}.select {|d| test(?d,d)}
temp + temp.map {|x| getdirs1(x) }

end

Thanks,
Hal

···

----- Original Message -----
From: “michael libby” x@ichimunki.com
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Sunday, September 22, 2002 10:37 PM
Subject: Re: Getting a directory tree

Hal,

The following is a performance test (using profile.rb and Linux’ ‘time’) of…

My implementation:

–snip–
require 'profile’
def getdirs root
Dir["#{root}/**/*"].select{ |file| test ?d, file and not test ?l, file }
end
getdirs ‘/home/iusris/devel’
–snip–

which uses the recursive ** Dir glob (of Zsh fame).

And the modified implementation suggested earlier (with a symlink-killer and a
single flatten):

–snip–
require 'profile’
SEP = ‘/‘
def getdirs(root)
return [] if test ?l, root
dirs = Dir.entries(root) - [".", “…”]
temp = dirs.map {|d| root+SEP+d}.select {|d| test(?d,d)}
temp + temp.map {|x| getdirs(x) }
end
getdirs(’/home/iusris/devel’).flatten
–snip–

with the following results:

[MINE]:

iusris@codedbliss[~/ruby/sandbox] time ruby getdirs.rb
% cumulative self self total
time seconds seconds calls ms/call ms/call name
40.84 1.07 1.07 1 1070.00 1500.00 Array#select
30.53 1.87 0.80 1 800.00 1110.00 Dir#[]
16.41 2.30 0.43 1230 0.35 0.35 Kernel.test
11.83 2.61 0.31 1089 0.28 0.28 String#allocate
0.00 2.61 0.00 1 0.00 0.00 Module#method_added
0.00 2.61 0.00 1 0.00 2610.00 Object#getdirs
0.00 2.61 0.00 1 0.00 2620.00 #toplevel
0.00 2.61 0.00 2 0.00 0.00 Array#allocate

ruby getdirs.rb 2.82s user 0.53s system 648% cpu 0.516 total

[PREVIOUSLY SUGGESTED]:

iusris@codedbliss[~/ruby/sandbox] time ruby getdirs2.rb
% cumulative self self total
time seconds seconds calls ms/call ms/call name
33.71 9.53 9.53 147 64.83 104.83 Array#-
20.55 15.34 5.81 13834 0.42 0.42 String#==
9.44 18.01 2.67 294 9.08 388.50 Array#map
9.20 20.61 2.60 2230 1.17 1.64 String#+
6.69 22.50 1.89 3935 0.48 0.48 String#allocate
4.60 23.80 1.30 147 8.84 922.52 Object#getdirs
4.03 24.94 1.14 147 7.76 12.79 Dir#each
3.93 26.05 1.11 147 7.55 12.11 Array#select
2.51 26.76 0.71 1262 0.56 0.56 Kernel.test
2.19 27.38 0.62 1031 0.60 0.60 Array#allocate
0.88 27.63 0.25 147 1.70 16.19 Dir#entries
0.64 27.81 0.18 411 0.44 0.44 Fixnum#==
0.60 27.98 0.17 1 170.00 350.00 Array#flatten
0.50 28.12 0.14 147 0.95 1.36 Array#+
0.42 28.24 0.12 147 0.82 14.29 Enumerable.to_a
0.11 28.27 0.03 147 0.20 0.20 Dir#open
0.00 28.27 0.00 1 0.00 28270.00 #toplevel
0.00 28.27 0.00 1 0.00 0.00 Module#method_added

ruby getdirs2.rb 28.51s user 0.39s system 373% cpu 7.734 total

As you can see, the previously suggested version takes about 10x longer to
complete.

···

I call this technique “Glob-orific and Short-tacular”

~ Bruce


_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/
Bruce Williams http://www.codedbliss.com
iusris/#ruby-lang bruce@codedbliss.com
_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/

Hello Bruce,

Monday, September 23, 2002, 9:30:07 AM, you wrote:

Hal,

The following is a performance test (using profile.rb and Linux’ ‘time’) of…

My implementation:

–snip–
require ‘profile’
def getdirs root
Dir[“#{root}/**/*”].select{ |file| test ?d, file and not test ?l, file }
end
getdirs ‘/home/iusris/devel’
–snip–

which uses the recursive ** Dir glob (of Zsh fame).

but i’m (see 9:44 letter) faster :slight_smile: 60 seconds vs 100 on my win2k
box. you use 40 more seconds for “test ?d” which one more time tested
inside Dir

···


Best regards,
Bulat mailto:bulatz@integ.ru

Hi –

···

On Mon, 23 Sep 2002, Bruce Williams wrote:

Hal,

The following is a performance test (using profile.rb and Linux’ ‘time’) of…

My implementation:

–snip–
require ‘profile’
def getdirs root
Dir[“#{root}/**/*”].select{ |file| test ?d, file and not test ?l, file }
end
getdirs ‘/home/iusris/devel’
–snip–

which uses the recursive ** Dir glob (of Zsh fame).

The problem with that is it doesn’t include anything whose name begins
with a dot (at least on *nix). So the results are very different from
what you get with Dir.entries.

David


David Alan Black | Register for RubyConf 2002!
home: dblack@candle.superlink.net | November 1-3
work: blackdav@shu.edu | Seattle, WA, USA
Web: http://pirate.shu.edu/~blackdav | http://www.rubyconf.com

Hi,

···

At Mon, 23 Sep 2002 19:35:08 +0900, dblack@candle.superlink.net wrote:

–snip–
require ‘profile’
def getdirs root
Dir[“#{root}/**/*”].select{ |file| test ?d, file and not test ?l, file }
end
getdirs ‘/home/iusris/devel’
–snip–

which uses the recursive ** Dir glob (of Zsh fame).

The problem with that is it doesn’t include anything whose name begins
with a dot (at least on *nix). So the results are very different from
what you get with Dir.entries.

Dir[“#{root}/**/{.[^.],…?,}*/”].select {|file| not test ?l, file}


Nobu Nakada