Searching Directories

I'm trying to figure out the best way to accomplish this task.

Being new to Ruby, I'm trying to be efficient (if not elegant),
but first just want something that works.

I'm using Ruby 1.8.2 rc4 on Windows 2000.

Here's the task.


--------------
I have text files of last names, one name per line.

I want to read each name from each file and determine whether
that name appears in the name of any file with the extension
.txt, .doc, or .rtf, in any directory within the tree under a
given top-level directory.

The human interface is something simple like this:

The text filename is entered at the keyboard into "NameListFile".
The top-level directory is entered at the keyboard into "DirToSearch".
The output filename is entered at the keyboard into "SearchResults".
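A minimal sketch of that keyboard interface, assuming plain prompts on standard input; the helpers prompt and read_settings and the lowercase hash keys are illustrative, not from the thread. Lowercase names matter in Ruby: an identifier that starts with a capital letter (NameListFile, DirItems) is a constant, so reassigning it triggers a warning, and assigning one inside a method is a syntax error.

```ruby
# Print a label, then read one line of keyboard input.
# input/output default to the console but can be any IO-like object.
def prompt(label, input = $stdin, output = $stdout)
  output.print("#{label}: ")
  output.flush
  line = input.gets
  line ? line.chomp : ""   # gets returns nil at end of input
end

# Collect the three settings into lowercase keys (not constants).
def read_settings(input = $stdin, output = $stdout)
  {
    :name_list_file => prompt("NameListFile", input, output),
    :dir_to_search  => prompt("DirToSearch", input, output),
    :search_results => prompt("SearchResults", input, output)
  }
end
```

Calling read_settings with no arguments reads from the console; passing StringIO objects makes it easy to try without typing.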

Doing something like: DirItems = Dir.entries(DirToSearch)
I can get an array of items in the top-level directory.

Given this array of files and directories, I just need to
check to see if an item is a "file" or "directory".
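One detail worth knowing here: Dir.entries returns bare names, not paths, and the list includes "." and "..". A sketch, assuming each name has to be joined back onto its directory before testing it; classify_entries is a hypothetical helper:

```ruby
# Return [name, kind] pairs for the entries of one directory.
# Dir.entries gives bare names, so each is joined to dir before testing.
def classify_entries(dir)
  Dir.entries(dir).sort.reject { |name| name == "." || name == ".." }.map do |name|
    path = File.join(dir, name)
    kind = if File.directory?(path) then :directory
           elsif File.file?(path)   then :file
           else                          :other   # devices, sockets, etc.
           end
    [name, kind]
  end
end
```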

If it's a "file", I check that its extension is
.txt, .doc, or .rtf. If the file has any of those
extensions, I then check to see whether the filename
contains, in any part, a name from "NameListFile".
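That check might be sketched as follows, assuming the names are already in an array and that the extension test should ignore case (Windows often reports .DOC or .TXT); EXTS and matching_names are hypothetical names. Note that String#include? is case-sensitive, so "smith" would not match "Smith".

```ruby
EXTS = %w{.txt .doc .rtf}

# Return the names that occur anywhere in the file's base name,
# or [] if the extension is not one we care about.
def matching_names(path, names)
  return [] unless EXTS.include?(File.extname(path).downcase)
  base = File.basename(path)
  names.select { |name| !name.empty? && base.include?(name) }
end
```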

If YES, I print the name to an output file in the format:
Name, filedate, filename(fullpath relative to "DirToSearch")
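For the "relative to DirToSearch" part, one option (not used elsewhere in this thread, so treat it as an assumption) is Pathname#relative_path_from from the standard library. The helper name result_line and the date format are illustrative:

```ruby
require 'pathname'

# Build "Name, filedate, relative/path" for one matching file.
# path and dir_to_search must both be absolute or both relative.
def result_line(name, path, dir_to_search)
  rel  = Pathname.new(path).relative_path_from(Pathname.new(dir_to_search))
  date = File.mtime(path).strftime("%Y-%m-%d")
  "#{name}, #{date}, #{rel}"
end
```

Mixing an absolute path with a relative base raises ArgumentError, so it is simplest to pass paths exactly as the directory walk produced them.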

If an item in DirItems is a "directory", then I do the checking
process for the entries in that directory, and so on, until
every file in every directory is searched.

It seems this is a natural case to use recursion to check all
the directories in the tree, but I'm not sure which methods
to use to do this.

This seems like a pretty easy/standard thing to do, but I just
don't know enough Ruby (yet) to do it easily.

Your help and guidance are appreciated in advance.

Jabari

Pseudo-code is your friend. What you say you want to do is something
like:

define find_stuff_in a_directory
    for each thing in a_directory
        if it is a directory
            find stuff in it (unless it's . or ..)
        else (when it's a file)
            print information about it if it's interesting

All of this is from your e-mail, and could apply to any language.
Now for the Ruby part. Look up the class Dir, which will give you the
contents of a directory as an array; Array (and Enumerable), which let
you walk through the entries; and Regexp, which lets you test strings
to see if they match patterns.

Note that the recursion sort of takes care of itself in the fourth line
of the pseudo-code.
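Filled in with those classes, the pseudo-code might come out roughly like this. It is a sketch, not MarkusQ's code: the extra pattern and found parameters and the INTERESTING pattern are assumptions.

```ruby
# Recursively collect files under a_directory whose names match pattern.
def find_stuff_in(a_directory, pattern, found = [])
  Dir.entries(a_directory).each do |thing|
    next if thing == "." || thing == ".."     # don't recurse into self/parent
    path = File.join(a_directory, thing)
    if File.directory?(path)
      find_stuff_in(path, pattern, found)     # the recursive step
    elsif File.file?(path) && thing =~ pattern
      found << path                           # an "interesting" file
    end
  end
  found
end

# Example pattern (an assumption): .txt, .doc or .rtf, any case.
INTERESTING = /\.(txt|doc|rtf)\z/i
```

One caveat: File.directory? follows symbolic links, so a link pointing back up the tree would make this recurse forever.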

Have fun.

-- MarkusQ


On Mon, 2004-08-30 at 11:40, Jabari Zakiya wrote:


require 'find'
dirs=Array.new
files=Array.new
Find.find('/directory/path'){|entry|
  if FileTest.directory?(entry)
    dirs.push(entry)
  else
    files.push(entry)
  end
}

#by now dirs and files are arrays of directories and files; Find is
#recursive, so you can use it instead of Dir
require 'pp'
pp dirs
#=>["E:/usr\\local/rdoc/sqlite-1.3.0",
#   "E:/usr\\local/rdoc/sqlite-1.3.0/files",
#   "E:/usr\\local/rdoc/sqlite-1.3.0/classes",
#   "E:/usr\\local/rdoc/sqlite-1.3.0/classes/SQLite"]
#process files...
files.each{|f|
   # File.extname includes the leading dot, e.g. '.rid'
   case File.extname(f)
     when '.html' then puts f + ' : is a HTML'
     when '.rid' then puts f + ' : is mmmh something...'
   end
}
#results in
#=>
# E:/usr\local/rdoc/sqlite-1.3.0/index.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/fr_method_index.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/fr_file_index.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/fr_class_index.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/files/sqlite_rb.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/files/sqlite_c.html : is a HTML
# E:/usr\local/rdoc/sqlite-1.3.0/created.rid : is mmmh something...

Hope this helps...
Adartse
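For comparison, a shorter alternative that the thread doesn't use is Dir.glob with a "**" pattern, which also descends into subdirectories. A sketch, assuming topdir contains no glob metacharacters such as [ or {; files_with_exts is a hypothetical name. Glob patterns need forward slashes even on Windows, and "*" skips dot-files by default.

```ruby
# List regular files below topdir (at any depth) whose extension is in exts.
def files_with_exts(topdir, exts)
  Dir.glob(File.join(topdir, "**", "*")).select do |path|
    File.file?(path) && exts.include?(File.extname(path))
  end.sort
end
```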

Osuka Adartse <rocioestradacastaneda@prodigy.net.mx> wrote in message news:<4133BC0C.3020902@prodigy.net.mx>...


Thanks for the suggestions.

Here's my approach.


---------------------------------------------
require 'find'

def dirsearch(namefile, topdir, outfile)

out = File.new(outfile, "w")
#Total number of files found
total = 0

# Array with file extensions to check for
exts = %w{.doc .rtf .txt}

# Check entries in each subdirectory of TopDir
Find.find(topdir){|entry|
  # If entry a Directory search inside it
  if FileTest.directory?(entry)
     next
  # Else entry was a file
  else
     # If current file has a desired extension
     if exts.include? File.extname(entry)
        # Check for each name in namefile
        File.open(namefile).each{|name| name = name.chomp
          # If name is part of basename of file
          if File.basename(entry) =~ /#{name}/
            # Write line to output file and screen if there is a match
            out.puts(name+ ", " + entry + ", " + File.open(entry).mtime.to_s)
            puts name + ", " + entry + ", " + File.open(entry).mtime.to_s
            total += 1
          end
        }
     end
  end
}
print "Total files = ", total , "\n"
out.close
end

Hi,

At Thu, 2 Sep 2004 04:50:22 +0900,
Jabari Zakiya wrote in [ruby-talk:111197]:

> Heres my approach.
> ---------------------------------------------
> require 'find'
>
> def dirsearch(namefile, topdir, outfile)
>
> out = File.new(outfile, "w")
> #Total number of files found
> total = 0
>
> # Array with file extensions to check for
> exts = %w{.doc .rtf .txt}
>
> # Check entries in each subdirectory of TopDir
> Find.find(topdir){|entry|
>   # If entry a Directory search inside it
>   if FileTest.directory?(entry)
>      next
>   # Else entry was a file
>   else

Don't you want to check whether it is a regular file? Many file
systems have entry types other than file and directory, so you
should use File.file?(entry), or:

    stat = File.stat(entry)
    if stat.file?

>      # If current file has a desired extension
>      if exts.include? File.extname(entry)
>         # Check for each name in namefile
>         File.open(namefile).each{|name| name = name.chomp
>           # If name is part of basename of file
>           if File.basename(entry) =~ /#{name}/

Compiling a regexp each time would be too expensive.
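To make that concrete: /#{name}/ inside the loop builds a new Regexp for every name and every file. A sketch of building the patterns once, before the walk, assuming the names are plain text (Regexp.escape keeps characters like "." from acting as wildcards); build_patterns and names_in are hypothetical helpers:

```ruby
# Compile one escaped Regexp per name, once, before walking the tree.
# Blank names are dropped because an empty pattern matches everything.
def build_patterns(names)
  names.reject { |n| n.empty? }.map { |n| [n, Regexp.new(Regexp.escape(n))] }
end

# Return the names whose pattern matches this file's base name.
def names_in(basename, patterns)
  patterns.select { |_name, re| basename =~ re }.map { |name, _re| name }
end
```

For a plain substring test, String#include? avoids regexps entirely.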

>             # Write line to output file and screen if there is a match
>             out.puts(name+ ", " + entry + ", " + File.open(entry).mtime.to_s)
>             puts name + ", " + entry + ", " + File.open(entry).mtime.to_s

Leaving files open is very bad manners. You can use
File.mtime(entry) instead, or, with the File.stat above:

              out.puts(name+ ", " + entry + ", " + stat.mtime.to_s)
              puts name + ", " + entry + ", " + stat.mtime.to_s


--
Nobu Nakada

Jabari Zakiya wrote:

> Thanks for the suggestions.
>
> Heres my approach.
> ---------------------------------------------
> require 'find'
>
> def dirsearch(namefile, topdir, outfile)
>
> out = File.new(outfile, "w")
> #Total number of files found
> total = 0
>
> # Array with file extensions to check for
> exts = %w{.doc .rtf .txt}
>
> # Check entries in each subdirectory of TopDir
> Find.find(topdir){|entry|
>   # If entry a Directory search inside it
>   if FileTest.directory?(entry)
>      next

Unless I misunderstood you, there's no real point to these 2 lines... so I replaced them, and the next 4 as well, with:
if !FileTest.directory?(entry) && exts.include?(File.extname(entry))

>   # Else entry was a file
>   else
>      # If current file has a desired extension
>      if exts.include? File.extname(entry)

I prefer to add the ()'s for readability; I wondered for a second what the line above meant :wink:

>         # Check for each name in namefile
>         File.open(namefile).each{|name| name = name.chomp
>           # If name is part of basename of file
>           if File.basename(entry) =~ /#{name}/
>             # Write line to output file and screen if there is a match
>             out.puts(name+ ", " + entry + ", " + File.open(entry).mtime.to_s)

BTW, there's no need to open the file to use many of the File methods, or for the to_s.

>             puts name + ", " + entry + ", " + File.open(entry).mtime.to_s

It's a matter of preference, but I avoid puts variable + ", " + ... etc.; instead I use
puts "#{variable}, #{var2}", or better, printf for more control over the output. It's easier on the eyes, at least for me, e.g. printf("%-24s, %-48s , %s\n",name,entry,File.mtime(entry)). I used strftime for similar reasons.
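To see what those width specifiers do, here is a small sketch using Kernel#format, which returns the string that printf would print; format_result and the fixed sample time are illustrative assumptions:

```ruby
# Build one fixed-width result line: name padded to 16 columns,
# path padded to 48, then the modification time via strftime.
def format_result(name, entry, mtime)
  format("%-16s: %-48s: %s", name, entry, mtime.strftime("%d-%b-%Y %H:%M"))
end
```

For example, puts format_result(name, entry, File.mtime(entry)). A value longer than its width is not truncated; it just pushes the following columns to the right.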

>             total += 1
>           end
>         }
>      end
>   end
> }
> print "Total files = ", total , "\n"
> out.close
> end

goodie!! :slight_smile:

my lil' changes FWIW

def dirsearch(namefile, topdir, outfile)
   require 'find'
   out = File.new(outfile, "w")
   #Total number of files found
   total = 0

   # Array with file extensions to check for
   exts = %w{.doc .rtf .txt}

   # Check entries in each subdirectory of TopDir
   Find.find(topdir){|entry|
     # we're looking for files (not dirs) that match exts, so...
     if !FileTest.directory?(entry) && exts.include?(File.extname(entry))
        # Check for each name in namefile
        File.open(namefile).each{|name| name = name.chomp
          # If name is part of basename of file
          if File.basename(entry) =~ /#{name}/
            # Write line to output file and screen if there is a match
            out.printf("%-16s: %-48s: %s\n",name,entry,File.mtime(entry).strftime("%d-%b-%Y %H:%M"))
            #either this line with mtime's output as it is or formatted with strftime
            #strftime makes things more my way omitting info I don't need/want...
            #option#printf("%-24s: %-48s: %s\n",name,entry,File.mtime(entry))
            printf("%-16s: %-48s: %s\n",name,entry,File.mtime(entry).strftime("%d-%b-%Y %H:%M"))
            total += 1
          end
        }
     end
   }
   print "Total files = #{total} \n"
   out.close
end

cheers
Adartse

Osuka Adartse <rocioestradacastaneda@prodigy.net.mx> wrote in message news:<4136971B.8090208@prodigy.net.mx>...
[...]


---------------------------------------------------------
Taking everyone's suggestions into account, here is my new version.

This version should be more amenable to different systems.
I didn't bother to do elaborate output formatting because
the person who needs the output is satisfied with the
current format. I will use the formatting suggestions if
I need to in the future. There also is a default output file.

One thing I DO NEED help on is EXCEPTION HANDLING!
I encountered CORRUPTED directories on a disk I searched
through, which caused the program to bomb with an error
message. I would like suggestions on handling DIR/FILE
EXCEPTIONS. Ideally, I would like to be able to continue
searching when EXCEPTIONS are met and record what the
EXCEPTION (corrupted DIR/FILE) was. Is this possible?
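This is possible with begin/rescue inside the block, so that one bad entry is recorded and the walk moves on. A minimal sketch, assuming that logging the path plus the exception class and message is enough; each_file_safely is a hypothetical name. It only covers errors raised while handling an entry inside the block. Errors raised by Find.find itself while reading a directory are not caught here, and how Find treats those varies between Ruby versions.

```ruby
require 'find'

# Yield each regular file under topdir. Errors raised while handling an
# entry are appended to errors as [path, class name, message], and the
# walk continues with the next entry.
def each_file_safely(topdir, errors)
  Find.find(topdir) do |entry|
    begin
      yield entry if File.file?(entry)
    rescue SystemCallError, IOError => e   # SystemCallError covers Errno::*
      errors << [entry, e.class.name, e.message]
    end
  end
  errors
end
```

Afterwards the collected errors could be written to the results file, e.g. errors.each { |path, kind, msg| outfile.puts "ERROR #{kind}: #{path} (#{msg})" }.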

Nobu, I didn't understand your comment about the Regexp
searching. Could you explain the issue in more detail, and
suggest an alternative approach?

Also, I could have first put all the names in 'namefile'
in an array, and then iterate over that, for purposes of
speed, but in this case it doesn't matter. I might try it
though, just to see if it speeds things up appreciably.
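Reading the names into an array once could look like this sketch; load_names is a hypothetical helper, and using strip (which also drops stray spaces and a trailing "\r") rather than chomp is an assumption:

```ruby
# Read the names once: strip whitespace and line endings, and skip
# blank lines (an empty name would otherwise match every file name).
def load_names(namefile)
  File.readlines(namefile).map { |line| line.strip }.reject { |name| name.empty? }
end
```

The resulting array can then be iterated inside the Find.find block instead of reopening namefile for every file.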

Again, thanks in advance for your help and suggestions! :wink:

---------------------------------------------------------
require 'find'

def dirsearch(namefile, topdir, outputfile='searchresults.txt')

  outfile = File.new(outputfile, "w")
  #Total number of files found
  total = 0

  # Array with file extensions to check for
  exts = %w{.doc .rtf .txt}

  # Check entries in each subdirectory of topdir
  Find.find(topdir){|entry|
    # Skip anything that is not a file with a desired extension
    next unless File.file?(entry) && exts.include?(File.extname(entry))
    # Check for each name in namefile (File.foreach closes the file when done)
    File.foreach(namefile){|line|
      name = line.chomp
      # Skip blank lines; an empty pattern would match every file
      next if name.empty?
      # If name is part of basename of file (escape regexp metacharacters)
      if File.basename(entry) =~ /#{Regexp.escape(name)}/
        # Write line to output file and screen if there is a match
        outfile.puts("#{name}:\n #{entry}, #{File.stat(entry).mtime}")
        puts "#{name}:\n #{entry}, #{File.stat(entry).mtime}"
        total += 1
      end
    }
  }
  print "Total files = ", total , "\n"
  outfile.close
end