Comparing directory contents

Hi all,

I work in the SCM dept of a Windows software shop. A typical software build
involves us getting the code from an engineer, compiling the binaries, gathering
any support files, and then wrapping it in an installer (Installshield). We run
the installer to make sure everything looks ok. As a quick-and-dirty sanity check
to make sure we got everything, we go into the install folder, do a 'dir /s',
and pipe the output to a text file. If the file list in the current build
matches the file list of the previous, we give it the ok. These lists are saved
on disk, printed and filed with the build paperwork so we can refer to them
again if necessary.

This method works surprisingly well for catching files that were mistakenly
excluded, but as you can imagine it gets very tedious and error-prone since we
have to hand-check the output. Additionally, many times we are asked by the
engineer to include additional support files, or remove existing ones. I'm
thinking there must be a better way, or better yet, a Ruby Way :)
I am relatively new to the language, so I don't really know which angle to
attack it from. The basic gist would be to read in the previous file list
output, strip any junk (extra spaces, line breaks, etc), and do the same for the
current, so what's left is two lists of just pure filenames (don't care about
timestamps or attributes right now). The script would process the lists and the
result would be something like "Identical" or "Extra files: [filenames]" or
"Removed files: [filenames]".

I'm wondering if something like this already exists. A search of rubyforge and
RAA, however, did not turn up anything this specific, although I really wasn't
sure what I should be looking for. If I could be pointed to a base library that
would get me going, that would be great. Any insights on implementation would
also be greatly appreciated. Thanks!

Does this help?

bschroed@black:~/svn/projekte/ruby-things$ ls -1 > before.list
bschroed@black:~/svn/projekte/ruby-things$ touch another.one
bschroed@black:~/svn/projekte/ruby-things$ ls -1 > after.list
bschroed@black:~/svn/projekte/ruby-things$ irb
irb(main):001:0> before = File.readlines('before.list')
=> ["before.list\n", ...]
irb(main):002:0> after = File.readlines('after.list')
=> ["before.list\n", "after.list\n", "another.one\n", ...]
irb(main):003:0> before - after
=> []
irb(main):004:0> after - before
=> ["after.list\n", "another.one\n"]
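
The same idea can be wrapped into a small script. This is only a minimal sketch, assuming each saved list holds one filename per line (for example from `dir /s /b` or `ls -1`); the method names `read_list` and `compare_lists` are made up for illustration:

```ruby
# Compare two saved file lists, one name per line.
# Assumption: the lists contain bare names, no timestamps or sizes.

# Read a list file, strip surrounding whitespace, and drop blank lines.
def read_list(path)
  File.readlines(path).map { |line| line.strip }.reject { |line| line.empty? }
end

# Describe how `current` differs from `previous` in the wording
# the original post asked for.
def compare_lists(previous, current)
  extra   = current - previous
  removed = previous - current
  return "Identical" if extra.empty? && removed.empty?

  parts = []
  parts << "Extra files: #{extra.join(', ')}" unless extra.empty?
  parts << "Removed files: #{removed.join(', ')}" unless removed.empty?
  parts.join("\n")
end
```

It could be called as `puts compare_lists(read_list('before.list'), read_list('after.list'))`. Array difference ignores ordering, so the two lists do not need to be sorted the same way.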

regards,

Brian

···

On 03/08/05, dave davidson <datapanix@gmail.com> wrote:

<snip>

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/

Though Brian Schröder gave an interesting irb implementation, what you
really need is diff[1]. And don't despair, there is diff for
Windows[2] (via the command line). The GNU developers have put a *lot*
of work and refinement into this heavily used tool -- don't reinvent
the wheel.

[1] http://www.gnu.org/software/diffutils/manual/html_node/index.html
[2] http://gnuwin32.sourceforge.net/packages/diffutils.htm

Jacob Fugal

All,

Thanks so much for the hints and pointers regarding this issue... I've not had a
chance to try all the suggestions (too busy counting files by hand :) but I just
wanted to let you know I appreciate the help!

Dave

Hello Jacob,

> Though Brian Schröder gave an interesting irb implementation, what you
> really need is diff[1]. And don't despair, there is diff for
> Windows[2] (via the command line).

> The GNU developers have put a *lot*
> of work and refinement into this heavily used tool -- don't reinvent
> the wheel.

<flame>
And they still have nothing that even comes close to "AraxisMerge" on
Windows, neither in the GUI nor in the quality of the diff algorithm.
</flame>

But back to the original poster's question: I think diff is completely the
wrong idea, since he said he only needs the file names. The content does not
matter for an installer, because he puts the complete file into the
setup.exe.

I don't see a particularly Ruby way to solve it, as processing strings is
not a very complicated task. Build two hashes over the file lists
and compare them item by item. Parsing the previous file list would be a
little bit complicated if the Installshield file format must be parsed rather
than a plain string list, but it should still be possible to write the script
in 100 lines. Or maybe I did not understand Dave's real problem.
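
One way to sketch the two-hash idea in Ruby. The layout of the `dir /s` output assumed here (a `Directory of ...` header followed by lines with date, time, size and name) is an assumption and varies with Windows version and locale, and `parse_dir_listing` and `compare_listings` are hypothetical names:

```ruby
# Assumed line shapes in `dir /s` output (not verified for every locale):
#   " Directory of C:\install"
#   "08/03/2005  10:15 AM             1,234 app.exe"
DIR_HEADER = /^\s*Directory of (.+?)\s*$/
FILE_LINE  = /^\d\S*\s+\d\S*(?:\s+[AP]M)?\s+[\d,.]+\s+(.+?)\s*$/

# Build a hash keyed by full file path; <DIR> entries, volume headers and
# summary lines do not match FILE_LINE and are skipped.
def parse_dir_listing(text)
  files = {}
  current_dir = nil
  text.each_line do |line|
    if (m = DIR_HEADER.match(line))
      current_dir = m[1]
    elsif current_dir && (m = FILE_LINE.match(line))
      files["#{current_dir}\\#{m[1]}"] = true
    end
  end
  files
end

# Compare the two hashes item by item.
def compare_listings(previous, current)
  removed = previous.keys.reject { |name| current.key?(name) }
  extra   = current.keys.reject { |name| previous.key?(name) }
  { 'removed' => removed, 'extra' => extra }
end
```

If both builds are installed into the same folder the full paths line up directly; otherwise the install root would have to be stripped from the keys first.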

--
Best regards, emailto: scholz at scriptolutions dot com
Lothar Scholz http://www.ruby-ide.com
CTO Scriptolutions Ruby, PHP, Python IDE 's

Jacob Fugal <lukfugl@gmail.com> writes:

> Though Brian Schröder gave an interesting irb implementation, what you
> really need is diff[1]. And don't despair, there is diff for
> Windows[2] (via the command line). The GNU developers have put a *lot*
> of work and refinement into this heavily used tool -- don't reinvent
> the wheel.
>
> [1] http://www.gnu.org/software/diffutils/manual/html_node/index.html
> [2] http://gnuwin32.sourceforge.net/packages/diffutils.htm

Or just use the tool windiff.exe which can be found on your windows
installation coaster.

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

> Though Brian Schröder gave an interesting irb implementation, what you
> really need is diff[1]. And don't despair, there is diff for
> Windows[2] (via the command line).

> The GNU developers have put a *lot*
> of work and refinement into this heavily used tool -- don't reinvent
> the wheel.

> <flame>
> And they still have nothing that even comes close to "AraxisMerge" on
> Windows, neither in the GUI nor in the quality of the diff algorithm.
> </flame>

Ok, to qualify my statement: Don't reinvent this particular wheel for
a once-off solution. I won't say that someone else can make a better
wheel when that's their primary goal. I don't think Dave Davidson's
goal is to develop a new diff utility. Regarding AraxisMerge, I've
never heard of it. It may be better than GNU DiffUtils. I can't make
any judgement there.

> But back to the original poster's question: I think diff is completely the
> wrong idea, since he said he only needs the file names. The content does not
> matter for an installer, because he puts the complete file into the
> setup.exe.

diff -qr | grep '^Only'

Know the tool before dismissing it.

Jacob Fugal

···

On 8/3/05, Lothar Scholz <mailinglists@scriptolutions.com> wrote:

Rather,

diff -qr DIR1 DIR2 | grep '^Only'

Sorry for the shabby proofreading...
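
For comparison, here is a rough Ruby counterpart to the corrected command, as a sketch only. It assumes both directory trees exist on disk, `relative_entries` and `only_in` are hypothetical names, and the output merely imitates the "Only in" lines rather than reproducing diff's exact format:

```ruby
# List every entry below `root` as a path relative to `root`.
# Note: this glob pattern skips dotfiles.
def relative_entries(root)
  Dir.chdir(root) { Dir.glob('**/*').sort }
end

# Report entries that exist in only one of the two trees.
def only_in(dir1, dir2)
  a = relative_entries(dir1)
  b = relative_entries(dir2)
  (a - b).map { |path| "Only in #{dir1}: #{path}" } +
    (b - a).map { |path| "Only in #{dir2}: #{path}" }
end
```

Calling `puts only_in('DIR1', 'DIR2')` prints one line per unmatched entry. Unlike diff, which reports a missing subdirectory once, this lists every entry underneath it.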

Jacob Fugal

···

On 8/4/05, Jacob Fugal <lukfugl@gmail.com> wrote:

> diff -qr | grep '^Only'

the way i read the OP's post the original contents should be stored and
alterable. the diff approach would require both directories to exist and be
stored and i think the OP wanted to store only the __inventory__ of the dir -
not the actual dir. so not only would the storage/database requirements
skyrocket, but you'd be using a sledgehammer to pound in a mini-tack. this
problem is quite easily solved in only a few lines of ruby - including
database code, command line parsing, etc:

here's the code:

     harp:~ > cat ./dirlist

     #! /usr/bin/env ruby
     require 'pstore'
     require 'yaml'
     require 'getoptlong'

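     class DirDb < ::PStore
       # look up the stored file list for dir (nil if nothing is stored)
       def [] dir
         transaction{ super(exp(dir)) rescue nil}
       end
       # store filelist under the expanded path of dir
       def []= dir, filelist
         transaction{ super(exp(dir), filelist) }
       end
       def exp dir
         File::expand_path dir
       end
     end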

     class FileList < ::Array
       def initialize dir
         @dir = File::expand_path dir
         @glob = File::join @dir, '**', '*'
         replace Dir[@glob].map{|f| File::expand_path f}
       end
       def basenames
         map{|f| f.gsub(%r|^#{ Regexp::escape @dir }/*|,'')}
       end
       def add filename
         self << File::expand_path(File::join(@dir, filename))
       end
       def delete filename
         super(File::expand_path(File::join(@dir, filename)))
       end
       def to_yaml
         to_a.to_yaml
       end
     end

     class Main
       def self::main(*a, &b)
         new(*a, &b).run
       end
       def initialize
         gl = GetoptLong::new ['--db', '-d', GetoptLong::REQUIRED_ARGUMENT]
         gl.each do |opt, arg|
           case opt
             when /db/
               @db_path = arg
           end
         end
         @db_path ||= File::expand_path(File::join('~', '.dirdb'))
         @mode, @mode_args = ARGV.shift, ARGV
         @mode ||= 'help'
         @db = DirDb::new @db_path
       end
       def run
         send(@mode, *@mode_args)
       end
       def scan dir
         @db[dir] = FileList::new dir
         show dir
       end
       def show dir
         # to_yaml instead of Kernel#y, which newer rubies only provide inside irb
         puts @db[dir].to_yaml
       end
       def report old_dir, new_dir
         previous = @db[old_dir]
         current = FileList::new new_dir
         report = {}
         report['identical'] = previous.basenames & current.basenames
         report['extra'] = current.basenames - previous.basenames
         report['removed'] = previous.basenames - current.basenames
         puts report.to_yaml
       end
       def add dir, filename
         filelist = @db[dir]
         filelist.add filename
         @db[dir] = filelist
       end
       def delete dir, filename
         filelist = @db[dir]
         filelist.delete filename
         @db[dir] = filelist
       end
       def help
         puts "#{ $0 } scan dir | show dir | report old_dir new_dir | add dir filename | delete dir filename"
       end
     end

     $0 == __FILE__ and Main::main

and here's how you use it:

     harp:~ > mkdir version-1.0.0 && touch version-1.0.0/a version-1.0.0/b version-1.0.0/c

     harp:~ > ./dirlist
     ./dirlist scan dir | show dir | report old_dir new_dir | add dir filename | delete dir filename

     harp:~ > ./dirlist scan version-1.0.0/

     ---
     - /home/ahoward/version-1.0.0/a
     - /home/ahoward/version-1.0.0/b
     - /home/ahoward/version-1.0.0/c

     harp:~ > rm -rf version-1.0.0/

     harp:~ > mkdir version-2.0.0 && touch version-2.0.0/a version-2.0.0/b

     harp:~ > ./dirlist report version-1.0.0 version-2.0.0
     ---
     removed:
       - c
     extra:
     identical:
       - a
       - b

     harp:~ > touch version-2.0.0/c

     harp:~ > ./dirlist report version-1.0.0 version-2.0.0
     ---
     removed:
     extra:
     identical:
       - a
       - b
       - c

     harp:~ > touch version-2.0.0/d

     harp:~ > ./dirlist report version-1.0.0 version-2.0.0
     ---
     removed:
     extra:
       - d
     identical:
       - a
       - b
       - c

     harp:~ > ./dirlist add version-1.0.0 d

     harp:~ > ./dirlist report version-1.0.0 version-2.0.0
     ---
     removed:
     extra:
     identical:
       - a
       - b
       - c
       - d

     harp:~ > rm version-2.0.0/a

     harp:~ > ./dirlist report version-1.0.0 version-2.0.0
     ---
     removed:
       - a
     extra:
     identical:
       - b
       - c
       - d

     harp:~ > ./dirlist delete version-1.0.0 a

     harp:~ > ./dirlist report version-1.0.0 version-2.0.0
     ---
     removed:
     extra:
     identical:
       - b
       - c
       - d

in any case, i'm all for using built-in tools to accomplish tasks - but this
task is so basic it seems silly not to just write it in pure ruby...

kind regards.

-a
--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

> diff -qr | grep '^Only'

<snip>

> the way i read the OP's post the original contents should be stored and
> alterable. the diff approach would require both directories to exist and be
> stored and i think the OP wanted to store only the __inventory__ of the dir -
> not the actual dir. so not only would the storage/database requirements
> skyrocket, but you'd be using a sledgehammer to pound in a mini-tack.

Ok, I forgot about that constraint. I still think diff would be the
exact tool I would use when on a *nix system:

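# Done once to build the list compared against
# (paths are relative to the directory and sorted, so the lists line up)
$ (cd master_dir && find . | sort) > master.list

# Done each time to verify all files are there in the working copy
$ (cd working_dir && find . | sort) | diff master.list -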

I'll admit that once you start getting into pipes and such this
solution probably won't work, or at least not as easily, on Windows.
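
Where pipes are awkward, the same save-then-verify workflow might be sketched with Ruby's standard `find` library. This is an illustration under assumptions, not a drop-in tool: `inventory`, `save_inventory` and `verify` are hypothetical names, and the list file is assumed to hold one relative path per line:

```ruby
require 'find'

# Walk `root` and return sorted paths relative to it (directories included).
def inventory(root)
  root = File.expand_path(root)
  paths = []
  Find.find(root) do |path|
    next if path == root
    paths << path[(root.length + 1)..-1]
  end
  paths.sort
end

# Done once: write the master list to disk.
def save_inventory(root, list_path)
  File.write(list_path, inventory(root).join("\n") + "\n")
end

# Done each time: compare the working directory against the saved list.
def verify(root, list_path)
  master  = File.readlines(list_path).map { |l| l.chomp }.reject { |l| l.empty? }
  working = inventory(root)
  { 'removed' => master - working, 'extra' => working - master }
end
```

Only the list file has to be kept around between builds; the master directory itself can be deleted once `save_inventory` has run.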

Jacob Fugal

···

On 8/4/05, Ara.T.Howard@noaa.gov <Ara.T.Howard@noaa.gov> wrote: