Windows File Comparing

I am attempting to write a small utility that checks one file against
an automated snapshot of itself in a different location to make sure
that they are the same. I have not had any luck doing this and thought
I would present this to the group to see if you can help me out.

So far, when I execute my code, everything that is not an actual file
("." and "..") shows as OK, but anything else shows as not equal.
I have tried several different things, like File1.size ==
File2.size and also what you see in the code listed below. Neither
seems to give me a result showing that these files are equal.

If I look at these 2 files in Windows Explorer and check the
file sizes, they look identical, so I am not sure what it is that
*.size is looking at to show that they aren't the same size.

Can anyone advise me on how I can better implement this? Also, how
can I get rid of the "." files in the directory?

My code so far is as below:

require 'fileutils'
require 'ftools'

y = Dir.entries("\\\\C:\\BNADataFile\\")
z = Dir.entries("\\\\F\\BNADataFile\\.snapshot\\hourly.0\\")

y.each do |y|
    z.each do |z|
        if(FileUtils.uptodate?(y,z))
            puts y + ": Snapshot Successful"
        else
            puts y + ": Not the same"
        end
    end
end

My results: (Not sure why ".." showed up so many times)

C:\Documents and Settings\lmcilwain\My Documents\scripts\work>ruby check.rb
.: Not the same
.: Snapshot Successful
.: Snapshot Successful
.: Snapshot Successful
.: Snapshot Successful
.: Snapshot Successful
.: Snapshot Successful
.: Snapshot Successful
..: Snapshot Successful
backup: Not the same
Archived..cdb: Not the same
OPInc..cdb: Not the same
ArchivedAlt.cdb: Not the same
Alts.cdb: Not the same
event.log: Not the same

I suppose a lot of this depends on how heavyweight you want this to be. Something more reliable would be a SHA1 of the contents of the file:

require 'digest/sha1'

def content_hash(path)
   digest = Digest::SHA1.new
   File.open(path, 'rb') do |file|
     while buffer = file.read(1048576) do
       digest << buffer
     end
   end
   return digest.hexdigest
end
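As a quick illustration of why a content hash is more reliable than a size check, here is a minimal sketch. The helper name sha1_of and the file names are made up for the example, and it assumes a reasonably current Ruby (File.binwrite and Digest::SHA1.file are not in the 1.8 series). It writes two files of equal length that differ in one byte, then compares them both ways.

```ruby
require 'digest/sha1'
require 'tmpdir'

# Hypothetical helper: same idea as content_hash above, but it lets
# Digest read the file itself instead of looping over buffers by hand.
def sha1_of(path)
  Digest::SHA1.file(path).hexdigest
end

Dir.mktmpdir do |dir|
  original = File.join(dir, 'original.dat')
  snapshot = File.join(dir, 'snapshot.dat')
  File.binwrite(original, 'abcd')
  File.binwrite(snapshot, 'abce') # same length, different last byte

  puts File.size(original) == File.size(snapshot) # true: sizes match
  puts sha1_of(original) == sha1_of(snapshot)     # false: contents differ
end
```

A size check would call these two files equal; the digest comparison does not.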

It also depends on what exactly you want to do with it. This creates a signature for the file contents that you can compare against another file's signature. If all you care about is checking whether a file has changed, the files are located on an NTFS partition, and you really don't like the .folder, you could use alternate data streams (ADS):

def write_signature(path, signature)
   open(path + ":signature_stream", "w") { |f| f.write(signature) }
end

This will effectively hide the signature. You could potentially duplicate the contents into a stream, but that would not be an effective use: if the duplication is for versioning or backups, losing the file also loses the stream and, with it, the backup. If you are running this as a service, you could just use a simple db to hold the path and signature and use that as your check.
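For the "simple db" idea, a minimal sketch might look like the following. Everything here is an assumption made for illustration: the SignatureCache class, its method names, and the choice of a JSON file as the store are not from the thread, and any real database would do the same job.

```ruby
require 'digest/sha1'
require 'json'
require 'tmpdir'

# Hypothetical helper (not from the thread): keeps path => SHA1 pairs
# in a JSON file so the snapshot's signature is computed only once.
class SignatureCache
  def initialize(cache_file)
    @cache_file = cache_file
    @signatures = File.exist?(cache_file) ? JSON.parse(File.read(cache_file)) : {}
  end

  # Hash the file at path, remember the digest, and persist the cache.
  def record(path)
    @signatures[path] = Digest::SHA1.file(path).hexdigest
    File.write(@cache_file, JSON.generate(@signatures))
    @signatures[path]
  end

  # Cached digest for path, or nil if it was never recorded.
  def signature_for(path)
    @signatures[path]
  end
end

Dir.mktmpdir do |dir|
  snapshot = File.join(dir, 'snapshot.dat')
  File.binwrite(snapshot, 'snapshot contents')

  cache = SignatureCache.new(File.join(dir, 'signatures.json'))
  cache.record(snapshot)

  # A later run reloads the cache instead of rehashing the snapshot.
  reloaded = SignatureCache.new(File.join(dir, 'signatures.json'))
  puts reloaded.signature_for(snapshot) == Digest::SHA1.file(snapshot).hexdigest
end
```

Unlike an alternate data stream, the cache file lives apart from the files it describes, so it also works on non-NTFS volumes.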

I would recommend using something like SHA1 to check the contents; it is far more reliable than size, ctime, or mtime. If the files are fairly small, it won't be too costly. Regardless, caching the signature of the copy in either an ADS or a database will save time: you'll only need to create the signature of the original file and compare it to the cached signature of the copy:

class FileCompObj
   attr_reader :path, :signature
   def initialize(path, signature = nil)
     @path = path
     @signature = signature ? signature : content_hash(path)
   end

   def ==(file)
     @signature == file.signature
   end

   def content_hash(path)
     digest = Digest::SHA1.new
     File.open(path, 'rb') do |file|
       while buffer = file.read(1048576) do
         digest << buffer
       end
     end
     return digest.hexdigest
   end
end

A simple convenience class like that would allow you to do something like this:

# include methods defined above
path, backup_path = ARGV[0], ARGV[1]
original = FileCompObj.new(path)
...
# grab sig from ads or db, then pass it along
backup = FileCompObj.new(backup_path, signature)

if original == backup
   puts "Same"
else
   puts "Different"
end

A rather lengthy reply. While no animals were harmed in the creation of this message, I haven't tested the code yet, so no guarantees. Scan for bugs first!

tom

On Jan 15, 2008, at 2:54 PM, Vell wrote:

My results: (Not sure why ".." showed up so many times)

You're taking files in y and comparing them to every file in z.

Change:

           puts y + ": Not the same"

to

           puts y + ": Not the same as :" + z

and I think you'll see what I mean.
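One way to avoid the every-file-against-every-file comparison is to walk a single directory and look up the same-named file in the other one. The sketch below is only an illustration: compare_dirs and the demo file names are hypothetical, and it uses FileUtils.compare_file to compare contents, since FileUtils.uptodate? only compares modification times. Note also that Dir.entries returns bare names, so each name is joined to its directory before it is used.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: compare each regular file in source_dir with the
# same-named file in snapshot_dir, one pair at a time.
def compare_dirs(source_dir, snapshot_dir)
  results = {}
  Dir.entries(source_dir).sort.each do |name|
    source = File.join(source_dir, name)
    next unless File.file?(source) # skips ".", ".." and subdirectories

    snapshot = File.join(snapshot_dir, name)
    results[name] =
      if !File.file?(snapshot)
        :missing
      elsif FileUtils.compare_file(source, snapshot)
        :same
      else
        :different
      end
  end
  results
end

# Demo on throwaway directories so the sketch runs anywhere.
Dir.mktmpdir do |root|
  source   = File.join(root, 'source')
  snapshot = File.join(root, 'snapshot')
  Dir.mkdir(source)
  Dir.mkdir(snapshot)
  Dir.mkdir(File.join(source, 'backup')) # a directory, so it is skipped
  File.binwrite(File.join(source, 'Alts.cdb'), 'same bytes')
  File.binwrite(File.join(snapshot, 'Alts.cdb'), 'same bytes')
  File.binwrite(File.join(source, 'event.log'), 'new entry')
  File.binwrite(File.join(snapshot, 'event.log'), 'old entry')

  compare_dirs(source, snapshot).each do |name, status|
    puts "#{name}: #{status}"
  end
end
```

Each file is reported once, and "." and ".." never appear because only regular files are considered.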

Hope that helps.

Gordon

----- Original Message -----
From: "Vell" <lovell.mcilwain@gmail.com>
To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
Sent: Tuesday, January 15, 2008 2:54:04 PM (GMT-0700) America/Denver
Subject: Windows File Comparing

Lovell Mcilwain wrote:

I am attempting to write a small utility that checks one file against
an automated snapshot of itself in a different location to make sure
that they are the same. I have not had any luck doing this and thought
I would present this to the group to see if you can help me out.

You can use File.directory? to test whether an entry is a directory ("." and
".." are directories).
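A small sketch of that filter, with a made-up helper name (regular_files) and made-up demo file names. One detail worth noting: Dir.entries returns bare names, so each name has to be joined to its directory before it is tested; otherwise the check runs against the current working directory, which may be the source of some of the confusing results in the original script.

```ruby
require 'tmpdir'

# Hypothetical helper: names of the entries directly inside dir that
# are not directories, which drops ".", ".." and any subfolders.
def regular_files(dir)
  Dir.entries(dir).reject { |name| File.directory?(File.join(dir, name)) }.sort
end

Dir.mktmpdir do |dir|
  Dir.mkdir(File.join(dir, 'backup'))
  File.binwrite(File.join(dir, 'event.log'), 'log line')
  File.binwrite(File.join(dir, 'Alts.cdb'), 'data')

  puts regular_files(dir) # prints Alts.cdb, then event.log
end
```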

tiziano

--
Posted via http://www.ruby-forum.com/.

Vell wrote:

Can anyone advise me on how I can better implement this?

Is this just because you want to do it in Ruby?
A quick search should turn up various implementations of diff that will take two directories and recursively compare the contents.

No, this did not have to be in Ruby. I only chose to do it in Ruby so
that I can get better at Ruby. It's the language that I have the
most experience in. I never bothered to search around for any other
things.

On Jan 15, 7:39 pm, Reid Thompson <reid.thomp...@ateb.com> wrote:

Vell wrote:

> Can anyone advise me on how I can better implement this?

Is this just because you want to do it in Ruby?
A quick search should turn up various implementations of diff that will take two
directories and recursively compare the contents.

I will give that a shot. I believe I did put an if statement in there
with File.directory?, since I don't need to check the directories, just
the files, but I ended up getting an error that I didn't post. I will
try it again and post my error if it still bombs on me.

On Jan 15, 6:20 pm, Tiziano Merzi <giua...@gmail.com> wrote:

You can use File.directory? to test whether an entry is a directory ("." and
".." are directories).

tiziano