Comparing two files for equality

Hi everybody,

After reading "Refactoring for PL/SQL Developers" in the January
issue of Oracle Magazine, I tried to implement a program that
compares two files for equality. I came up with the following trivial
solution:

puts (IO.readlines file[0]) == (IO.readlines file[1])

Have you got any other suggestions?

Kind Regards,
Ed

--
Alcohol is the anesthesia by which we endure the operation of life.
-- George Bernard Shaw

Edgardo Hames wrote:

> Hi everybody,
>
> After reading the "Refactoring for PL/SQL Developers" in the January
> issue of the Oracle magazine, I tried to implement a program that
> compares two files for equality. I came up with the following trivial
> solution
>
> puts (IO.readlines file[0]) == (IO.readlines file[1])
>
> Have you got any other suggestions?

Is this cheating?

require 'fileutils'
p FileUtils.cmp(file[0], file[1])
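
For anyone who wants to try this without hunting for test files, here is a small self-contained run. The helper name `temp_with` and the file contents are made up for the demo; as far as I know `FileUtils.cmp` is simply an alias of `FileUtils.compare_file` in the standard library.

```ruby
require 'fileutils'
require 'tempfile'

# Write the given string to a fresh temporary file and return the Tempfile.
# (temp_with is a made-up helper; holding on to the returned object keeps
# the file from being cleaned up while we still need it.)
def temp_with(content)
  file = Tempfile.new('cmp-demo')
  file.write(content)
  file.close
  file
end

first  = temp_with("same\ncontent\n")
second = temp_with("same\ncontent\n")
third  = temp_with("other\ncontent\n")

p FileUtils.cmp(first.path, second.path)  # prints true
p FileUtils.cmp(first.path, third.path)   # prints false
```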

Edgardo Hames wrote:

> Hi everybody,

Moin.

> I tried to implement a program that
> compares two files for equality. I came up with the following trivial
> solution
>
> puts (IO.readlines file[0]) == (IO.readlines file[1])
>
> Have you got any other suggestions?

Not much different:

def files_equal?(*files)
   files.map do |file|
     File.size file
   end.uniq.size <= 1 and
   files.map do |file|
     File.read file
   end.uniq.size <= 1
end

This ought to be slightly faster in the average case.

Other optimizations would be reading the files line-wise in parallel and bailing out as soon as one of the lines differs.
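
The bail-out idea could look roughly like the following. This is only a sketch: the name `same_content?` and the 4096-byte chunk size are arbitrary choices, not anything from the thread, and it compares fixed-size binary chunks instead of lines so that a file consisting of one huge line is not pulled into memory in one go.

```ruby
# Sketch: compare two files chunk by chunk, stopping at the first difference.
# same_content? and CHUNK_SIZE are illustrative names, not a standard API.
CHUNK_SIZE = 4096

def same_content?(path_a, path_b)
  # Cheap check first: files of different size can never be equal.
  return false unless File.size(path_a) == File.size(path_b)

  File.open(path_a, 'rb') do |a|
    File.open(path_b, 'rb') do |b|
      # read(n) returns nil at end of file, which ends the loop.
      while (chunk_a = a.read(CHUNK_SIZE))
        return false unless chunk_a == b.read(CHUNK_SIZE)
      end
    end
  end
  true
end
```

Called as `same_content?('file1', 'file2')`, it reads at most one chunk past the first difference, and never reads anything when the sizes already differ.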

> After reading "Refactoring for PL/SQL Developers" in January's Oracle
> magazine, I tried to implement a program that compares two files for
> equality. I came up with the following trivial solution:
>
> puts (IO.readlines file[0]) == (IO.readlines file[1])
>
> Have you other suggestions?

I came up with this:

class File
  SIZE = 1024
  include Comparable

  def <=>(f) # Does not rewind, before or after.
    s1 = s2 = s = n = 0 # Efficiency.
    begin # At least once.
      s1, s2 = [self, f].collect do |a|
        s = a.read(SIZE)
        s ? s : String.new # Or ''.
      end
    end while 0 == (n = (s1 <=> s2)) &&
              SIZE == s1.length
    n # Return.
  end
end # class File

Kind regards,

Edgardo Hames <ehames@gmail.com> Jan 12, 2005 at 01:26 PM wrote:

% gem install diff-lcs
% ldiff file1 file2

;)

(Except that in doing this I discovered a bug in Diff::LCS. Expect a
bugfix when I figure it out.)

-austin

On Wed, 12 Jan 2005 13:26:53 +0900, Edgardo Hames <ehames@gmail.com> wrote:

> Have you got any other suggestions?

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

Another alternative - probably slow as h*ll, but...

    def files_equals?(*files)
        require 'digest/md5'

        return files.map do |file|
            Digest::MD5.hexdigest(File.read(file))
        end.uniq.size == 1
    end
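
If memory rather than speed is the concern, a variation on the digest idea is to let `Digest::MD5.file` stream each file instead of reading it whole. Again just a sketch: `same_digest?` is a made-up name, and matching digests only show equality up to hash collisions.

```ruby
require 'digest/md5'

# Sketch: compare files by MD5 digest without loading them fully into memory.
# same_digest? is an illustrative name, not part of any library.
def same_digest?(*files)
  # Digest::MD5.file reads the file in blocks and returns a digest object;
  # all files are equal (up to collisions) if only one distinct digest remains.
  files.map { |file| Digest::MD5.file(file).hexdigest }.uniq.size <= 1
end
```

Every file still has to be read to the end, so this does not bail out early the way a chunk-by-chunk comparison can.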

//Anders

On Wed, Jan 12, 2005 at 09:11:17PM +0900, Florian Gross wrote:

> Edgardo Hames wrote:
>
> > Hi everybody,
>
> Moin.
>
> > I tried to implement a program that
> > compares two files for equality. I came up with the following trivial
> > solution
> >
> > puts (IO.readlines file[0]) == (IO.readlines file[1])
> >
> > Have you got any other suggestions?
>
> Not much different:
>
> def files_equal?(*files)
>   files.map do |file|
>     File.size file
>   end.uniq.size <= 1 and
>   files.map do |file|
>     File.read file
>   end.uniq.size <= 1
> end
>
> This ought to be slightly faster in the average case.
>
> Other optimizations would be reading the files line-wise in parallel and
> bailing out as soon as one of the lines differs.

--
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. Anders Engström aengstrom@gnejs.net
. http://www.gnejs.net PGP-Key: ED010E7F
. [Your mind is like an umbrella. It doesn't work unless you open it.]

Florian Gross wrote:

> def files_equal?(*files)
>   files.map do |file|
>     File.size file
>   end.uniq.size <= 1 and
>   files.map do |file|
>     File.read file
>   end.uniq.size <= 1
> end

Could anybody please explain the "end.uniq.size" line?
Thanks!

Greetings,
Andreas

Hello,

> I came up with this:
>
> [snip]

could also be written as (not thread-safe):

def <=> f
   result = size <=> f.size
   ( result = read(SIZE) <=> f.read(SIZE) ) until result != 0 or eof?
   result
end

def IO.compare_file(fn1, fn2)
   File.open(fn1){|f1|
     File.open(fn2){|f2| f1 <=> f2 }}
end

would be quite thread-safe.
Maybe I'm worrying too much...

one other solution:
FileUtils.compare_file(fn1, fn2)

cheers

On 12.1.2005, at 21:56, georgesawyer wrote:

Wow! I hope that I come up with some more questions that help you all
find bugs in your programs.

Working for better Ruby apps ;)
Ed

On Thu, 13 Jan 2005 06:42:45 +0900, Austin Ziegler <halostatue@gmail.com> wrote:

> On Wed, 12 Jan 2005 13:26:53 +0900, Edgardo Hames <ehames@gmail.com> wrote:
> > Have you got any other suggestions?
>
> % gem install diff-lcs
> % ldiff file1 file2
>
> ;)
>
> (Except that in doing this I discovered a bug in Diff::LCS. Expect a
> bugfix when I figure it out.)

To which a nice interface would be

IO.zip('file1', 'file2') {|a, b|
  return false if a != b
}

return true
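
As far as I know Ruby's IO class has no `zip` method, so the interface above is a wish rather than something you can call today. A rough sketch of how it might be built on top of `File.open` and `gets`; `zip_lines` and `same_lines?` are hypothetical names:

```ruby
# Sketch of the wished-for interface. zip_lines and same_lines? are
# hypothetical helpers, not part of Ruby's IO class.
def zip_lines(path_a, path_b)
  File.open(path_a) do |a|
    File.open(path_b) do |b|
      # Keep going until both files are exhausted; gets returns nil at EOF,
      # so a shorter file yields nil against the longer file's extra lines.
      until a.eof? && b.eof?
        yield a.gets, b.gets
      end
    end
  end
end

def same_lines?(path_a, path_b)
  # return inside the block leaves same_lines? at the first differing pair.
  zip_lines(path_a, path_b) { |a, b| return false if a != b }
  true
end
```

Wrapping the block in a method is what makes the early `return false` legal; both files are still closed on the way out because `File.open` with a block closes them in all cases.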

martin

Florian Gross <flgr@ccan.de> wrote:

> Other optimizations would be reading the files line-wise in parallel and
> bailing out as soon as one of the lines differs.

end is the end of an expression that returns an array.

# this is equivalent.
a=files.map do |file|
...
end
a.uniq

# Since everything in Ruby is an expression, you can do:
files.map do |file|
...
end.uniq

# making the expression explicit...
(files.map do |file|
....
end).uniq
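
To make that concrete, here is what each step returns on plain arrays. The numbers are made up and stand in for what `File.size` would give for each file:

```ruby
# Made-up file sizes, standing in for files.map { |file| File.size file }.
sizes = [120, 120, 120]

p sizes.uniq             # => [120]   duplicates collapse to one entry
p sizes.uniq.size        # => 1       one distinct value: all sizes are equal

mixed = [120, 95, 120]
p mixed.uniq.size <= 1   # => false   two distinct values, so the test fails
```

So `end.uniq.size <= 1` asks whether the array built by the block contains at most one distinct value.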

Regards,
Nick

On Thu, 13 Jan 2005 01:15:25 +0900, Andreas Semt <as@computer-leipzig.de> wrote:

> Florian Gross wrote:
>
> > def files_equal?(*files)
> >   files.map do |file|
> >     File.size file
> >   end.uniq.size <= 1 and
> >   files.map do |file|
> >     File.read file
> >   end.uniq.size <= 1
> > end
>
> Could anybody please explain the "end.uniq.size" line?
> Thanks!
>
> Greetings,
> Andreas
>
> > This ought to be slightly faster in the average case.
> >
> > Other optimizations would be reading the files line-wise in parallel and
> > bailing out as soon as one of the lines differs.

--
Nicholas Van Weerdenburg

Nicholas Van Weerdenburg wrote:

> end is the end of an expression that returns an array.
>
> # this is equivalent.
> a=files.map do |file|
> ...
> end
> a.uniq
>
> # Since everything in Ruby is an expression, you can do:
> files.map do |file|
> ...
> end.uniq
>
> # making the expression explicit...
> (files.map do |file|
> ....
> end).uniq
>
> Regards,
> Nick

Thanks Nick!

Nice Ruby code.
It's always a pleasure to see a solution by Florian.

Greetings,
Andreas

Andreas Semt wrote:

> Nice Ruby code.
> It's always a pleasure to see a solution by Florian.

Thank you. :)