OpenSSL and SHA*

Hi,

I'm quite new to Ruby, and I'm facing a problem I can't seem to solve by
myself.
I'm comparing OpenSSL SHA1 hash results from a Linux command line with the
Ruby ones:

---
cmd line :
openssl dgst -sha1 my_file

ruby :
require 'digest/sha1'
puts Digest::SHA1.hexdigest(File.read("my_file"))
---
I increase the file size and run both again, and again.
The hashes are identical until the file size reaches 512 MB; from then on
they differ.
I tried several SHA versions (sha256..sha512) and the problem is the
same.
However, with MD5 I have no problem.

Does anyone have an idea whether I'm doing something wrong here?
Thanks a lot!

ChoBolT
--
Posted via http://www.ruby-forum.com/.

Philippe Chotard wrote:

I'm comparing OpenSSL SHA1 hash results from a Linux command line with the
Ruby ones:
---
cmd line :
openssl dgst -sha1 my_file

ruby :
require 'digest/sha1'
puts Digest::SHA1.hexdigest(File.read("my_file"))
---
I increase the file size and run both again, and again.
The hashes are identical until the file size reaches 512 MB; from then on
they differ.

Strange. First, try doing it in two stages:

str = File.read("my_file")
puts str.size
puts Digest::SHA1.hexdigest(str)

This may give you a clue if File.read is misbehaving. However this is
unlikely if Digest::MD5 is fine.
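
A minimal sketch of that size check, in case it helps. The helper name read_complete? is made up for illustration. It uses File.binread and bytesize rather than File.read and size, on the assumption that you want a byte count: on Ruby 1.9, size counts characters in the string's encoding, which may not match the size on disk.

```ruby
# Hypothetical helper: true if the whole file made it into the string.
def read_complete?(path)
  data = File.binread(path)            # read raw bytes, no encoding conversion
  data.bytesize == File.size(path)     # compare against the size on disk
end
```

Calling read_complete?("my_file") and getting false would point at the read rather than at the digest.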

But in any case, reading 512MB of data into RAM just to calculate SHA1
is very wasteful. I suggest you recode it:

  puts Digest::SHA1.file("my_file").hexdigest

or read the file in blocks:

  d = Digest::SHA1.new
  File.open("my_file") do |f|
    while chunk = f.read(65536)
      d << chunk
    end
  end
  puts d.hexdigest
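
The block-reading loop above can be wrapped up as a small method. This is only a sketch: the name sha1_streamed and the default chunk size are my own choices, and the file is opened in binary mode as a precaution.

```ruby
require 'digest/sha1'

# Hypothetical helper: SHA1 of a file, reading chunk_size bytes at a time
# so that memory use stays small regardless of file size.
def sha1_streamed(path, chunk_size = 65536)
  digest = Digest::SHA1.new
  File.open(path, 'rb') do |f|
    # f.read(n) returns nil at end of file, which ends the loop
    while (chunk = f.read(chunk_size))
      digest << chunk
    end
  end
  digest.hexdigest
end
```

For any file, sha1_streamed("my_file") should give the same result as Digest::SHA1.file("my_file").hexdigest.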

If the hashes *still* disagree, then perhaps the command-line tool
you are comparing against is at fault! Most Linux systems have at least
two:

  sha1sum <file>
  openssl sha1 <file>

so you can see if those agree or disagree, too.
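
The two tools print their results in different layouts, so if you want to compare them from a script, something like the following may help. It is a sketch only: extract_hex_digest is a made-up name, and it assumes the line contains exactly one run of 40 hex digits (a file name that itself looks like a SHA1 hash would confuse it).

```ruby
# Hypothetical helper: pull the 40-digit SHA1 hex string out of one line of
# output from either sha1sum or openssl. Returns nil if none is found.
def extract_hex_digest(line)
  match = line.match(/\b[0-9a-f]{40}\b/i)
  match && match[0].downcase
end
```

You would pass it the captured output line from each tool and compare the two return values with ==.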

On my box (Ubuntu Hardy, ruby-1.8.6p114 compiled from source):

$ ls -l ubuntu-8.04-desktop-i386.iso
-rw-r--r-- 1 brian brian 733079552 Apr 24 2008 ubuntu-8.04-desktop-i386.iso
$ sha1sum ubuntu-8.04-desktop-i386.iso
53a07a006d791f7fddc6d53879e826934f73bc0f ubuntu-8.04-desktop-i386.iso
$ openssl dgst -sha1 ubuntu-8.04-desktop-i386.iso
SHA1(ubuntu-8.04-desktop-i386.iso)= 53a07a006d791f7fddc6d53879e826934f73bc0f
$ irb
irb(main):001:0> require 'digest/sha1'
=> true
irb(main):002:0> Digest::SHA1.file("ubuntu-8.04-desktop-i386.iso").hexdigest
=> "53a07a006d791f7fddc6d53879e826934f73bc0f"
irb(main):003:0> d = Digest::SHA1.new
=> #<Digest::SHA1: da39a3ee5e6b4b0d3255bfef95601890afd80709>
irb(main):004:0> File.open("ubuntu-8.04-desktop-i386.iso") do |f|
irb(main):005:1* while chunk = f.read(65536)
irb(main):006:2> d << chunk
irb(main):007:2> end
irb(main):008:1> end
=> nil
irb(main):009:0> d.hexdigest
=> "53a07a006d791f7fddc6d53879e826934f73bc0f"
irb(main):010:0>

So I can't see any problem. However, I don't really have enough RAM to
read the whole file in at once without swapping badly. It's possible that
Digest::SHA1 barfs when given a string > 512MB.

Regards,

Brian.

Brian Candler wrote:

But in any case, reading 512MB of data into RAM just to calculate SHA1
is very wasteful. I suggest you recode it:

  puts Digest::SHA1.file("my_file").hexdigest

Thanks for your response Brian.
Indeed, using this method I got the right hash. So it looks like, as you
said, the problem appears when SHA is handling strings of 512MB or more.

I'll do some further testing on other systems and versions (I'm using Ruby
1.9.0 on Debian lenny).

Thanks !
