[RCR] digest/from_io

Hi,

It would be nice to have this in standard Ruby, as calculating the
digest of a file is a common task.

Example of usage:
   
  require 'digest/sha1'
  require 'digest/from_io'

  File.open('/tmp/x') {|f|
    Digest::SHA1.from_io(f)
  }

Implementation:

  # file: digest/from_io.rb
  class Digest::Base
    def self.from_io(io, block_size=8*1024)
      digest = new
      while data = io.read(block_size)
        digest.update(data)
      end
      digest
    end
  end

Another addition would be the raw_digest method (which of course could
be better implemented in C):

  require 'enumerator'
  class Digest::Base
    def raw_digest
      hexdigest.to_enum(:scan, /../).map {|byte| byte.to_i(16).chr}.join
    end
    alias rawdigest raw_digest
  end

Gavin, feel free to add it to extlib/addlib if you think it's worth it.

matz, do you think we can add from_io to stdlib?

Regards,

  Michael

Yeah, sounds like a good idea.

Gavin

···

On Friday, October 15, 2004, 1:53:37 AM, Michael wrote:

Gavin, feel free to add it to extlib/addlib if you think it's worth it.

Hi,

At Fri, 15 Oct 2004 00:53:37 +0900,
Michael Neumann wrote in [ruby-talk:116637]:

Implementation:

  # file: digest/from_io.rb
  class Digest::Base
    def self.from_io(io, block_size=8*1024)
      digest = new
      while data = io.read(block_size)
        digest.update(data)
      end
      digest
    end
  end

Another implementation could be:

  def Digest::Base.from(src)
    digest = new
    src.each(&digest.method(:update))
    digest
  end

This requires an #each method instead of #read. Which do you think
is better?
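
To make the comparison concrete, here is a minimal sketch of both variants side by side. The helper names digest_by_read and digest_by_each are hypothetical (they are not part of the digest library) and are written as plain methods so that Digest::Base is not reopened. On an in-memory StringIO, where #each yields lines, both should arrive at the same digest as hashing the whole string at once.

```ruby
require 'digest/sha1'
require 'stringio'

# Hypothetical helper: pull fixed-size blocks from the source with #read.
def digest_by_read(digest_class, io, block_size = 8 * 1024)
  digest = digest_class.new
  while (data = io.read(block_size))
    digest.update(data)
  end
  digest
end

# Hypothetical helper: let the source push its pieces through #each.
def digest_by_each(digest_class, src)
  digest = digest_class.new
  src.each { |piece| digest.update(piece) }
  digest
end

text = "first line\nsecond line\nthird line\n"

by_read  = digest_by_read(Digest::SHA1, StringIO.new(text), 4) # tiny blocks on purpose
by_each  = digest_by_each(Digest::SHA1, StringIO.new(text))    # StringIO#each yields lines
by_array = digest_by_each(Digest::SHA1, text.lines)            # an Array of strings also has #each

expected = Digest::SHA1.hexdigest(text)
puts by_read.hexdigest == expected
puts by_each.hexdigest == expected
puts by_array.hexdigest == expected
```

The #read variant controls the block size itself, while the #each variant accepts any source that yields strings, at the cost of depending on what that source's #each happens to yield.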

Another addition would be the raw_digest method (which of course could
be better implemented in C):

  require 'enumerator'
  class Digest::Base
    def raw_digest
      hexdigest.to_enum(:scan, /../).map {|byte| byte.to_i(16).chr}.join
    end
    alias rawdigest raw_digest
  end

It is equivalent to Digest::Base#digest.
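
A quick way to check this equivalence is to convert the hex string back to raw bytes and compare it with #digest. This sketch uses Array#pack rather than the scan/map approach from the post; the input string "hello" is just an arbitrary example.

```ruby
require 'digest/sha1'

d = Digest::SHA1.new
d.update("hello")

# Rebuild the raw bytes from the hex string: two hex digits per byte.
raw_from_hex = [d.hexdigest].pack('H*')

# Compare with the raw digest the library already provides.
same = (raw_from_hex == d.digest)
puts same
```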

···

--
Nobu Nakada

Hi,

At Fri, 15 Oct 2004 00:53:37 +0900,
Michael Neumann wrote in [ruby-talk:116637]:
> Implementation:
>
>   # file: digest/from_io.rb
>   class Digest::Base
>     def self.from_io(io, block_size=8*1024)
>       digest = new
>       while data = io.read(block_size)
>         digest.update(data)
>       end
>       digest
>     end
>   end

Another implementation could be:

  def Digest::Base.from(src)
    digest = new
    src.each(&digest.method(:update))
    digest
  end

This requires an #each method instead of #read. Which do you think
is better?

What if #each does not yield strings? Does #update work for all Ruby
objects? Personally I like #from_io more, as the way it works feels
more natural.

What if #from took more arguments, like this:

  Digest.from(io, :each_chunk, blk_sz = 10000, bytes = 1_000_000)
  Digest.from(io, :each_line)

This would be a far more general solution, and just as simple to implement.
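
A minimal sketch of that generalization, under some assumptions: digest_from is a hypothetical helper name, the digest class is passed in explicitly, and the iterator is any method of the source that yields strings. As far as I know IO has no each_chunk method, so the sketch uses each_line and each_char, which do exist on IO and StringIO.

```ruby
require 'digest/sha1'
require 'stringio'

# Hypothetical generalization: the caller names the iteration method and its
# arguments; every piece the source yields is fed to #update.
def digest_from(digest_class, src, iterator = :each, *args)
  digest = digest_class.new
  src.public_send(iterator, *args) { |piece| digest.update(piece) }
  digest
end

text = "alpha\nbeta\ngamma\n"

by_line  = digest_from(Digest::SHA1, StringIO.new(text), :each_line)
by_char  = digest_from(Digest::SHA1, StringIO.new(text), :each_char)
# each_line with an integer limit yields pieces of at most that many bytes.
by_limit = digest_from(Digest::SHA1, StringIO.new(text), :each_line, 4)

expected = Digest::SHA1.hexdigest(text)
puts [by_line, by_char, by_limit].all? { |d| d.hexdigest == expected }
```

An iterator such as each_byte would not fit this sketch as written, because it yields integers rather than strings.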

> Another addition would be the raw_digest method (which of course could
> be better implemented in C):
>
>   require 'enumerator'
>   class Digest::Base
>     def raw_digest
>       hexdigest.to_enum(:scan, /../).map {|byte| byte.to_i(16).chr}.join
>     end
>     alias rawdigest raw_digest
>   end

It is equivalent to Digest::Base#digest.

Oh, thanks.

Regards,

  Michael

···

On Fri, Oct 15, 2004 at 12:03:17PM +0900, Nobuyoshi Nakada wrote:

#update should work on all Digest::Base subclasses. The line
src.each(&digest.method(:update)) is roughly:

    src.each { |data| digest.update(data) }

Digest#update methods should work with individual bytes or strings,
shouldn't they?

It also depends on what Digest#update does to the implied data value;
if it calls #to_s or #to_str, then arrays of strings or other values
could be dealt with very easily. To me, that would be as or more
useful than limiting it to IO objects; I might want to generate a
digest from the result of IO::readlines (an array of strings).
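
A rough sketch of that idea, assuming #update itself only accepts strings, so the conversion is done explicitly with #to_s before each element is hashed. The helper name digest_from_each is hypothetical, and the arrays stand in for something like the result of IO::readlines.

```ruby
require 'digest/sha1'

# Hypothetical helper: accept anything with #each and convert every element
# with #to_s before hashing, so non-string elements are handled explicitly.
def digest_from_each(digest_class, src)
  digest = digest_class.new
  src.each { |item| digest.update(item.to_s) }
  digest
end

lines = ["one\n", "two\n", "three\n"] # shaped like the result of IO.readlines
mixed = ["one", 2, :three]            # non-string values go through #to_s

lines_digest = digest_from_each(Digest::SHA1, lines)
mixed_digest = digest_from_each(Digest::SHA1, mixed)

puts lines_digest.hexdigest == Digest::SHA1.hexdigest(lines.join)
puts mixed_digest.hexdigest == Digest::SHA1.hexdigest("one2three")
```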

-austin

···

On Sat, 16 Oct 2004 19:20:58 +0900, Michael Neumann <mneumann@ntecs.de> wrote:

On Fri, Oct 15, 2004 at 12:03:17PM +0900, Nobuyoshi Nakada wrote:
> Hi,
>
> At Fri, 15 Oct 2004 00:53:37 +0900,
> Michael Neumann wrote in [ruby-talk:116637]:
> > Implementation:
> >
> >   # file: digest/from_io.rb
> >   class Digest::Base
> >     def self.from_io(io, block_size=8*1024)
> >       digest = new
> >       while data = io.read(block_size)
> >         digest.update(data)
> >       end
> >       digest
> >     end
> >   end
>
> Another implementation could be:
>
>   def Digest::Base.from(src)
>     digest = new
>     src.each(&digest.method(:update))
>     digest
>   end
>
> This requires an #each method instead of #read. Which do you think
> is better?
What if #each does not yield strings? Does #update work for all Ruby
objects? Personally I like #from_io more, as the way it works feels
more natural.

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca