Can you explain what this repo (YAML/Marshal) code does?

The repository is a hash keyed on some kind of checksum derived from the
file.
But what is the use of Marshal? Is it something to do with memory management,
instead of writing to a file?
What is the format of what is spewed out?

require "digest/md5"
require "fileutils"

REPO_FILE = "repo.bin".freeze

class Repository
  attr_accessor :main_dir, :duplicate_dir, :extensions

  def initialize(extensions = %w{mp3 ogg})
    @extensions = extensions
    @repository = {}
  end

  def process_dir(dir)
    # find all files with the extensions we support
    Dir[File.join(dir, "*.{#{extensions.join(',')}}")].each do |f|
      # Dir[] already returns paths that include dir
      process_file(f)
    end
  end

  def process_file(file)
    digest = digest(file)
    name = @repository[digest]

    if name
      target = duplicate_dir
      # ...
    else
      target = main_dir
      # ...
    end

    FileUtils.cp( file, File.join( target, File.basename( file ) ) )
  end

  def digest(file)
    Digest::MD5.hexdigest( File.open(file, 'rb') {|io| io.read})
  end

  def self.load(file)
    File.open(file, 'rb') {|io| Marshal.load(io)}
  end

  def save(file)
    File.open(file, 'wb') {|io| Marshal.dump(self, io)}
  end
end

repo = begin
  Repository.load( REPO_FILE )
rescue StandardError
  # not there => create
  r = Repository.new
  r.main_dir = "foo"
  r.duplicate_dir = "bar"
  r
end

ARGV.each {|dir| repo.process_dir(dir)}

repo.save( REPO_FILE )
http://groups.google.com/group/comp.lang.ruby/browse_frm/thread/88c20ef88239c54a/fa358e55b7f86841?lnk=gst&q=repo&rnum=2#fa358e55b7f86841

Marshal is a way to save the state of your Ruby objects to a file.
Unlike YAML, it outputs in binary format. It is written in C, so it
can serialize and restore objects very quickly.

It looks like in this code, this is simply how the save and load
functions are implemented, and since 'self' is being passed, it will
just serialize the Repository object to file during save and restore
it during load.

If you need readable, but much slower serialization, you could use
YAML in place of the Marshal calls.
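
To get a feel for the two formats, here is a small sketch that dumps the same hash both ways. The hash contents are made up for illustration (a digest-like key and a file name); they are not taken from the repository code above.

```ruby
require "yaml"

# Hypothetical data, loosely modelled on a digest => file name mapping.
data = { "d41d8cd9" => "song.mp3" }

binary = Marshal.dump(data)  # a String of bytes in Marshal's binary format
text   = YAML.dump(data)     # a human-readable YAML document

p binary                     # inspect the raw bytes (printed escaped)
puts text                    # plain text you could edit by hand

# Both formats restore an object equal to the original.
p Marshal.load(binary) == data
p YAML.load(text) == data
```

The Marshal output is compact but opaque; the YAML output can be read and edited in a text editor.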

On 8/1/06, anne001 <anne@wjh.harvard.edu> wrote:

> repository is a hash keyed on some kind of check sum derived from the
> file.
> But what is the use of Marshal? something to do with memory management,

> It looks like in this code, this is simply how the save and load
> functions are implemented, and since 'self' is being passed, it will
> just serialize the Repository object to file during save and restore
> it during load.

But why do you need to save and load the objects?
I have never seen code like this before. What do you gain?
What problem does it solve?

anne001 wrote:

>> It looks like in this code, this is simply how the save and load
>> functions are implemented, and since 'self' is being passed, it will
>> just serialize the Repository object to file during save and restore
>> it during load.
>
> But why do you need to save and load the objects?
> I have never seen code like this before. What do you gain?
> What problem does it solve?

It can be used to store objects on disk for future use (e.g. web
application sessions) or to send objects between Ruby interpreters
(this only works when both sides use a compatible Marshal format version
and have the objects' classes loaded).
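
A rough sketch of both constraints, using only core Ruby. The `Session` class and its contents are hypothetical, invented for this example. Every Marshal stream starts with a major and a minor version byte, and loading an object whose class is not defined raises an ArgumentError.

```ruby
# Every Marshal stream starts with two version bytes: major, then minor.
payload = Marshal.dump([1, "two", :three])
major, minor = payload.bytes.first(2)

puts "stream version:      #{major}.#{minor}"
puts "interpreter version: #{Marshal::MAJOR_VERSION}.#{Marshal::MINOR_VERSION}"

# The receiving side rebuilds an equal object from the bytes alone,
# e.g. after reading them from a file, a socket or a pipe.
copy = Marshal.load(payload)
p copy

# Hypothetical class standing in for something like a web session.
class Session
  attr_accessor :user
end

session = Session.new
session.user = "anne"
session_bytes = Marshal.dump(session)

# Simulate an interpreter that never loaded the Session class.
Object.send(:remove_const, :Session)

error = nil
begin
  Marshal.load(session_bytes)
rescue ArgumentError => e
  error = e
end
puts "load without the class: #{error.message}"
```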

Well, let's pretend you had some wiki class in your code:

     # a mock wiki object...
     class WikiPage
       def initialize( page_name, author, contents )
         @page_name = page_name
         @revisions = Array.new

         add_revision(author, contents)
       end

       attr_reader :page_name

       def add_revision( author, contents )
         @revisions << { :created => Time.now,
                         :author => author,
                         :contents => contents }
       end

       def wiki_page_references
         [@page_name] + @revisions.last[:contents].scan(/\b(?:[A-Z]+[a-z]+){2,}/)
       end

       # ...
     end

Now, let's assume you have a Hash of these things you are using to run your wiki:

     wiki = Hash.new
     [ ["HomePage", "James", "A page about the SillyEmailExamples..."],
       ["SillyEmailExamples", "James", "Blah, blah, blah..."] ].each do |page|
       new_page = WikiPage.new(*page)
       wiki[new_page.page_name] = new_page
     end

When your script runs you will need to save these pages to disk somehow, so you don't lose the site contents between runs. You have a ton of options here, of course, including using a database or rolling some method that can write these pages out to files.

Writing them out is a pain though, because page contents can be pretty much anything, so you'll need to come up with a good file format that allows you to tell where each revision starts and stops. This probably means escaping some characters, at a minimum.

Or, you can just use Marshal/YAML. With these helpers, saving the entire wiki is reduced to the trivial:

     File.open("wiki.dump", "wb") { |file| Marshal.dump(wiki, file) }

When needed, you can load that back with:

     wiki = File.open("wiki.dump", "rb") { |file| Marshal.load(file) }

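Those files will be stored in a binary format for Ruby to read. If you would prefer a human-readable format, replace the word Marshal with YAML above (on recent Ruby versions the loading side needs YAML.unsafe_load, or YAML.safe_load with a permitted_classes list, because YAML.load rejects custom classes like WikiPage by default) and make sure your script does a:

     require "yaml"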

See how easy it is to get instant saving/loading of entire Ruby structures?
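
Put together, a save/load round trip through real files might look like the sketch below. The `Page` class is a hypothetical, much smaller stand-in for WikiPage, the file names are arbitrary, and the YAML half assumes a Ruby whose bundled YAML library provides `safe_load` with the `permitted_classes` keyword.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical stand-in for the WikiPage class above, kept tiny on purpose.
class Page
  attr_reader :name, :contents

  def initialize(name, contents)
    @name = name
    @contents = contents
  end
end

wiki = {
  "HomePage" => Page.new("HomePage", "A page about the SillyEmailExamples...")
}

# Work in a temporary directory so the example leaves nothing behind.
from_marshal, from_yaml = Dir.mktmpdir do |dir|
  marshal_path = File.join(dir, "wiki.dump")
  yaml_path    = File.join(dir, "wiki.yml")

  # Save: binary mode for Marshal, plain text for YAML.
  File.open(marshal_path, "wb") { |file| Marshal.dump(wiki, file) }
  File.open(yaml_path, "w")     { |file| YAML.dump(wiki, file) }

  # Load: Marshal needs the Page class defined; YAML needs it permitted.
  [
    File.open(marshal_path, "rb") { |file| Marshal.load(file) },
    YAML.safe_load(File.read(yaml_path), permitted_classes: [Page])
  ]
end

puts from_marshal["HomePage"].contents
puts from_yaml["HomePage"].contents
```

Either way the script gets back a Hash of page objects, which is the whole point: no hand-written file format is needed.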

James Edward Gray II

On Aug 2, 2006, at 5:35 AM, anne001 wrote:

>> It looks like in this code, this is simply how the save and load
>> functions are implemented, and since 'self' is being passed, it will
>> just serialize the Repository object to file during save and restore
>> it during load.
>
> But why do you need to save and load the objects?
> I have never seen code like this before. What do you gain?
> What problem does it solve?