Memoize to a file

Hello all,

Using Memoize gem 1.2.0, memoizing TO a file appears to be working for me
but subsequently reading that file (say, by rerunning the same script)
appears NOT to be working (the fib(n) calls are being run again).
Inspecting the Memoize module I changed the line

cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
to
cache = Hash.new.update(Marshal.load(File.read(file)))

and instead of failing silently, I now see the error message: "in `load':
marshal data too short (ArgumentError)"

My questions:
1 What is causing this error? (possibly Windows related?)
2 What is the purpose of the rescue{} suppressing the error info in the
first place?
3 Instead of using Marshal, would using YAML be a reasonable alternative?
(I am thinking of readability of the cache file and also capability to
pre-populate it)

Thanks.

-- Brian Buckley

···

------------------------------------------
require 'memoize'
include Memoize
def fib(n)
  puts "running... n is #{n}"
  return n if n < 2
  fib(n-1) + fib(n-2)
end
h = memoize(:fib,"fib.cache")
puts fib(10)

Basically it's using exceptions as flow control:

begin
  cache = Hash.new.update(Marshal.load(File.read(file)))
rescue
  cache = {} # empty hash
end

So if loading the file fails for whatever reason (e.g. this is the first time the program has been run and the file does not exist yet), it just starts with an empty cache. I don't know why it's failing to read the file.
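
One way to keep the empty-cache fallback without hiding real problems is to treat a missing file differently from an unreadable one. The following is only a minimal sketch, not the gem's code: `load_cache` is a hypothetical helper name, and the list of rescued exceptions is an assumption about what Marshal.load may raise on damaged data.

```ruby
# Hypothetical helper: returns the cached hash, or {} when no usable cache exists.
def load_cache(file)
  # Read the raw bytes in binary mode before handing them to Marshal.
  Marshal.load(File.open(file, "rb") { |f| f.read })
rescue Errno::ENOENT
  {} # first run: there is no cache file yet, so there is nothing to report
rescue ArgumentError, TypeError, EOFError => e
  # Damaged or incompatible data: still fall back, but say why.
  warn "ignoring cache #{file}: #{e.message} (#{e.class})"
  {}
end
```

With this shape, a first run stays quiet, while an error such as "marshal data too short" is printed as a warning instead of being swallowed.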

···

On Jan 31, 2006, at 10:32 PM, Brian Buckley wrote:

My questions:
1 What is causing this error? (possibly Windows related?)

IIRC File.read(file) doesn't open the file in binary mode; try
File.open(file, "rb"){|f| f.read}

2 What is the purpose of the rescue{} suppressing the error info in the
first place?

setting cache to {} if Marshal.load fails for some reason (e.g. a major
change in the Marshal format across Ruby versions).

3 Instead of using Marshal, would using YAML be a reasonable alternative?
(I am thinking of readability of the cache file and also capability to
pre-populate it)

I wouldn't do that:
* Marshal is faster than Syck (especially when dumping data)
* YAML takes more space than Marshal'ed data
* there are still more bugs in Syck than in Marshal (the nastiest memory
  issues are believed to be fixed, but there is still occasional data
  corruption)
* Marshal is more stable across Ruby releases

As for editing the cache, you can always do
File.open("cache.yaml", "w") do |out|
   YAML.dump(Marshal.load(File.open("cache", "rb"){|f| f.read}), out)
end

···

On Wed, Feb 01, 2006 at 12:32:57PM +0900, Brian Buckley wrote:

--
Mauricio Fernandez

Brian Buckley wrote:

Hello all,

Using Memoize gem 1.2.0, memoizing TO a file appears to be working for me
but subsequently reading that file (say, by rerunning the same script)
appears NOT to be working (the fib(n) calls are being run again).
Inspecting the Memoize module I changed the line

cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
to
cache = Hash.new.update(Marshal.load(File.read(file)))

and instead of failing silently, I now see the error message: "in `load':
marshal data too short (ArgumentError)"

My questions:
1 What is causing this error? (possibly Windows related?)

That is odd. I've run it on Windows with no trouble in the past. Is
it possible you ran this program using 1.8.2, downloaded 1.8.4, then
re-ran the same code using the same cache? It would fail with that
error if such is the case, since Marshal is not compatible between
versions of Ruby - not even minor versions.

2 What is the purpose of the rescue{} suppressing the error info in the
first place?

The assumption (whoops!) was that if Hash.new.update failed it was
because there was no cache (i.e. first run), so just return an empty
hash.

3 Instead of using Marshal, would using YAML be a reasonable alternative?
(I am thinking of readability of the cache file and also capability to
pre-populate it)

It will be slower, but it would work.

Regards,

Dan

Just a thought, but you might like to load this file using the binary
option on Windows. Marshal uses a binary format, and Windows does weird
things to binary files that are opened without the binary option.
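
To see why text mode can matter here, it helps to look at the bytes Marshal actually writes. The sketch below is an illustration rather than a diagnosis of the gem: the cache contents and the helper names `write_cache` and `read_cache` are assumptions. Marshal stores small integers as a single byte (the value plus 5), so fib results such as 5 and 21 end up as 0x0A (line feed) and 0x1A (Ctrl-Z), bytes that text-mode I/O on Windows can translate or treat as end of file.

```ruby
require "tmpdir"

# Assumed cache shape: argument list => result, using values fib(10) computes.
cache = { [5] => 5, [8] => 21 }
data  = Marshal.dump(cache)

# Small integers are stored as (n + 5): 5 -> 0x0a (LF), 21 -> 0x1a (Ctrl-Z).
puts data.bytes.map { |b| format("%02x", b) }.join(" ")

# Hypothetical helpers that always use binary mode, on every platform.
def write_cache(path, cache)
  File.open(path, "wb") { |f| f.write(Marshal.dump(cache)) } # "b": no translation
end

def read_cache(path)
  Marshal.load(File.open(path, "rb") { |f| f.read })         # "b": read every byte
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "fib.cache")
  write_cache(path, cache)
  p read_cache(path) == cache
end
```

On current Rubies, File.binread(path) and File.binwrite(path, data) are shorter ways to get the same binary-mode behaviour.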

> 1 What is causing this error? (possibly Windows related?)

IIRC File.read(file) doesn't open the file in binary mode; try
File.open(file, "rb"){|f| f.read}

Perfect. Changing

cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
to
cache = Hash.new.update(Marshal.load(File.open(file, "rb"){|f| f.read})) rescue { }

and it works. Should this edit go into the gem (Daniel if you're
listening)?

> 2 What is the purpose of the rescue{} suppressing the error info in the
> first place?

setting cache to {} if Marshal.load fails for some reason (e.g. a major
change in the Marshal format across Ruby versions).

Got it. The error suppression here is just about always the correct way to
handle the situation.

As for editing the cache, you can always do

File.open("cache.yaml", "w") do |out|
   YAML.dump(Marshal.load(File.open("cache", "rb"){|f| f.read}), out)
end

Ahhh. Populate that Marshal formatted file using YAML. Good thought.
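
The round trip could look like the sketch below: dump the Marshal cache to YAML for editing, then write the edited YAML back in Marshal format. The file names and the helpers `cache_to_yaml` and `yaml_to_cache` are made up for illustration, and the key format (an array of arguments) is an assumption about how the cache is laid out.

```ruby
require "yaml"
require "tmpdir"

# Marshal cache -> YAML file that can be read and edited by hand.
def cache_to_yaml(cache_path, yaml_path)
  cache = Marshal.load(File.open(cache_path, "rb") { |f| f.read })
  File.open(yaml_path, "w") { |out| YAML.dump(cache, out) }
end

# Edited YAML file -> Marshal cache, written in binary mode.
def yaml_to_cache(yaml_path, cache_path)
  cache = YAML.load(File.read(yaml_path))
  File.open(cache_path, "wb") { |out| Marshal.dump(cache, out) }
end

Dir.mktmpdir do |dir|
  cache_path = File.join(dir, "fib.cache")
  yaml_path  = File.join(dir, "fib.yaml")
  File.open(cache_path, "wb") { |f| Marshal.dump({ [10] => 55 }, f) }
  cache_to_yaml(cache_path, yaml_path)

  # Pre-populate: add an entry to the YAML data, then convert it back.
  entries = YAML.load(File.read(yaml_path))
  entries[[11]] = 89
  File.open(yaml_path, "w") { |out| YAML.dump(entries, out) }
  yaml_to_cache(yaml_path, cache_path)

  p Marshal.load(File.open(cache_path, "rb") { |f| f.read })
end
```

This keeps Marshal as the format the program reads at run time and uses YAML only as an editing step.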

why not pstore - it's done all that already and is built-in?

-a

···

On Wed, 1 Feb 2006, Mauricio Fernandez wrote:

--
happiness is not something ready-made. it comes from your own actions.
- h.h. the 14th dali lama

As you and others have pointed out, this is likely a problem caused by not
opening the file in binary mode. IMHO library code that uses Marshal should
always open files in binary mode (regardless of platform). The advantages
are twofold: we won't see these kinds of errors (i.e. it's cross-platform),
and it serves as documentation (you know from reading the code that the file
is expected to contain binary data).

Also, the line looks a bit strange to me. Creating a new hash and
updating it with a hash read from disk seems superfluous. I'd rather do
something like this:

cache = File.open(file, "rb") {|io| Marshal.load(io)} rescue {}

Marshal.load and Marshal.dump can actually read from and write to an IO
object. This seems most efficient because the file contents do not have to
be read into memory before demarshalling, and it's fail-safe in the same way
as the old implementation.

Kind regards

    robert
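
A small sketch of that IO-based approach, with the dump side added for completeness. `dump_to_io` and `load_from_io` are hypothetical names, not part of the Memoize gem, and the rescue modifier is kept only to mirror the one-liner above.

```ruby
require "tmpdir"

# Write the cache straight to the file handle, in binary mode.
def dump_to_io(file, cache)
  File.open(file, "wb") { |io| Marshal.dump(cache, io) }
end

# Read the cache straight from the file handle; any failure yields {}.
def load_from_io(file)
  cache = File.open(file, "rb") { |io| Marshal.load(io) } rescue {}
  cache
end

Dir.mktmpdir do |dir|
  file = File.join(dir, "fib.cache")
  p load_from_io(file)            # no file yet, so the fallback is used
  dump_to_io(file, { [10] => 55 })
  p load_from_io(file)
end
```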

That is odd. I've run it on Windows with no trouble in the past. Is
it possible you ran this program using 1.8.2, downloaded 1.8.4, then
re-ran the same code using the same cache? It would fail with that
error if such is the case, since Marshal is not compatible between
versions of Ruby - not even minor versions.

I have been on 1.8.2 on Windows straight through. Mauricio's suggestion of
File.open instead of File.read made it work for me (see other posts).

Brian

> and instead of failing silently, I now see the error message: "in `load':
> marshal data too short (ArgumentError)"
>
> My questions:
> 1 What is causing this error? (possibly Windows related?)

That is odd. I've run it on Windows with no trouble in the past.

(FTR: file not opened in binary mode, [177651])

Is it possible you ran this program using 1.8.2, downloaded 1.8.4, then
re-ran the same code using the same cache? It would fail with that
error if such is the case, since Marshal is not compatible between
versions of Ruby - not even minor versions.

The Marshal format hasn't changed for a while:

batsman@tux-chan:~/Anime$ ruby182 -v -e 'p [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]'
ruby 1.8.2 (2004-12-25) [i686-linux]
[4, 8]
batsman@tux-chan:~/Anime$ ruby -v -e 'p [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]'
ruby 1.8.4 (2005-12-24) [i686-linux]
[4, 8]

Also note that ruby can read Marshal data in older formats if the
MAJOR_VERSION hasn't changed (i.e. if only the MINOR_VERSION was increased):

    if (major != MARSHAL_MAJOR || minor > MARSHAL_MINOR) {
        rb_raise(rb_eTypeError, "incompatible marshal file format (can't be read)\n\
\tformat version %d.%d required; %d.%d given",
                 MARSHAL_MAJOR, MARSHAL_MINOR, major, minor);
    }
    if (RTEST(ruby_verbose) && minor != MARSHAL_MINOR) {
        rb_warn("incompatible marshal file format (can be read)\n\
\tformat version %d.%d required; %d.%d given",
                MARSHAL_MAJOR, MARSHAL_MINOR, major, minor);
    }

(after some searching...)

Back in Apr. 2001, matz said that "Marshal should not change too much
(unless in upper compatible way)" [14063]. The last minor change
happened after 1.6.8 (6 -> 8), and MARSHAL_MAJOR was already 4 in v1_0_1,
7 years, 2 months ago (at which point I got tired of CVSweb).

Marshal's format is more stable than we think.
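
The same check can be reproduced from Ruby, because every Marshal dump begins with two bytes holding the major and minor format version. This is a sketch that mirrors the C condition quoted above; `marshal_version` and `readable_marshal?` are hypothetical helper names.

```ruby
# Returns [major, minor] read from the first two bytes of marshalled data.
def marshal_version(data)
  data.unpack("CC")
end

# Mirrors the C check: same major version, and a minor version no newer than ours.
def readable_marshal?(data)
  major, minor = marshal_version(data)
  major == Marshal::MAJOR_VERSION && minor <= Marshal::MINOR_VERSION
end

data = Marshal.dump({ [10] => 55 })
p marshal_version(data)
p readable_marshal?(data)

# Pretend an older minor version wrote the data: still considered readable.
older = data.dup
older.setbyte(1, Marshal::MINOR_VERSION - 1)
p readable_marshal?(older)

# Pretend a different major version wrote the data: rejected.
other_major = data.dup
other_major.setbyte(0, Marshal::MAJOR_VERSION + 1)
p readable_marshal?(other_major)
```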

···

On Thu, Feb 02, 2006 at 06:49:49AM +0900, Daniel Berger wrote:

--
Mauricio Fernandez

PStore is just a wrapper on top of Marshal for transactional file storage. If you need transactions, it's great. Otherwise, you might as well just use Marshal.

James Edward Gray II

···

On Feb 1, 2006, at 9:31 AM, ara.t.howard@noaa.gov wrote:

why not pstore - it's done all that already and is built-in?

it's not quite only that. it also

   - does some simple checks when creating the file (readability, etc)
   - allows db usage to be multi-processed
   - supports deletion
   - rolls back writes on exceptions / commits using ensure to avoid corrupt
     data file
   - handles read vs write actions using shared/excl locks to boost concurrency
   - uses md5 check to avoid un-needed writes
   - opens in correct modes for all platforms

with no offense meant towards memoize authors - at least a few of the bugs
posted regarding that package would have been addressed by using a built-in
lib rather than rolling one's own. and, of course, that's the big thing - why
not use something already written and tested from the core instead of
re-inventing the wheel?

in any case, i think the pstore lib, simple as it is, is a very underrated
library since it provides simple transactional and concurrent persistence to
ruby apps in such an incredibly simple way. now if we could just get joel's
fsdb in the core! ;-)

kind regards.

-a
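
A few of the points in that list (commit, rollback on abort or exception, deletion, read-only transactions) can be tried with a short script. This is only a sketch; `pstore_demo` and the key `:fib_10` are names invented for the example.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical demo: returns the stored value after two rolled-back writes,
# and the value seen after the key has been deleted.
def pstore_demo(path)
  store = PStore.new(path)

  store.transaction { store[:fib_10] = 55 }   # committed when the block ends

  store.transaction do
    store[:fib_10] = -1
    store.abort                               # explicit rollback
  end

  begin
    store.transaction do
      store[:fib_10] = -2
      raise "boom"                            # an exception also prevents the write
    end
  rescue RuntimeError
    # the failed transaction left the file untouched
  end

  after_rollbacks = store.transaction(true) { store[:fib_10] } # read-only access

  store.transaction { store.delete(:fib_10) }
  after_delete = store.transaction(true) { store[:fib_10] }

  [after_rollbacks, after_delete]
end

Dir.mktmpdir do |dir|
  p pstore_demo(File.join(dir, "demo.pstore"))
end
```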

···

On Thu, 2 Feb 2006, James Edward Gray II wrote:

PStore is just a wrapper on top of Marshal for transactional file storage. If you need transactions, it's great. Otherwise, you might as well just use Marshal.

--
happiness is not something ready-made. it comes from your own actions.
- h.h. the 14th dali lama

These are all great points. Thanks for the lesson. ;-)

James Edward Gray II

···

On Feb 1, 2006, at 9:56 AM, ara.t.howard@noaa.gov wrote:

it's not quite only that. it also

  - does some simple checks when creating the file (readability, etc)
  - allows db usage to be multi-processed
  - supports deletion
  - rolls back writes on exceptions / commits using ensure to avoid corrupt
    data file
  - handles read vs write actions using shared/excl locks to boost concurrency
  - uses md5 check to avoid un-needed writes
  - opens in correct modes for all platforms

I've made a file caching example using PStore for my toy Memoizable library. I just thought I would post it here, in case it helps/inspires others.

Attachment: memoizable.rb (1.34 KB)

#!/usr/local/bin/ruby -w

# pstore_caching.rb
#
# Created by James Edward Gray II on 2006-02-03.
# Copyright 2006 Gray Productions. All rights reserved.

require "memoizable"
require "pstore"

#
# A trivial implementation of a custom cache. This cache uses PStore to provide
# a multi-processing safe disk cache. The downside is that the entire cache
# must be loaded for a key check. This can require significant memory for a
# large cache.
#
class PStoreCache
   def initialize( path )
     @cache = PStore.new(path)
   end

   def []( key )
     @cache.transaction(true) { @cache[key] }
   end

   def []=( key, value )
     @cache.transaction { @cache[key] = value }
   end
end

class Fibonacci
   extend Memoizable

   def fib( num )
     return num if num < 2
     fib(num - 1) + fib(num - 2)
   end
   memoize :fib, PStoreCache.new("fib_cache.pstore")
end

puts "This method is memoized using a file-based cache..."
start = Time.now
puts "fib(100): #{Fibonacci.new.fib(100)}"
puts "Run time: #{Time.now - start} seconds"

puts
puts "Run again to see the file cache at work."

__END__

James Edward Gray II
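
Since memoizable.rb itself is not reproduced in this thread, here is a self-contained sketch of the same idea for anyone who wants to run it. `SimpleMemoize` is a hypothetical stand-in and makes no claim about how Memoizable is actually implemented; `fibonacci_class` and the `computed` counter exist only so the effect of the file cache can be observed.

```ruby
require "pstore"
require "tmpdir"

# Same shape as the PStoreCache above: a disk cache with [] and []=.
class PStoreCache
  def initialize(path)
    @cache = PStore.new(path)
  end

  def [](key)
    @cache.transaction(true) { @cache[key] }
  end

  def []=(key, value)
    @cache.transaction { @cache[key] = value }
  end
end

# Hypothetical stand-in for Memoizable: wraps a method with a cache lookup.
module SimpleMemoize
  def memoize(name, cache)
    original = instance_method(name)
    define_method(name) do |*args|
      hit = cache[args]
      next hit unless hit.nil?
      cache[args] = original.bind(self).call(*args)
    end
  end
end

# Builds a fresh Fibonacci class bound to the cache file at +path+.
def fibonacci_class(path)
  Class.new do
    extend SimpleMemoize
    attr_reader :computed

    def initialize
      @computed = 0 # counts how often the real method body runs
    end

    def fib(num)
      @computed += 1
      return num if num < 2
      fib(num - 1) + fib(num - 2)
    end
    memoize :fib, PStoreCache.new(path)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "fib_cache.pstore")
  first = fibonacci_class(path).new
  puts "fib(30): #{first.fib(30)} (computed #{first.computed} values)"
  second = fibonacci_class(path).new # stands in for running the script again
  puts "fib(30): #{second.fib(30)} (computed #{second.computed} values)"
end
```

Every lookup opens a transaction on the file, so this trades speed for persistence and multi-process safety, as noted in the comment on PStoreCache above.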