Memoize and GC

I have a quick question. I just re-read:

  http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

  def foo
    @foo ||= something
  end

will mean that @foo will be GC'd when the object is GC'd, but if I do

  FOO = []

  def foo
    FOO[object_id] ||= something
  end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

Thanks.
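The concern about example 2 can be demonstrated with a short sketch (the class name and memoized value are made up for illustration): nothing ever removes the entry from the top-level container, so it outlives the object that created it.

```ruby
# Sketch of the leak in example 2: FOO holds the memoized entry keyed by
# object_id, and nothing ever deletes it, so it survives the object.
FOO = {}

class Widget  # hypothetical class for illustration
  def foo
    FOO[object_id] ||= "expensive result"
  end
end

w = Widget.new
w.foo                 # memoizes into FOO
id = w.object_id
w = nil               # drop the only reference to the Widget
GC.start              # the Widget itself can now be collected...
FOO.key?(id)          # ...but its cache entry is still present
```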

One way could be using a closure that redefines the method:

irb(main):019:0> class A
irb(main):020:1> def a
irb(main):021:2> puts "expensive calculation of a..."
irb(main):022:2> a = "complex value"
irb(main):023:2> self.class.send(:define_method, :a) do
irb(main):024:3* a
irb(main):025:3> end
irb(main):026:2> a
irb(main):027:2> end
irb(main):028:1> end
=> nil
irb(main):029:0> obj = A.new
=> #<A:0xb7c9ef20>
irb(main):030:0> obj.a
expensive calculation of a...
=> "complex value"
irb(main):031:0> obj.a
=> "complex value"

Jesus.

···

On Tue, Apr 13, 2010 at 6:26 PM, Intransition <transfire@gmail.com> wrote:

I have a quick question. I just re-read:

http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

def foo
  @foo ||= something
end

will mean that @foo will be GC'd when the object is GC'd, but if I do

FOO = []

def foo
  FOO[object_id] ||= something
end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

Thanks.

I would live with polluting the instance var space. Btw, I usually do not use memoize but rather an explicit Hash with default_proc. That way I have control over where I place the Hash and hence how long it lives.
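That approach might look something like this sketch (class and method names are invented for illustration): the Hash's default_proc computes and stores a value on first access, and because the Hash lives in an instance variable it is garbage collected together with its owner.

```ruby
# Memoizing with an explicit Hash and a default_proc: the block runs once
# per missing key, stores the result, and later lookups hit the cache.
class Squares  # hypothetical example class
  def initialize
    @cache = Hash.new { |h, n| h[n] = slow_square(n) }
  end

  def square(n)
    @cache[n]  # first call per n computes; later calls are cached
  end

  private

  def slow_square(n)
    n * n  # stand-in for an expensive computation
  end
end
```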

Of course you can hack something together using #object_id and an external Hash, but then you also need to create a finalizer and soon things get overly complicated. If you have issues with polluting the instance variable namespace you can a) put all memoize containers into a single instance variable (typically a Hash) and/or b) generate instance variable names which are likely to be conflict-free.

Kind regards

  robert

···

On 04/13/2010 06:26 PM, Intransition wrote:

I have a quick question. I just re-read:

  http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

  def foo
    @foo ||= something
  end

will mean that @foo will be GC'd when the object is GC'd, but if I do

  FOO = []

  def foo
    FOO[object_id] ||= something
  end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

I have to assume that you meant FOO to be a hash... tho an array will
work here... sorta.

But rather than either an array or hash, what if you could write:

  FOO=Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

···

On 4/13/10, Intransition <transfire@gmail.com> wrote:

will mean that @foo will be GC'd when the object is GC'd, but if I do

  FOO = []

  def foo
    FOO[object_id] ||= something
  end

Ah, very good thought! An LRUCache would do pretty well here.

Well then, that leads me to another project. My implementation of
LRUCache (lrucache gem) kind of sucks in that it only handles max
size, so I am looking for a new one to replace it. Any
recommendations?
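For reference, a minimal LRU sketch (hypothetical code, not the lrucache gem) that exploits the insertion order of Ruby 1.9+ hashes:

```ruby
# Minimal LRU cache: Ruby hashes preserve insertion order, so re-inserting
# a key on access moves it to the back, and Hash#shift always evicts the
# least recently used entry from the front.
class TinyLRU  # hypothetical name
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  def [](key)
    return unless @store.key?(key)
    @store[key] = @store.delete(key)  # move to back (most recently used)
  end

  def []=(key, value)
    @store.delete(key)                       # refresh position if present
    @store[key] = value
    @store.shift if @store.size > @max_size  # evict LRU from the front
  end
end
```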

···

On Apr 13, 1:28 pm, Caleb Clausen <vikk...@gmail.com> wrote:

On 4/13/10, Intransition <transf...@gmail.com> wrote:

> will mean that @foo will be GC'd when the object is GC'd, but if I do

> FOO = []

> def foo
> FOO[object_id] ||= something
> end

I have to assume that you meant FOO to be a hash... tho an array will
work here... sorta.

But rather than either an array or hash, what if you could write:

FOO=Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

I like the closure idea, but I guess the memoizing got lost a little bit ;). Thus I would write

def a
  _a = nil
  self.class.module_eval do
    # note: once :a is redefined here, every instance shares this _a closure
    define_method(:a) { _a ||= "expensive calculation" }
    define_method(:_invalidate_a) { _a = nil }
  end
  a
end
Cheers
R.

···

2010/4/13 Jesús Gabriel y Galán <jgabrielygalan@gmail.com>:

On Tue, Apr 13, 2010 at 6:26 PM, Intransition <transfire@gmail.com> wrote:

I have a quick question. I just re-read:

http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

def foo
  @foo ||= something
end

will mean that @foo will be GC'd when the object is GC'd, but if I do

FOO = []

def foo
  FOO[object_id] ||= something
end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

Thanks.

One way could be using a closure that redefines the method:

irb(main):019:0> class A
irb(main):020:1> def a
irb(main):021:2> puts "expensive calculation of a..."
irb(main):022:2> a = "complex value"
irb(main):023:2> self.class.send(:define_method, :a) do

--
“Learning without thought is labor lost; thought without learning is perilous.”
--- Confucius

Undocumented and without a test framework, I offer you a class I wrote
a while ago that's still live and in use a couple of years later. It's
not exactly what you're asking for, but close. (It returns string-based
keys for accessing the objects because my usage has it running in a
separate process from the main app.) I reasonably hate the decision I
made to return values versus arrays based on number of arguments, and
it's not as DRY as it could be, but it's working :)

# Example usage
# class SituationServer < Phrogz::Librarian
# ...
# end

···

On Apr 13, 11:28 am, Caleb Clausen <vikk...@gmail.com> wrote:

But rather than either an array or hash, what if you could write:

FOO=Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

#
# CLEANING_SCHEDULE = 60*60 # Clean up once an hour
# MAX_STALENESS = 60*60     # Remove items not used in the last hour
#
# @server = SituationServer.new
# Thread.new{
#   loop {
#     sleep CLEANING_SCHEDULE
#     removed = @server.remove_stale( MAX_STALENESS )
#     puts "Just removed #{removed} stale item#{:s unless removed==1}..."
#   }
# }

module Phrogz; end
class Phrogz::Librarian
  def initialize( key_size=4 )
    @key_size = key_size
    @max_keys = 16 ** key_size
    @serializer = "%0#{key_size}x"
    @library = {}
    @stored = {}
    @access = {}
  end

  def store( *objects )
    keys = objects.map{ |object|
      key,obj = @library.find{ |k,o| o==object }
      unless key
        # FIXME: bail if @library.size >= @max_keys, or else this will infinite loop
        # TODO: come up with a better strategy for finding a free key
        begin
          key = @serializer % rand( @max_keys )
        end while @library.has_key?( key )
        @library[ key ] = object
        @stored[ key ] = Time.now
      end
      key
    }
    keys.length == 1 ? keys.first : keys
  end
  alias_method :<<, :store

  def fetch( *keys )
    now = Time.now # So that multiple accesses are exactly synchronized
    objects = keys.map{ |key|
      if @library.has_key?( key )
        @access[ key ] = now
        @library[ key ]
      end
    }
    objects.length == 1 ? objects.first : objects
  end

  def size
    @library.size
  end

  def storage_time( *keys )
    results = @stored.values_at( *keys )
    results.length == 1 ? results.first : results
  end

  def age( *keys )
    results = @stored.values_at( *keys ).map{ |t| Time.now - t }
    results.length == 1 ? results.first : results
  end

  def last_access( *keys )
    results = @access.values_at( *keys )
    results.length == 1 ? results.first : results
  end

  def staleness( *keys )
    results = @access.values_at( *keys ).map{ |t| Time.now - t }
    results.length == 1 ? results.first : results
  end

  def discard( *keys )
    results = keys.map{ |key|
      @stored.delete(key)
      @access.delete(key)
      @library.delete(key)
    }
    results.length == 1 ? results.first : results
  end

  def remove_stale( max_staleness )
    now = Time.now
    stale_keys = @access.select{ |key,time| (now-time) > max_staleness }.keys
    stale_keys.each{ |key|
      @stored.delete( key )
      @access.delete( key )
      object = @library.delete( key )
      yield object if block_given?
    }
    stale_keys.length
  end

  def remove_old( max_age )
    now = Time.now
    stale_keys = @stored.select{ |key,time| (now-time) > max_age }.keys
    stale_keys.each{ |key|
      @stored.delete( key )
      @access.delete( key )
      object = @library.delete( key )
      yield object if block_given?
    }
    stale_keys.length
  end

end

What else do you want to handle? My implementation [1] also only
considers size, but I have also seen implementations that consider
insertion or access time. Personally I dislike those solutions
because they might throw away stuff too early and make the
implementation more complex. The main point (resource control) is
already achieved by limiting the size of an LRU map.

Kind regards

robert

···

2010/4/13 Intransition <transfire@gmail.com>:

On Apr 13, 1:28 pm, Caleb Clausen <vikk...@gmail.com> wrote:

On 4/13/10, Intransition <transf...@gmail.com> wrote:

> will mean that @foo will be GC'd when the object is GC'd, but if I do

> FOO = []

> def foo
> FOO[object_id] ||= something
> end

I have to assume that you meant FOO to be a hash... tho an array will
work here... sorta.

But rather than either an array or hash, what if you could write:

FOO=Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

Ah, very good thought! An LRUCache would do pretty well here.

Well then, that leads me to another project. My implementation of
LRUCache (lrucache gem) kind of sucks in that it only handles max
size, so I am looking for a new one to replace it. Any
recommendations?

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Last time I went looking for a cache class, I found several written in
ruby, but I really needed something that would be super-fast, so it
had to be written in c. I did write about half of a ruby hash class in
c before giving up.... For most use cases, tho, presumably including
this one, a ruby implementation would be fine. Having a customizable
age-out policy is the only other nice-to-have I can think of.

···

On 4/13/10, Intransition <transfire@gmail.com> wrote:

On Apr 13, 1:28 pm, Caleb Clausen <vikk...@gmail.com> wrote:

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

Ah, very good thought! An LRUCache would do pretty well here.

Well then, that leads to me to another project. My implementation of
LRUCache (lrucache gem) kind of sucks in that it only handles max
size, so I am looking for a new one to replace it. Any
recommendations?

Fair enough, I just thought it would be nice to have the *option* of a
time-based cache.

···

On Apr 14, 7:02 am, Robert Klemme <shortcut...@googlemail.com> wrote:

What else do you want to handle? My implementation [1] also only
considers size, but I have also seen implementations that consider
insertion or access time. Personally I dislike those solutions
because they might throw away stuff too early and make the
implementation more complex. The main point (resource control) is
already achieved by limiting the size of an LRU map.

Definitely! But this would be a different beast for me. One might
be able to implement this by using an LRUHash or by inheriting from it.
But I would not combine that in a single class. Another option could
be to have a modular approach where the container gets a
DeletionPolicy that is responsible for deciding when or what to
delete. The interface could look like

interface DeletionPolicy
  def insert(key,val)
  def remove(key, val)
  def access(key, val)
  def next_delete # yields key
end

Just a quick hack, though.
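A concrete Ruby sketch of that modular idea (all names invented; a FIFO size policy is just one possible implementation of the interface):

```ruby
# Hypothetical sketch of the DeletionPolicy idea: the container delegates
# eviction decisions to a pluggable policy object.
class SizePolicy
  def initialize(max_size)
    @max_size = max_size
    @order = []  # insertion order; first entry is evicted first (FIFO)
  end

  def insert(key, _val)
    @order << key
  end

  def remove(key, _val)
    @order.delete(key)
  end

  def access(key, _val); end  # a FIFO policy ignores accesses

  # Returns the key to evict next, or nil if the cache is within bounds
  def next_delete
    @order.first if @order.size > @max_size
  end
end

class PolicyCache
  def initialize(policy)
    @policy = policy
    @store = {}
  end

  def []=(key, value)
    @store[key] = value
    @policy.insert(key, value)
    while (victim = @policy.next_delete)  # evict until the policy is satisfied
      @policy.remove(victim, @store[victim])
      @store.delete(victim)
    end
  end

  def [](key)
    @policy.access(key, @store[key]) if @store.key?(key)
    @store[key]
  end

  def size
    @store.size
  end
end
```

A time-based policy would implement the same four methods but track timestamps instead of insertion order, leaving the container untouched.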

Kind regards

robert

···

2010/4/14 Intransition <transfire@gmail.com>:

On Apr 14, 7:02 am, Robert Klemme <shortcut...@googlemail.com> wrote:

What else do you want to handle? My implementation [1] also only
considers size, but I have also seen implementations that consider
insertion or access time. Personally I dislike those solutions
because they might throw away stuff too early and make the
implementation more complex. The main point (resource control) is
already achieved by limiting the size of an LRU map.

Fair enough, I just thought it would be nice to have the *option* of a
time-based cache.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

I kind of like that idea.

···

On Apr 15, 3:16 am, Robert Klemme <shortcut...@googlemail.com> wrote:

2010/4/14 Intransition <transf...@gmail.com>:

> On Apr 14, 7:02 am, Robert Klemme <shortcut...@googlemail.com> wrote:

>> What else do you want to handle? My implementation [1] also only
>> considers size, but I have also seen implementations that consider
>> insertion or access time. Personally I dislike those solutions
>> because they might throw away stuff too early and make the
>> implementation more complex. The main point (resource control) is
>> already achieved by limiting the size of an LRU map.

> Fair enough, I just thought it would be nice to have the *option* of a
> time-based cache.

Definitely! But this would be a different beast for me. One might
be able to implement this by using an LRUHash or by inheriting from it.
But I would not combine that in a single class. Another option could
be to have a modular approach where the container gets a
DeletionPolicy that is responsible for deciding when or what to
delete. The interface could look like

interface DeletionPolicy
  def insert(key,val)
  def remove(key, val)
  def access(key, val)
  def next_delete # yields key
end

Just a quick hack, though.