WeakRef and caches

Hi all,
I’m trying to build a caching mechanism into a library I’m writing. I
want to do it using weakrefs. It will look something like this:

require ‘weakref’

class Cache
class << self
@cache_hash = {}
def insert(obj, id)
@cache_hash[id] = WeakRef.new(obj)
end

def retrieve(id)
  o = @cache_hash[id]
  # If this object hasn't been GC'd yet
  if o.weakref_alive?
    # Return a strong ref to it.
    o.__getobj__
  else
    # Regenerate and return object
    # o = MyClass.fetch(id)
    insert(o, id)
    o
  end
end

end
end

Unfortunately, this doesn’t work. The idea is that the cache always
returns strong-refs, so that the user doesn’t have to worry about them
getting garbage-collected - to the user, they’re just ordinary objects.
However, once they go outside the user’s scope, the only reference to
them left is the weakref in the cache, and they can be garbage
collected. Then, if the user requests it again, it can be returned if
it’s not GC’d yet, or regenerated and returned if it has.

It seems, though, that the WeakRef implementation is such that if you
refer to an object other than via its weakref, the weakref-ness is
destroyed and the object never gets GC’d. Can anyone explain why and/or
offer a way around this?

Tim Bates

···


tim@bates.id.au

Hi all,
I’m trying to build a caching mechanism into a library I’m writing. I
want to do it using weakrefs. It will look something like this:

Funny, I was just working on this. :wink:

> It seems, though, that the WeakRef implementation is such that if you > refer to an object other than via its weakref, the weakref-ness is > destroyed and the object never gets GC'd. Can anyone explain why and/or > offer a way around this?

Here’s what I did. The idea was that the system would be addressing an
indeterminate number of persistant objects, in a transparent (to the
user/developer) manner. Thus, the problem was twofold:

  • We need one unique reference for each object, so reloading objects
    from the object store doesn’t create copies and we have sync
    problems

  • Reading from the disk all the time is slow

Keeping everything in memory is also out, since the system needs to
scale.

This gives you an idea what the cache was for. To accomplish this, I
keep two caches: a “strong” cache and a “weak” cache. The "strong"
holds objects until it’s determined that they’ve expired—the strong
cache limit is reached, and they’re written to disk and removed from the
strong cache.

When an object is referred to, first I check the strong cache, then I
check the weak cache. If it’s found in the strong cache, I return it
normally. If it’s in the weak cache, I put it in the strong cache, and
return it. (This solves the multiple-references problem.) If it’s not
found in either, load it from disk and put it in both.

The garbage collector does its stuff to keep things within limits.

Unfortunately this isn’t perfect—I’d like to expire stuff from the
cache based on size, so I could set a memory limit, but Ruby has no way
for querying object size (*). Also, there’s some further evil magic
required to make sure things get written to the disk as appropriate.
(For instance, if an object which has a reference changes the object
after it’s expired from the strong cache, it needs written again.) Not
having destructors makes this slightly painful, and finalizers don’t
work here.

Depending on what you’re doing, this scheme ought to work—I’m not sure
how you’re adding and removing objects, but it’s probably similar if
you’re wanting to cache them—just make sure you’ve got all the cases
covered. :wink:

(If you need a fully-functional transparently persistant object system
with this sort of thing already implemented, lemme know, I can give you
the code.)

hth,

···

On Sat, 9 Aug 2003 10:22:13 +0900 Tim Bates tim@bates.id.au wrote:


Ryan Pavlik rpav@users.sf.net

“The fact that you believe in something called a
’Babedar’ alone leads me to conclude that it’s
always wrong.” - 8BT

I'm trying to build a caching mechanism into a library I'm writing. I
want to do it using weakrefs. It will look something like this:

Perhaps best to don't use weakref. Something like (*this is just an
example*)

   class WeakPool
      def initialize
         @obj = Hash.new
      end

      def final(a)
         lambda { @obj.delete(a) }
      end

      def []=(a, b)
         ObjectSpace.define_finalizer b, final(a)
         @obj[a] = b.id
      end

      def [](a)
         b = @obj[a]
         b = ObjectSpace._id2ref(b) unless b.nil?
         b
      end
   end

Guy Decoux

“Tim Bates” tim@bates.id.au schrieb im Newsbeitrag
news:20030809012045.GA14728@bates.id.au

Unfortunately, this doesn’t work. The idea is that the cache always
returns strong-refs, so that the user doesn’t have to worry about them
getting garbage-collected - to the user, they’re just ordinary objects.
However, once they go outside the user’s scope, the only reference to
them left is the weakref in the cache, and they can be garbage
collected. Then, if the user requests it again, it can be returned if
it’s not GC’d yet, or regenerated and returned if it has.

It seems, though, that the WeakRef implementation is such that if you
refer to an object other than via its weakref, the weakref-ness is
destroyed and the object never gets GC’d. Can anyone explain why and/or
offer a way around this?

That’s the expected behavior: the weakest element in the strongest path
determines the availability of an instance. So an instance won’t be GC’ed
as long as there is at least one strong ref to it. This has to be so, in
order to not change the semantics of strong refs. If they could be cleared
out of nothing many programs would not work as expected any more.

Your cache implementation is exactly as it should be, but the code outside
should only hold on to an instance as long as it is needed, typically a
method invocation’s duration. The only thing the outside world should store
is the id, which is needed to access the instance via the cache. A typical
usage pattern looks like this

def doit(arg)
obj = @cache[@id]
obj.mathod1
obj.mathod2
obj.mathod3(arg)

obj cleared here

end

Regards

robert

PS: There’s a nice article about Java reference types which are quite
similar:
http://developer.java.sun.com/developer/technicalArticles/ALT/RefObj/

I’m mostly done writing my own, except mine is db-based rather than
disk-based - although the eventual plan is to make it support multiple
persistence mechanisms. It has a cache already (to make sure you don’t
get multiple instances of the same database row) and I simply wanted to
make it automatically expire once the user has finished with the
strong-ref (ie it goes out of scope). But WeakRef doesn’t like you
using strong-refs, even if they subsequently go out of scope it doesn’t
let the object be GC’d.

Tim Bates

···

On Sat, Aug 09, 2003 at 10:42:55AM +0900, Ryan Pavlik wrote:

(If you need a fully-functional transparently persistant object system
with this sort of thing already implemented, lemme know, I can give you
the code.)


tim@bates.id.au

Yes, except that it doesn’t work:

[tim@zaphod:3 ~/ruby]$ cat weakref_test.rb
require ‘weakref’

o = Object.new
q = o
o = WeakRef.new(o)
ObjectSpace.garbage_collect
p o
p q
q = nil # o is now the only reference to the object.
ObjectSpace.garbage_collect
puts o
[tim@zaphod:3 ~/ruby]$ ruby weakref_test.rb
#Object:0x40028ba0
#Object:0x40028ba0
#Object:0x40028ba0
[tim@zaphod:3 ~/ruby]$

The final “puts o” should give a RefError, and indeed it does if I
remove the line ‘p q’:

[tim@zaphod:3 ~/ruby 1]$ cat weakref_test.rb
require ‘weakref’

o = Object.new
q = o
o = WeakRef.new(o)
ObjectSpace.garbage_collect
p o
#p q
q = nil # o is now the only reference to the object.
ObjectSpace.garbage_collect
puts o
[tim@zaphod:3 ~/ruby]$ ruby weakref_test.rb
#Object:0x40028c04
weakref_test.rb:11:in `puts’: Illegal Reference - probably recycled (WeakRef::RefError)
from weakref_test.rb:11
[tim@zaphod:3 ~/ruby 1]$

But it seems that doing anything with a strong reference to an object
causes the weakref to stop functioning as expected. Why is this the
case?

Tim Bates

···

On Mon, Aug 11, 2003 at 09:56:44PM +0900, Robert Klemme wrote:

That’s the expected behavior: the weakest element in the strongest path
determines the availability of an instance. So an instance won’t be GC’ed
as long as there is at least one strong ref to it. This has to be so, in
order to not change the semantics of strong refs. If they could be cleared
out of nothing many programs would not work as expected any more.

Your cache implementation is exactly as it should be, but the code outside
should only hold on to an instance as long as it is needed, typically a
method invocation’s duration. The only thing the outside world should store
is the id, which is needed to access the instance via the cache. A typical
usage pattern looks like this


tim@bates.id.au

(If you need a fully-functional transparently persistant object system
with this sort of thing already implemented, lemme know, I can give you
the code.)

I’m mostly done writing my own, except mine is db-based rather than
disk-based - although the eventual plan is to make it support multiple
persistence mechanisms.

Actually I use DBs at the moment myself. (Basically, you just need to
write a Storage driver, but I only have MySQL and SQLite drivers at the
moment; more planned.)

                    It has a cache already (to make sure you don't

get multiple instances of the same database row) and I simply wanted to
make it automatically expire once the user has finished with the
strong-ref (ie it goes out of scope). But WeakRef doesn’t like you
using strong-refs, even if they subsequently go out of scope it doesn’t
let the object be GC’d.

Well, ask yourself exactly what you want the cache to be used for. In
my case, it’s speeding up access to commonly-used objects; so even if
one client gets done with a set of objects, another might use them.
Having them in memory makes things a lot faster. Plus, it’s easier to
do. :wink:

···

On Sat, 9 Aug 2003 11:15:16 +0900 Tim Bates tim@bates.id.au wrote:

On Sat, Aug 09, 2003 at 10:42:55AM +0900, Ryan Pavlik wrote:


Ryan Pavlik rpav@users.sf.net

“The fact that you believe in something called a
’Babedar’ alone leads me to conclude that it’s
always wrong.” - 8BT

Tim Bates wrote:

Your cache implementation is exactly as it should be, but the code outside
should only hold on to an instance as long as it is needed, typically a
method invocation’s duration. The only thing the outside world should store
is the id, which is needed to access the instance via the cache. A typical
usage pattern looks like this

Yes, except that it doesn’t work:

[tim@zaphod:3 ~/ruby]$ cat weakref_test.rb
require ‘weakref’

o = Object.new
q = o
o = WeakRef.new(o)
ObjectSpace.garbage_collect
p o
p q
q = nil # o is now the only reference to the object.
ObjectSpace.garbage_collect
puts o

Playing around I noticed this:

require 'weakref’
o = WeakRef.new(Object.new)
puts o.getobj # A
puts o.getobj.to_s # B
ObjectSpace.garbage_collect
puts o

Comment out line A and B in turn. With only B, o gets GC’ed, but not
with line A.

A wild guess: Parameters to method calls are pushed onto the stack and
(naturally for efficiency) not cleaned away after the method call has
ended. Thus the current call frame contains a reference to the
"unreferenced" object, and you cannot be sure that it will disappear
until after the call frame has been “popped” from the call stack.

Does that make any sense to someone with knowledge of the Ruby innards?

···

On Mon, Aug 11, 2003 at 09:56:44PM +0900, Robert Klemme wrote:


([ Kent Dahl ]/)_ ~ [ http://www.pvv.org/~kentda/ ]/~
))_student_/(( _d L b_/ (pre-) Master of Science in Technology )
( __õ|õ// ) )Industrial economics and technological management(
_
/ö____/ (_engineering.discipline=Computer::Technology)

Wow. Are you planning to have this opened, under some OSS license like
LGPL?

···

On Sat, Aug 09, 2003 at 11:55:17AM +0900, Ryan Pavlik wrote:

On Sat, 9 Aug 2003 11:15:16 +0900 > Tim Bates tim@bates.id.au wrote:

On Sat, Aug 09, 2003 at 10:42:55AM +0900, Ryan Pavlik wrote:

(If you need a fully-functional transparently persistant object system
with this sort of thing already implemented, lemme know, I can give you
the code.)

I’m mostly done writing my own, except mine is db-based rather than
disk-based - although the eventual plan is to make it support multiple
persistence mechanisms.

Actually I use DBs at the moment myself. (Basically, you just need to
write a Storage driver, but I only have MySQL and SQLite drivers at the
moment; more planned.)

/ Alexander Bokovoy
Never promise more than you can perform.
– Publilius Syrus

“Kent Dahl” kentda+news@stud.ntnu.no schrieb im Newsbeitrag
news:bh8eed$r82$1@tyfon.itea.ntnu.no

Tim Bates wrote:

Your cache implementation is exactly as it should be, but the code
outside

should only hold on to an instance as long as it is needed, typically a
method invocation’s duration. The only thing the outside world should
store

is the id, which is needed to access the instance via the cache. A
typical

usage pattern looks like this

Yes, except that it doesn’t work:

[tim@zaphod:3 ~/ruby]$ cat weakref_test.rb
require ‘weakref’

o = Object.new
q = o
o = WeakRef.new(o)
ObjectSpace.garbage_collect
p o
p q
q = nil # o is now the only reference to the object.
ObjectSpace.garbage_collect
puts o

Playing around I noticed this:

require 'weakref’
o = WeakRef.new(Object.new)
puts o.getobj # A
puts o.getobj.to_s # B
ObjectSpace.garbage_collect
puts o

Comment out line A and B in turn. With only B, o gets GC’ed, but not
with line A.

A wild guess: Parameters to method calls are pushed onto the stack and
(naturally for efficiency) not cleaned away after the method call has
ended. Thus the current call frame contains a reference to the
"unreferenced" object, and you cannot be sure that it will disappear
until after the call frame has been “popped” from the call stack.

Does that make any sense to someone with knowledge of the Ruby innards?

A WeakRef does not guarantee that an instance is GC’ed as soon as there is
no strong ref any more. It’s the other way round: if there is no strong
ref any more, then the instance might be GC’ed during a GC cycle.
Introducing another cycle leads to this:

irb(main):001:0> require ‘weakref’
=> true
irb(main):002:0>
irb(main):003:0* o = WeakRef.new(Object.new)
=> #Object:0x2801ca0
irb(main):004:0> o.getobj # A
=> #Object:0x2801ca0
irb(main):005:0> # o.getobj.to_s # B
irb(main):006:0* ObjectSpace.garbage_collect
=> nil
irb(main):007:0> o
=> #Object:0x2801ca0
irb(main):008:0> ObjectSpace.garbage_collect
=> nil
irb(main):009:0> o
WeakRef::RefError: Illegal Reference - probably recycled
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:292:in
output_value' from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:149:ineval_input’
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:146:in
signal_status' from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:146:ineval_input’
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:144:in
each_top_level_ statement' from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb/ruby-lex.rb:219:inloop’
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb/ruby-lex.rb:247:in
each_t op_level_statement' from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb/ruby-lex.rb:218:incatch’

    from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb/ruby-lex.rb:218:in

each_t op_level_statement' from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:144:ineval_input’
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:70:in start' from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:69:incatch’
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:69:in `start’
from C:/Programme/Ruby-1.8.0/bin/irb:13
Maybe IRB bug!!
irb(main):010:0>

Of course another option is that you might have found a bug in WeakRef / GC,
but I don’t assume so in this case.

Regards

robert
···

On Mon, Aug 11, 2003 at 09:56:44PM +0900, Robert Klemme wrote:

> Wow. Are you planning to have this opened, under some OSS license like > LGPL?

GPL, yes… you can download it now if you really want, but I need a
good week or so to do a formal release. (Site, documentation, a few
updates I’ve been meaning to do, etc.)

···

On Mon, 11 Aug 2003 22:10:48 +0900 Alexander Bokovoy a.bokovoy@sam-solutions.net wrote:


Ryan Pavlik rpav@users.sf.net

“These burnt fries are comin’ out of your check
Mr. Ninja!” - 8BT

Ok. I can wait for a week :wink: more overload at work right now.

···

On Tue, Aug 12, 2003 at 01:57:16AM +0900, Ryan Pavlik wrote:

On Mon, 11 Aug 2003 22:10:48 +0900 > Alexander Bokovoy a.bokovoy@sam-solutions.net wrote:

> Wow. Are you planning to have this opened, under some OSS license like > LGPL?

GPL, yes… you can download it now if you really want, but I need a
good week or so to do a formal release. (Site, documentation, a few
updates I’ve been meaning to do, etc.)

/ Alexander Bokovoy
Your boss climbed the corporate ladder, wrong by wrong.