JRuby disabling ObjectSpace: what implications?

As some of you may have heard, we're considering disabling ObjectSpace.each_object by default in JRuby. Primarily, this is for performance; to support each_object, we have to bend over backwards, maintaining lists of weak references to all objects in the system and periodically cleaning out those lists. Here's some example performance, from a fractal benchmark in the JRuby source:

With ObjectSpace: Ruby Elapsed 45.967000
Without ObjectSpace: Ruby Elapsed 4.280000

What's most frustrating about this is that almost *no* libraries or apps use each_object, and it's a terrible performance hit for us.

The one really visible use of each_object is in test/unit, where the default console-based runner does each_object(Class) to find all subclasses of TestCase. Because this is a heavily-used library (to say the least), I've made modifications to JRuby to always support each_object(Class) by maintaining a bidirectional graph of parent and child classes. So that much wouldn't go away (but I'd prefer an implementation that uses Class#inherited, since it would be cleaner, faster, and deterministic).
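
As a rough illustration of the Class#inherited approach mentioned above, here is a minimal sketch. `TrackedCase`, `REGISTRY`, and `known_subclasses` are hypothetical names standing in for a TestCase-like base class; this is not how test/unit or JRuby actually implement it.

```ruby
# Hypothetical base class standing in for Test::Unit::TestCase.
class TrackedCase
  # Subclasses in the order they were defined (strong references).
  REGISTRY = []

  # Ruby calls this hook whenever a class inherits from TrackedCase,
  # directly or through an intermediate subclass.
  def self.inherited(klass)
    super
    REGISTRY << klass
  end

  # Return a copy so callers cannot mutate the registry.
  def self.known_subclasses
    REGISTRY.dup
  end
end

class AlphaTest < TrackedCase; end
class BetaTest < AlphaTest; end

TrackedCase.known_subclasses # => [AlphaTest, BetaTest]
```

The registry fills in definition order, so the result is deterministic and needs no heap walk. The trade-off in this sketch is that the array holds strong references, so registered classes are never collected.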

So...I'm writing this to see what the general Ruby world thinks of us having ObjectSpace disabled by default, enableable via a command line option (or perhaps through a library? -robjectspace?).

I think more and more of you may want to give JRuby another look over the next few months, so I think we need to involve you in such decisions.

- Charlie

As some of you may have heard, we're considering disabling ObjectSpace.each_object by default in JRuby. Primarily, this is for performance; to support each_object, we have to bend over backwards, maintaining lists of weak references to all objects in the system and periodically cleaning out those lists.

Is this also true for ObjectSpace#_id2ref ?

Regards,

Bill

···

From: "Charles Oliver Nutter" <charles.nutter@sun.com>

ara.t.howard wrote:
> hmmm. ok i'm brainstorming here which you can ignore if you like as i
> know less than nothing about jvms or implementing ruby but here goes:
> what if you could invert the problem? what if objects knew about the
> global ObjectSpaceThang and could be forced to register themselves on
> demand somehow? without a reference i've no idea how, just throwing
> that out there. or, another stupid idea, what if the objects themselves
> were the tree/graph of weak references parent -> children. crawling it
> would be, um, fun - but you could prune dead objects *only* when walking
> the graph. this should be possible in ruby since you always have the
> notion of a parent object - which is Object - so all objects should be
> either reachable or leaks. now back to drinking my regularly scheduled
> beer...

Continuing this discussion here...

Please, continue to brainstorm. I don't claim to have thought out every aspect of this problem or every possible solution. I'd *love* to discover I've missed an obvious fix.

Your idea has come up in the past, and it would probably eliminate the cost of an ObjectSpace list. However that doesn't appear to be where we pay the highest cost.

The two items that (we believe) cost the most for us on the JVM are:

- Constructing an extra object for every Ruby object...namely, the WeakReference object to point to it. So we pay a memory/allocation/initialization cost.
- WeakReference itself causes Java's GC to have to do additional checks, so it can notify the WeakReference that the object it points at has gone away. So that slows down the legendary HotSpot GC and we pay again.
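
To make the two costs above concrete, here is a minimal Ruby sketch of a weak-reference object list. It is only an analogy written with Ruby's `weakref` library; `WeakObjectList` and its methods are hypothetical names, and JRuby's real implementation lives in Java and is not shown in this thread.

```ruby
require 'weakref'

# Hypothetical stand-in for an implementation-level object table.
class WeakObjectList
  def initialize
    @refs = []
  end

  # Called for every tracked allocation: one extra WeakRef per object.
  def track(obj)
    @refs << WeakRef.new(obj)
    obj
  end

  # Periodic cleanup: drop references whose targets were collected.
  def prune!
    @refs.select!(&:weakref_alive?)
    self
  end

  # Yield each still-live tracked object that is an instance of klass.
  def each_object(klass = Object)
    @refs.each do |ref|
      obj = begin
        ref.__getobj__
      rescue WeakRef::RefError
        next # target was collected before we could dereference it
      end
      yield obj if klass === obj
    end
  end

  def size
    @refs.size
  end
end

list = WeakObjectList.new
kept = list.track("kept alive")
list.track(Object.new)

found = []
list.each_object(String) { |s| found << s }
```

Every call to `track` allocates a second object, and `prune!` has to scan the whole list, which mirrors the allocation and cleanup overhead described above.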

I believe the parent -> weakref -> children algorithm is used in some implementations of ObjectSpace-like behavior, so it's perfectly valid. But again, there are certain aspects of ObjectSpace that are just problematic...

- threading or concurrency of any kind? No, you can't have multithreading with ObjectSpace, nor a concurrent/parallel GC (and it potentially excludes other advanced GC designs too).
- determinism? Matz told me that "ObjectSpace doesn't have to be deterministic"...but when it starts getting wired into libraries like test/unit, it seems like people expect it to be. If we can say ObjectSpace isn't deterministic, then *nobody* should be relying on its contents for core libraries, and we could reasonably claim that each_object will never return *anything*.

- Charlie

<snip>

So...I'm writing this to see what the general Ruby world thinks of us
having ObjectSpace disabled by default, enableable via a command line
option (or perhaps through a library? -robjectspace?).

.ext\common\win32\registry.rb:569: ObjectSpace.define_finalizer self, @@final.call(@hkeyfinal)
ext\dl\test\test.rb:187: ObjectSpace.define_finalizer(fp) {File.unlink("tmp.txt")}
ext\tk\lib\multi-tk.rb:493: ObjectSpace.each_object(TclTkIp){|obj|
ext\Win32API\lib\win32\registry.rb:569: ObjectSpace.define_finalizer self, @@final.call(@hkeyfinal)
lib\cgi\session.rb:299: ObjectSpace::define_finalizer(self, Session::callback(@dbprot))
lib\drb\drb.rb:337:# object's ObjectSpace id as its dRuby id. This means that the dRuby
lib\drb\drb.rb:361: # This, the default implementation, uses an object's local ObjectSpace
lib\drb\drb.rb:375: ObjectSpace._id2ref(ref)
lib\finalize.rb:59: ObjectSpace.call_finalizer(obj)
lib\finalize.rb:169: ObjectSpace.remove_finalizer(@proc)
lib\finalize.rb:173: ObjectSpace.add_finalizer(@proc)
lib\finalize.rb:180: # registering function to ObjectSpace#add_finalizer
lib\finalize.rb:192: ObjectSpace.add_finalizer(@proc)
lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|
lib\irb\ext\save-history.rb:69: ObjectSpace.define_finalizer(obj, HistorySavingAbility.create_finalizer)
lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do |io|
lib\singleton.rb:23:# ObjectSpace.each_object(OtherKlass){} # => 0.
lib\singleton.rb:190: "#{ObjectSpace.each_object(klass){}} #{klass} instance(s)"
lib\tempfile.rb:53: ObjectSpace.define_finalizer(self, @clean_proc)
lib\tempfile.rb:105: ObjectSpace.undefine_finalizer(self)
lib\tempfile.rb:118: ObjectSpace.undefine_finalizer(self)
lib\test\unit\autorunner.rb:17: ObjectSpace.each_object(Class) do |klass|
lib\test\unit\autorunner.rb:54: :objectspace => proc do |r|
lib\test\unit\autorunner.rb:55: require 'test/unit/collector/objectspace'
lib\test\unit\autorunner.rb:56: c = Collector::ObjectSpace.new
lib\test\unit\autorunner.rb:80: @collector = COLLECTORS[(standalone ? :dir : :objectspace)]
lib\test\unit\collector\dir.rb:13: def initialize(dir=::Dir, file=::File, object_space=::ObjectSpace, req=nil)
lib\test\unit\collector\objectspace.rb:10: class ObjectSpace
lib\test\unit\collector\objectspace.rb:13: NAME = 'collected from the ObjectSpace'
lib\test\unit\collector\objectspace.rb:15: def initialize(source=::ObjectSpace)
lib\test\unit.rb:252: # the ObjectSpace and wrap them up into a suite for you. It then runs
lib\weakref.rb:16:# ObjectSpace.garbage_collect
lib\weakref.rb:62: ObjectSpace._id2ref(@__id)
lib\weakref.rb:74: ObjectSpace.define_finalizer obj, @@final
lib\weakref.rb:75: ObjectSpace.define_finalizer self, @@final
lib\weakref.rb:98: ObjectSpace.garbage_collect
test\dbm\test_dbm.rb:45: ObjectSpace.each_object(DBM) do |obj|
test\gdbm\test_gdbm.rb:42: ObjectSpace.each_object(GDBM) do |obj|
test\ruby\test_objectspace.rb:3:class TestObjectSpace < Test::Unit::TestCase
test\ruby\test_objectspace.rb:10: o = ObjectSpace._id2ref(obj.object_id);\
test\sdbm\test_sdbm.rb:15: ObjectSpace.each_object(SDBM) do |obj|
test\testunit\collector\test_dir.rb:62: class ObjectSpace
test\testunit\collector\test_dir.rb:81: @object_space = ObjectSpace.new
test\testunit\collector\test_objectspace.rb:6:require 'test/unit/collector/objectspace'
test\testunit\collector\test_objectspace.rb:11: class TC_ObjectSpace < TestCase
test\testunit\collector\test_objectspace.rb:41: @c = ObjectSpace.new(@object_space)
test\testunit\collector\test_objectspace.rb:44: def full_suite(name=ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:51: TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:83: expected = TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:89: expected = TestSuite.new(ObjectSpace::NAME)
test\yaml\test_yaml.rb:1279: ObjectSpace.each_object(Class) do |klass|

So, in summary, if we exclude those libraries where only tests are
affected, this would affect:

win32-registry
tk
cgi
drb
finalize
irb
shell
singleton
tempfile
test-unit
weakref

Some comments on each of these as they relate to JRuby:

win32-registry: You have no hope of implementing this without JNA
anyway, unless there's some Java binding I don't know about. Besides,
I couldn't tell you why on Earth win32-registry would need a
finalizer.

tk: No one will care. They'll use SWT or Swing bindings. Besides, you
would need JNA.

cgi: This could be a problem. Then again, some people say this library
should be refactored or tossed.

drb: This could be a big deal.

finalize: Did anyone even know about this? Does anyone use it?

irb: You've got jirb.

shell: This could be a problem.

singleton: Ditto.

tempfile: Meh, I'm guessing Java has its own library for temp files. I
never liked the current implementation anyway (which is why I wrote
file-temp).

test-unit: Already mentioned.

weakref: You've stated that Java has its own implementation.

Regards,

Dan

···

On Oct 28, 12:53 am, Charles Oliver Nutter <charles.nut...@sun.com> wrote:

Charles Oliver Nutter wrote:

As some of you may have heard, we're considering disabling ObjectSpace.each_object by default in JRuby.

I brought this up at RubyConf, and got about 50% of people saying "I agree" and 50% of people saying "I do not agree". As it stands now, we will proceed with having ObjectSpace.each_object disabled by default in JRuby 1.1 final. See the rest of this thread for the backstory and notes on test/unit.

The folks who disagree appear to disagree only on principle, rather than based on any real demonstrable problem with turning each_object off. On the other hand, the folks who want to disable it have real-world concerns: performance on the apps they're running. Until there's a compelling, real-world, non-ideological reason to leave each_object enabled by default, it will be disabled in JRuby (enable with +O flag or jruby.objectspace.enabled=true property).

This change is already there in 1.1b1, released on Friday evening.

- Charlie

Bill Kelly wrote:

From: "Charles Oliver Nutter" <charles.nutter@sun.com>

As some of you may have heard, we're considering disabling ObjectSpace.each_object by default in JRuby. Primarily, this is for performance; to support each_object, we have to bend over backwards, maintaining lists of weak references to all objects in the system and periodically cleaning out those lists.

Is this also true for ObjectSpace#_id2ref ?

Not directly. _id2ref is handled in a similar way, but we have an event we can trigger off to start tracking an object; namely, Object#id.

When you request an id, we start tracking that object for purposes of _id2ref. Not until. So that would not be affected by disabling ObjectSpace.
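
The "track only once an id is requested" idea can be sketched in plain Ruby as follows. `LazyIdRegistry`, `id_for`, and `ref_for` are hypothetical names for illustration; the actual JRuby mechanism hooks Object#id inside the runtime and is not shown here.

```ruby
require 'weakref'

# Hypothetical registry: an object is tracked only after its id is requested.
class LazyIdRegistry
  def initialize
    @by_id = {}
  end

  # Hand out the object's id and start tracking it from this point on.
  def id_for(obj)
    id = obj.object_id
    @by_id[id] = WeakRef.new(obj)
    id
  end

  # Resolve an id handed out by id_for.
  # Returns nil if the id is unknown or the object was already collected.
  def ref_for(id)
    return nil unless @by_id.key?(id)
    @by_id[id].__getobj__
  rescue WeakRef::RefError
    @by_id.delete(id)
    nil
  end

  def tracked?(obj)
    @by_id.key?(obj.object_id)
  end
end

registry = LazyIdRegistry.new
a = Object.new
b = Object.new

id = registry.id_for(a)
registry.ref_for(id).equal?(a) # => true
registry.tracked?(b)           # => false
```

Objects whose id is never requested cost nothing extra in this scheme, which is why it does not depend on a global object list.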

In actuality, however, _id2ref is primarily used for things like weak references, so you can hold a virtual reference to an object without preventing it from being collected. We could provide an implementation of Ruby's weak references using Java's weak references that would allow us to escape _id2ref entirely for that use case.

Are there other places _id2ref is used?

- Charlie

ara.t.howard wrote:
> hmmm. ok i'm brainstorming here which you can ignore if you like as i
> know less than nothing about jvms or implementing ruby but here goes:
> what if you could invert the problem? what if objects knew about the
> global ObjectSpaceThang and could be forced to register themselves on
> demand somehow? without a reference i've no idea how, just throwing
> that out there. or, another stupid idea, what if the objects themselves
> were the tree/graph of weak references parent -> children. crawling it
> would be, um, fun - but you could prune dead objects *only* when walking
> the graph. this should be possible in ruby since you always have the
> notion of a parent object - which is Object - so all objects should be
> either reachable or leaks. now back to drinking my regularly scheduled
> beer...

Continuing this discussion here...

Please, continue to brainstorm. I don't claim to have thought out every aspect of this problem or every possible solution. I'd *love* to discover I've missed an obvious fix.

IMHO ObjectSpace should not be implemented in Java land. Why? The JVM has to keep track of instances anyway and implementing this in Java via WeakReferences seems to duplicate functionality that is already there. Did you consider using "Java Virtual Machine Tools Interface"?

You could either follow the same approach of the heapTracker presented on that page and use a flag or require a lib that enables ObjectSpace (because of the overhead of instrumentation).

Alternatively there may be another method that does not need instrumentation and that can give you access to every (reachable) object in the JVM.

Your idea has come up in the past, and it would probably eliminate the cost of an ObjectSpace list. However that doesn't appear to be where we pay the highest cost.

The two items that (we believe) cost the most for us on the JVM are:

- Constructing an extra object for every Ruby object...namely, the WeakReference object to point to it. So we pay a memory/allocation/initialization cost.
- WeakReference itself causes Java's GC to have to do additional checks, so it can notify the WeakReference that the object it points at has gone away. So that slows down the legendary HotSpot GC and we pay again.

I believe the parent -> weakref -> children algorithm is used in some implementations of ObjectSpace-like behavior, so it's perfectly valid. But again, there are certain aspects of ObjectSpace that are just problematic...

- threading or concurrency of any kind? No, you can't have multithreading with ObjectSpace, nor a concurrent/parallel GC (and it potentially excludes other advanced GC designs too).
- determinism? Matz told me that "ObjectSpace doesn't have to be deterministic"...but when it starts getting wired into libraries like test/unit, it seems like people expect it to be. If we can say ObjectSpace isn't deterministic, then *nobody* should be relying on its contents for core libraries, and we could reasonably claim that each_object will never return *anything*.

I'd reformulate the requirement here: ObjectSpace.each_object must yield every object that existed before the invocation and that is strongly reachable. I believe for the typical use case (e.g. traversing all class instances) this is enough, while leaving enough flexibility for the implementation (e.g. create a snapshot of some form, or iterate through some internal structure that may change due to new objects being created during #each_object).
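
One way to picture the snapshot variant of this requirement is the sketch below. `each_class_snapshot` is a hypothetical helper, and the example relies on `ObjectSpace.each_object(Class)` being available, which the thread says JRuby keeps supporting.

```ruby
# Snapshot first: the array holds strong references, so every class
# captured here stays alive for the duration of the walk.
def each_class_snapshot
  snapshot = ObjectSpace.each_object(Class).to_a
  snapshot.each { |klass| yield klass }
  snapshot.size
end

class DefinedBefore; end

seen = []
count = each_class_snapshot do |klass|
  seen << klass
  # Classes created during the walk are not part of the snapshot.
  Class.new if klass == DefinedBefore
end
```

Because the walk runs over a fixed array, objects created inside the block cannot change what is yielded, and nothing in the snapshot can be collected mid-iteration. The cost is the memory for the snapshot itself.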

Kind regards

  robert

···

On 28.10.2007 08:06, Charles Oliver Nutter wrote:

Daniel Berger wrote:

<snip>

So...I'm writing this to see what the general Ruby world thinks of us
having ObjectSpace disabled by default, enableable via a command line
option (or perhaps through a library? -robjectspace?).

So, in summary, if we exclude those libraries where only tests are
affected, this would affect:

win32-registry
tk
cgi
drb
finalize
irb
shell
singleton
tempfile
test-unit
weakref

Some comments on each of these as they relate to JRuby:

Of these, only the following would be affected, since only each_object would be disabled by default:

tk: No one will care. They'll use SWT or Swing bindings. Besides, you
would need JNA.

> ext\tk\lib\multi-tk.rb:493: ObjectSpace.each_object(TclTkIp){|

Quite right, and there are currently no plans (or demand) for Tk support in JRuby. Swing is a far better GUI API, especially when wrapped in Ruby.

irb: You've got jirb.

> lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|

This could still be supported through a similar mechanism as each_object(Class), by keeping a weak hash of all Module instances.

shell: This could be a problem.

> lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do

I'd be surprised if shell worked 100% correctly right now anyway, due to process-control requirements we can't support well on the JVM. But I would also expect this use of each_object to have a "better" implementation, and if not it could again be a specific-purpose weak hash for IO streams (which we almost have already, since we want to be able to clean them up on exit).

singleton: Ditto.

I'd have to look at this one. This could be another good candidate for reimplementation in a lot less Java code; singleton support would be pretty easy to write up in a few lines of Java.

test-unit: Already mentioned.

So pretty few libraries would be affected, and I don't think any couldn't be dealt with in other ways. And to reiterate: finalizers and _id2ref wouldn't be affected (though I'd prefer to find alternative mechanisms for _id2ref).

- Charlie

···

On Oct 28, 12:53 am, Charles Oliver Nutter <charles.nut...@sun.com> wrote:

I don't think they're making ObjectSpace go away. Just
ObjectSpace#each_object.

(I'm not a JRuby developer, so I don't trust the correctness of anything
I say.)

drb: This could be a big deal.
weakref: You've stated that Java has its own implementation.

This uses _id2ref, which doesn't appear to be going away.

cgi: This could be a problem. Then again, some people say this library
should be refactored or tossed.
finalize: Did anyone even know about this? Does anyone use it?
tempfile: Meh, I'm guessing Java has its own library for temp files. I
never liked the current implementation anyway (which is why I wrote
file-temp).

Finalizers could be implemented using Java's finalize() method for
classes that need it. This method of implementing finalizers could
probably be compatibly exposed using ObjectSpace.

shell: This could be a problem.

This looks broken anyway since it uses fork.

singleton: Ditto.

One is in a documentation comment, giving an example of a specific behavior
of the library. The other is in the same example, included executably
after an if __FILE__ == $0 condition. So no actual problem here.

irb: You've got jirb.

jirb is (at its core) still the Ruby 1.8 IRB codebase. The example you
pointed out is class (Module) iteration, used for completion, but it's
more general iteration than test/unit uses, and #inherited techniques
that can be used for test/unit may not work here.

--Ken

···

On Sun, 28 Oct 2007 22:13:38 +0900, Daniel Berger wrote:

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/

Quoth Daniel Berger:

...
shell: This could be a problem.
...

As far as I know, shell isn't used extensively. From reading the source, it
appears to be very much linked to the host system's processes, files, etc,
which may be inappropriate for JRuby anyways (I'm guessing here).

Regards,

···

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Bill Kelly wrote:

Is this also true for ObjectSpace#_id2ref ?

Not directly. _id2ref is handled in a similar way, but we have an event we can trigger off to start tracking an object; namely, Object#id.

When you request an id, we start tracking that object for purposes of _id2ref. Not until. So that would not be affected by disabling ObjectSpace.

I see, thanks. Nifty. :)

In actuality, however, _id2ref is primarily used for things like weak references, so you can hold a virtual reference to an object without preventing it from being collected. We could provide an implementation of Ruby's weak references using Java's weak references that would allow us to escape _id2ref entirely for that use case.

Are there other places _id2ref is used?

I think I've used _id2ref exactly twice. I can't recall the first
usage; I don't think it made it into production code. The most
recent use was to store some ruby object id's in a separate C++
process, which was able to fire an event back to ruby and provide
the object id for the object to receive the event.

(I suppose DRb might do something similar?)

Regards,

Bill

···

From: "Charles Oliver Nutter" <charles.nutter@sun.com>

Hi,

At Sun, 28 Oct 2007 16:16:25 +0900,
Charles Oliver Nutter wrote in [ruby-talk:276236]:

Are there other places _id2ref is used?

drb.

···

--
Nobu Nakada

i use it quite often as a way to have meta-programming 'storage' without polluting instances:

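# grab the current implementation as an UnboundMethod (only those respond to #bind)
foo = instance_method :foo

module_eval <<-code
  def foo(*a, &b)
    # look the live object back up through the id baked into this string
    ObjectSpace._id2ref(#{ foo.object_id }).bind(self).call(*a, &b)
  end
code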

which is fabricated - but you get the concept: string in eval maps to live object at run time. when #define_method takes a block this won't be used much i think though...
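
For comparison, a block-based version might look like the sketch below. It assumes Ruby 1.9 or later, where a block passed to define_method can itself accept a `&block` parameter; `Greeter` and the upcasing wrapper are made up for illustration.

```ruby
class Greeter
  def foo(name)
    "hello #{name}"
  end

  # Capture the original implementation as an UnboundMethod.
  original = instance_method(:foo)

  # The block closes over `original`, so no object id has to be
  # interpolated into a string and looked up again later.
  define_method(:foo) do |*args, &block|
    original.bind(self).call(*args, &block).upcase
  end
end

Greeter.new.foo("bob") # => "HELLO BOB"
```

The closure keeps a strong reference to the original method, so it cannot be collected while the wrapper exists, which is not guaranteed when only an object id is stored in evaluated source.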

cheers.

a @ http://codeforpeople.com/

···

On Oct 28, 2007, at 1:16 AM, Charles Oliver Nutter wrote:

Are there other places _id2ref is used?

--
it is not enough to be compassionate. you must act.
h.h. the 14th dalai lama

Robert Klemme wrote:

IMHO ObjectSpace should not be implemented in Java land. Why? The JVM has to keep track of instances anyway and implementing this in Java via WeakReferences seems to duplicate functionality that is already there. Did you consider using "Java Virtual Machine Tools Interface"?

http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/gbmmt.html#gbmls

You could either follow the same approach of the heapTracker presented on that page and use a flag or require a lib that enables ObjectSpace (because of the overhead of instrumentation).

You just hit on exactly why we don't use JVMTI for ObjectSpace. It would certainly work, but it would add a lot of overhead we'd never expect people to accept in a real application. Plus, it would track far more object instances than we actually want tracked. We'd love to include a JVMTI-based ObjectSpace implementation, however...it just hasn't been a high priority to implement since 99% of users never actually need ObjectSpace.

Alternatively there may be another method that does not need instrumentation and that can give you access to every (reachable) object in the JVM.

If there is...we haven't found it. The "linked weakref list" has been the least overhead so far, and it's still a lot of overhead.

Your idea has come up in the past, and it would probably eliminate the cost of an ObjectSpace list. However that doesn't appear to be where we pay the highest cost.

The two items that (we believe) cost the most for us on the JVM are:

- Constructing an extra object for every Ruby object...namely, the WeakReference object to point to it. So we pay a memory/allocation/initialization cost.
- WeakReference itself causes Java's GC to have to do additional checks, so it can notify the WeakReference that the object it points at has gone away. So that slows down the legendary HotSpot GC and we pay again.

I believe the parent -> weakref -> children algorithm is used in some implementations of ObjectSpace-like behavior, so it's perfectly valid. But again, there are certain aspects of ObjectSpace that are just problematic...

- threading or concurrency of any kind? No, you can't have multithreading with ObjectSpace, nor a concurrent/parallel GC (and it potentially excludes other advanced GC designs too).
- determinism? Matz told me that "ObjectSpace doesn't have to be deterministic"...but when it starts getting wired into libraries like test/unit, it seems like people expect it to be. If we can say ObjectSpace isn't deterministic, then *nobody* should be relying on its contents for core libraries, and we could reasonably claim that each_object will never return *anything*.

I'd reformulate the requirement here: ObjectSpace.each_object must yield every object that existed before the invocation and that is strongly reachable. I believe for the typical use case (e.g. traversing all class instances) this is enough, while leaving enough flexibility for the implementation (e.g. create a snapshot of some form, or iterate through some internal structure that may change due to new objects being created during #each_object).

The problem here is "strongly reachable". During ObjectSpace processing, the last strong reference to an object may go away and the garbage collector may run. Should ObjectSpace prevent GC from running if it's traversed and now references that object? If not, how should it be handled if immediately before you return an object from each_object, it gets garbage collected? There's no way to catch that, so each_object may end up returning a reference to an object that's gone away, or reconstituting an object whose finalization has already fired. Bad things happen.

ObjectSpace is just not compatible with any GC that requires the ability to move objects around in memory, run in parallel, and so on. It can *never* be deterministic unless it can "stop the world", so it should not be used for algorithms that require any level of determinism, such as the test search in test/unit.

- Charlie

Ken Bloom wrote:

I don't think they're making ObjectSpace go away. Just ObjectSpace#each_object.

Correct.

drb: This could be a big deal.
weakref: You've stated that Java has its own implementation.

This uses _id2ref, which doesn't appear to be going away.

Not that I wouldn't like it to :)

cgi: This could be a problem. Then again, some people say this library
should be refactored or tossed.
finalize: Did anyone even know about this? Does anyone use it?
tempfile: Meh, I'm guessing Java has its own library for temp files. I
never liked the current implementation anyway (which is why I wrote
file-temp).

Finalizers could be implemented using Java's finalize() method for classes that need it. This method of implementing finalizers could probably be compatibly exposed using ObjectSpace.

Correct; we do support finalizers already. They weren't actually that hard to support, since as you say Java already supports finalization.

shell: This could be a problem.

This looks broken anyway since it uses fork.

Ahh yes, fork is a killer. We will never, ever support fork.

singleton: Ditto.

One is in a documentation comment, giving an example of a specific behavior of the library. The other is in the same example, included executably after an if __FILE__ == $0 condition. So no actual problem here.

Whew, that's good to hear. I know singleton is used a bit in Rails, and most people run JRuby on Rails with ObjectSpace disabled...so this seems to fit with your findings.

irb: You've got jirb.

jirb is (at its core) still the Ruby 1.8 IRB codebase. The example you pointed out is class (Module) iteration, used for completion, but it's more general iteration than test/unit uses, and #inherited techniques that can be used for test/unit may not work here.

inherited, perhaps not. But a JRuby-internal weak list of Module instances would put this one to rest.

- Charlie

···

On Sun, 28 Oct 2007 22:13:38 +0900, Daniel Berger wrote:

Charles Oliver Nutter wrote:

Daniel Berger wrote:

irb: You've got jirb.
lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|

This could still be supported through a similar mechanism as
each_object(Class), by keeping a weak hash of all Module instances.

shell: This could be a problem.
lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do

I'd be surprised if shell worked 100% correctly right now anyway, due to
process-control requirements we can't support well on JVM. But I would
also expect this use of each_object to have a "better" implementation,
and if not it could again be a specific-purpose weak hash for IO streams
(which we almost have already since we want to be able to clean them up
on exit).

Speaking of multiple cases of possible class-specific instance
tracking... isn't it possible to register your interest in some such
class at some point explicitly from program code, so that any class
could be made enumerable?
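
A minimal sketch of that opt-in idea, written in plain Ruby rather than as a runtime feature: `TrackInstances`, `each_instance`, and `Connection` are hypothetical names, and the code assumes a recent Ruby (3.x) for the keyword-argument pass-through.

```ruby
require 'weakref'

# Hypothetical opt-in module: only classes that extend it pay for tracking.
module TrackInstances
  # Wrap allocation so each new instance is recorded weakly.
  def new(*args, **opts, &block)
    instance = super
    (@tracked_refs ||= []) << WeakRef.new(instance)
    instance
  end

  # Enumerate the instances of this class that are still alive.
  def each_instance
    return enum_for(:each_instance) unless block_given?
    (@tracked_refs ||= []).each do |ref|
      obj = begin
        ref.__getobj__
      rescue WeakRef::RefError
        next # instance was already collected
      end
      yield obj
    end
  end
end

class Connection
  extend TrackInstances
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

a = Connection.new("a")
b = Connection.new("b")
Connection.each_instance.map(&:name) # => ["a", "b"]
```

In this sketch each extending class keeps its own list, so subclasses are tracked separately from their parents, and the list is neither pruned nor thread-safe. A runtime-level version would need to address both.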

mortee

Bill Kelly wrote:

I think I've used _id2ref exactly twice. I can't recall the first
usage; I don't think it made it into production code. The most
recent use was to store some Ruby object ids in a separate C++
process, which was able to fire an event back to ruby and provide
the object id for the object to receive the event.

(I suppose DRb might do something similar?)

Yeah, sounds like that's mostly a "poor man's remote hash". I'd expect that just creating a hash specifically for that purpose and passing a key around would be a "better" way to do it.

_id2ref is just another one of those features that gets rarely used, and whose use cases can often be implemented in "better" ways.
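
As a minimal sketch of the hash-based alternative (HandleRegistry and its method names are invented for illustration, not an existing library): the Ruby side hands out small integer keys, the external process stores and returns only those keys, and lookup is an ordinary hash fetch.

```ruby
# Hypothetical registry that replaces _id2ref with explicit handles.
class HandleRegistry
  def initialize
    @objects  = {}
    @next_key = 0
  end

  # Stores the object and returns a key that is safe to pass outside Ruby.
  def register(obj)
    key = (@next_key += 1)
    @objects[key] = obj
    key
  end

  # Resolves a key back to its object; raises KeyError for unknown keys.
  def fetch(key)
    @objects.fetch(key)
  end

  # Drops the entry so the object can be collected.
  def release(key)
    @objects.delete(key)
  end
end

registry = HandleRegistry.new
listener = Object.new
handle   = registry.register(listener)
```

Unlike _id2ref, the registry holds a strong reference, so the object cannot disappear while a handle is outstanding; the price is that the owner has to call release.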

- Charlie

Robert Klemme wrote:

IMHO ObjectSpace should not be implemented in Java land. Why? The JVM has to keep track of instances anyway and implementing this in Java via WeakReferences seems to duplicate functionality that is already there. Did you consider using "Java Virtual Machine Tools Interface"?

Java SE 6 Release Notes

You could either follow the same approach of the heapTracker presented on that page and use a flag or require a lib that enables ObjectSpace (because of the overhead of instrumentation).

You just hit on exactly why we don't use JVMTI for ObjectSpace. It would certainly work, but it would add a lot of overhead we'd never expect people to accept in a real application. Plus, it would track far more object instances than we actually want tracked.

Why is that? I mean, you could selectively decide which instances to track.

We'd love to include a JVMTI-based ObjectSpace implementation, however...it just hasn't been a high priority to implement since 99% of users never actually need ObjectSpace.

Alternatively there may be another method that does not need instrumentation and that can give you access to every (reachable) object in the JVM.

If there is...we haven't found it. The "linked weakref list" has been the least overhead so far, and it's still a lot of overhead.

Hmm, but there are iteration methods like #each_object:
http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html#Heap

Did you put them down because of the "stop the world" approach? I'd say that would be ok - at least it's better than not having ObjectSpace. And also, there would be no overhead. Question is only whether it's ok to invoke arbitrary byte code (which would happen during the iteration callback).

Your idea has come up in the past, and it would probably eliminate the cost of an ObjectSpace list. However that doesn't appear to be where we pay the highest cost.

The two items that (we believe) cost the most for us on the JVM are:

- Constructing an extra object for every Ruby object...namely, the WeakReference object to point to it. So we pay a memory/allocation/initialization cost.
- WeakReference itself causes Java's GC to have to do additional checks, so it can notify the WeakReference that the object it points at has gone away. So that slows down the legendary HotSpot GC and we pay again.

I believe the parent -> weakref -> children algorithm is used in some implementations of ObjectSpace-like behavior, so it's perfectly valid. But again, there are certain aspects of ObjectSpace that are just problematic...

- threading or concurrency of any kind? No, you can't have multithreading with ObjectSpace, nor a concurrent/parallel GC (and it potentially excludes other advanced GC designs too).
- determinism? Matz told me that "ObjectSpace doesn't have to be deterministic"...but when it starts getting wired into libraries like test/unit, it seems like people expect it to be. If we can say OS isn't deterministic, then *nobody* should be relying on its contents for core libraries, and we could reasonably claim that each_object will never return *anything*.

I'd reformulate the requirement here: ObjectSpace.each_object must yield every object that existed before the invocation and that is strongly reachable. I believe for the typical use case (e.g. traversing all class instances) this is enough while leaving enough flexibility for the implementation (i.e. create a snapshot of some form, iterate through some internal structure that may change due to new objects being created during #each_object etc.).

The problem here is "strongly reachable". During ObjectSpace processing, the last strong reference to an object may go away and the garbage collector may run. Should ObjectSpace prevent GC from running if it's traversed and now references that object? If not, how should it be handled if immediately before you return an object from each_object, it gets garbage collected?

You are right: objects can "disappear" (i.e. lose their strong reachability) during traversal. Obviously my suggested requirement was still too strong.

There's no way to catch that, so each_object may end up returning a reference to an object that's gone away, or reconstituting an object whose finalization has already fired. Bad things happen.

Recreation is a bad idea. I agree, objects that are no longer strongly reachable at the moment they are about to be passed to the block should *not* be passed.
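
To make the weakened requirement concrete, here is a small sketch in plain Ruby. TrackedSpace is a made-up stand-in for an implementation's internal weak list, not how JRuby or MRI actually do it. It iterates over a snapshot taken at call time, skips anything that has been collected by the time its turn comes, and never tries to resurrect an object.

```ruby
require 'weakref'

# Hypothetical stand-in for an implementation-internal weak object list.
class TrackedSpace
  def initialize
    @refs = []
  end

  # Records the object weakly and returns it unchanged.
  def track(obj)
    @refs << WeakRef.new(obj)
    obj
  end

  def each_object
    return enum_for(:each_object) unless block_given?
    snapshot = @refs.dup # objects tracked during this pass are not visited
    snapshot.each do |ref|
      begin
        obj = ref.__getobj__ # strong reference from here until the block returns
      rescue WeakRef::RefError
        next # already collected: skip it, never recreate it
      end
      yield obj
    end
    @refs.select!(&:weakref_alive?) # drop dead entries after the pass
    self
  end
end

space = TrackedSpace.new
a = space.track(Object.new)
b = space.track(Object.new)

seen = []
late = []
space.each_object do |obj|
  seen << obj
  late << space.track(Object.new) # created mid-iteration
end
```

The first pass visits only a and b; the objects created inside the block show up on the next call. That matches the "existed before the invocation" wording while still allowing objects to vanish mid-pass.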

ObjectSpace is just not compatible with any GC that requires the ability to move objects around in memory,

I don't think that moving is an issue. If it were, JVMs would not work the way they do (object references are not pointers to memory locations). In other words, all programs would have the same problems #each_object has.

run in parallel, and so on. It can *never* be deterministic unless it can "stop the world", so it should not be used for algorithms that require any level of determinism, such as the test search in test/unit.

Right you are. #each_object should not be used in regular code - it's more for ad hoc statistics ("how many instances of a class?") and the like.
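
For example, an ad hoc instance count on MRI might look like the following. Widget is a throwaway class for illustration. On an implementation where ObjectSpace is disabled, this call would presumably fail for anything other than Class, which is the point of keeping it out of regular code.

```ruby
# Throwaway class used only for the count below.
class Widget; end

# Keep the instances referenced so they cannot be collected before counting.
widgets = Array.new(3) { Widget.new }

# Without a block, each_object returns an enumerator over matching live objects.
live_widgets = ObjectSpace.each_object(Widget).count
```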

Kind regards

  robert

···

On 28.10.2007 17:19, Charles Oliver Nutter wrote:

<snip>

ObjectSpace is just not compatible with any GC that requires the ability
to move objects around in memory, run in parallel, and so on. It can
*never* be deterministic unless it can "stop the world", so it should
not be used for algorithms that require any level of determinism, such
as the test search in test/unit.

This is the exact reason we haven't yet implemented each_object in
Rubinius.

Having a generational GC that moves objects, iterating over all objects
is very, very non-deterministic unless the GC is totally turned off
while objects are walked.

That's at least an option we have that we may roll with for the initial
release, but it's less than ideal.

I think of each_object as very much an MRI implementation feature that
the rest of us implementors struggle to implement. Because of this, the
community and core members of each implementation need to really begin
discussing whether or not each_object is a Ruby feature or an MRI feature.

- Evan

···

On Oct 28, 9:19 am, Charles Oliver Nutter <charles.nut...@sun.com> wrote:

mortee wrote:

Charles Oliver Nutter wrote:

Daniel Berger wrote:

irb: You've got jirb.
lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|

This could still be supported through a similar mechanism as
each_object(Class), by keeping a weak hash of all Module instances.

shell: This could be a problem.
lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do

I'd be surprised if shell worked 100% correctly right now anyway, due to
process-control requirements we can't support well on JVM. But I would
also expect this use of each_object to have a "better" implementation,
and if not it could again be a specific-purpose weak hash for IO streams
(which we almost have already since we want to be able to clean them up
on exit).

Speaking of multiple cases of possible class-specific instance
tracking... isn't it possible to register your interest in some such
class at some point explicitly from program code - and then any class
could be made enumerable.

Yes, that is possible...but it solves only part of the problem. Just having ObjectSpace.each_object enableable through a flag allows it to be fully functional when you want it and out of the way the rest of the time.

- Charlie