JRuby disabling ObjectSpace: what implications?

Robert Klemme wrote:

You just hit on exactly why we don't use JVMTI for ObjectSpace. It would certainly work, but it would add a lot of overhead we'd never expect people to accept in a real application. Plus, it would track far more object instances than we actually want tracked.

Why is that? I mean, you could selectively decide which instances to track.

Actually, we do that a bit already. For example, we do not track arrays constructed during argument processing, since they are typically transient. The problem is that we could only choose to track all Ruby objects, for example...which would cripple other JRuby apps running in the same process.

In general, though, we haven't explored JVMTI because we want JRuby to be the best production environment for deploying apps, and nobody will EVER turn on JVMTI on their production servers.

Alternatively there may be another method that does not need instrumentation and that can give you access to every (reachable) object in the JVM.

If there is...we haven't found it. The "linked weakref list" has been the least overhead so far, and it's still a lot of overhead.

Hmm, but there are iteration methods like #each_object:
JVM(TM) Tool Interface 1.0.38

I was referring to non-JVMTI solutions, but you're right, JVMTI does provide this capability.

Did you put them down because of the "stop the world" approach? I'd say that would be ok - at least it's better than not having ObjectSpace. And also, there would be no overhead. Question is only whether it's ok to invoke arbitrary byte code (which would happen during the iteration callback).

Is it really ok? You need to remember that JRuby opens up the possibility of running many, many applications in the same process, as well as asynchronous algorithms with true parallel threads. We can't expect people to cripple all that so they can walk EVERY object in the system. "Stop the world" is awful when you start breaking the ability to do many things in parallel, as you can in JRuby.

But it may be that for cases where each_object is needed, this is a reasonable thing to do. I think if someone were to submit an implementation of each_object that uses JVMTI, we would certainly accept it :)

ObjectSpace is just not compatible with any GC that requires the ability to move objects around in memory,

I don't think that moving is an issue. If it were, JVMs would not work the way they do (object references are not pointers to memory locations). In other words, all programs would have the same problems #each_object had.

The problem is not so much that the object references move as that you would have to lock the memory locations for some period of time to be able to walk the object table. And I think that's *bad* especially when we're looking at JRuby allowing folks to run dozens of apps in the same process and memory space out of the box. We can't lock things down like that.

- Charlie

On 28.10.2007 17:19, Charles Oliver Nutter wrote:

evanwebb@gmail.com wrote:

I think of each_object as very much a MRI implementation feature that
the rest of us
implementors struggle to implement. Because of this, the community and
core members of
each implementation need to really begin discussing whether or not
each_object is a
Ruby feature or an MRI feature.

That's actually a really good point. each_object is more a feature of an individual implementation's memory model than a general feature that can be applied to every Ruby implementation. In many cases, like ours, you simply don't have control over that memory model enough to provide a real each_object implementation (and _id2ref requires tricks too, but it's at least bounded and explicit). So it may be fair to say that each_object is an MRI feature we emulate, but cannot simulate well enough for it to translate appropriately.

Charles Oliver Nutter wrote:

Actually, we do that a bit already. For example, we do not track arrays
constructed during argument processing, since they are typically
transient. The problem is that we could only choose to track all Ruby
objects, for example...which would cripple other JRuby apps running in
the same process.

[...]

The problem is not so much that the object references move as that you
would have to lock the memory locations for some period of time to be
able to walk the object table. And I think that's *bad* especially when
we're looking at JRuby allowing folks to run dozens of apps in the same
process and memory space out of the box. We can't lock things down like
that.

Sorry for the extremely uninitiated and naive question - but when you're
about to enumerate each object in an application, aren't you interested
only in this application's objects anyway? So why would you have to lock
anything about the other ruby apps in the same process? Is that kind of
distinguishing objects impossible on the GC/enumeration level?

mortee

Robert Klemme wrote:
>> You just hit on exactly why we don't use JVMTI for ObjectSpace. It
>> would certainly work, but it would add a lot of overhead we'd never
>> expect people to accept in a real application. Plus, it would track
>> far more object instances than we actually want tracked.
>
> Why is that? I mean, you could selectively decide which instances to
> track.

In general, though, we haven't explored JVMTI because we want JRuby to
be the best production environment for deploying apps, and nobody will
EVER turn on JVMTI on their production servers.

Well, it depends on the overhead and on the invocation model. I
assumed you would be starting a JVM per process but your other remarks
sound more like there is one JVM for JRuby programs...

> Did you put them down because of the "stop the world" approach? I'd say
> that would be ok - at least it's better than not having ObjectSpace. And
> also, there would be no overhead. Question is only whether it's ok to
> invoke arbitrary byte code (which would happen during the iteration
> callback).

Is it really ok? You need to remember that JRuby opens up the
possibility of running many, many applications in the same process, as
well as asynchronous algorithms with true parallel threads. We can't
expect people to cripple all that so they can walk EVERY object in the
system. "Stop the world" is awful when you start breaking the ability to
do many things in parallel, as you can in JRuby.

Ok, I see I need to dive deeper into JRuby before I discuss this further. :)

But it may be that for cases where each_object is needed, this is a
reasonable thing to do. I think if someone were to submit an
implementation of each_object that uses JVMTI, we would certainly accept
it :)

Hint, hint... :)

>> ObjectSpace is just not compatible with any GC that requires the
>> ability to move objects around in memory,
>
> I don't think that moving is an issue. If it were, JVMs would not work
> the way they do (object references are not pointers to memory locations).
> In other words, all programs would have the same problems #each_object
> had.

The problem is not so much that the object references move as that you
would have to lock the memory locations for some period of time to be
able to walk the object table. And I think that's *bad* especially when
we're looking at JRuby allowing folks to run dozens of apps in the same
process and memory space out of the box. We can't lock things down like
that.

I don't understand this remark of yours. If you implement this in Java
land (as you did apparently with WeakReferences) then there is no need
to lock anything. You just traverse the list (or a copy of the list)
and if a ref has been set to null you do not pass it to the callback.

If it is some kind of native code (possibly via JNI or other
interfaces) probably more care has to be taken, although I'd assume
that JNI takes care of this (i.e. once the callback is invoked with a
non null argument the object stays alive until after the callback
returns unless you clear that reference of course).

Traversal during #each_object in that respect is similar to traversal
through an ordinary collection - during that a GC can occur just the
same but that does not affect the traversal in any way.
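
A minimal Java sketch of the traversal described above, assuming the tracked objects sit in a plain list of WeakReferences. The class name WeakObjectRegistry and its methods are hypothetical and are not JRuby's actual ObjectSpace code. The list is copied under a short lock, and any reference the GC has already cleared is simply skipped:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical registry for illustration only; not JRuby's implementation.
class WeakObjectRegistry {
    // Weak references do not keep the tracked objects alive.
    final List<WeakReference<Object>> refs = new ArrayList<>();

    synchronized void register(Object obj) {
        refs.add(new WeakReference<>(obj));
    }

    // Calls the callback for every tracked object that is still alive
    // and returns how many objects were visited.
    int eachObject(Consumer<Object> callback) {
        List<WeakReference<Object>> snapshot;
        synchronized (this) {
            // Prune entries whose referent is gone, then copy the list so
            // the callback runs without holding the lock.
            refs.removeIf(ref -> ref.get() == null);
            snapshot = new ArrayList<>(refs);
        }
        int visited = 0;
        for (WeakReference<Object> ref : snapshot) {
            // get() yields a strong reference, so the object cannot be
            // collected while the callback is running on it.
            Object obj = ref.get();
            if (obj != null) {
                callback.accept(obj);
                visited++;
            }
        }
        return visited;
    }
}
```

A GC may run at any point during the loop; the worst case is that a reference turns null between the copy and the get() call, in which case the entry is skipped.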

What am I missing?

Kind regards

robert

2007/10/28, Charles Oliver Nutter <charles.nutter@sun.com>:

> On 28.10.2007 17:19, Charles Oliver Nutter wrote:

Robert Klemme wrote:
>> You just hit on exactly why we don't use JVMTI for ObjectSpace. It
>> would certainly work, but it would add a lot of overhead we'd never
>> expect people to accept in a real application. Plus, it would track
>> far more object instances than we actually want tracked.

> Why is that? I mean, you could selectively decide which instances to
> track.

Actually, we do that a bit already. For example, we do not track arrays
constructed during argument processing, since they are typically
transient. The problem is that we could only choose to track all Ruby
objects, for example...which would cripple other JRuby apps running in
the same process.

In general, though, we haven't explored JVMTI because we want JRuby to
be the best production environment for deploying apps, and nobody will
EVER turn on JVMTI on their production servers.

>>> Alternatively there may be another method that does not need
>>> instrumentation and that can give you access to every (reachable)
>>> object in the JVM.

>> If there is...we haven't found it. The "linked weakref list" has been
>> the least overhead so far, and it's still a lot of overhead.

> Hmm, but there are iteration methods like #each_object:
> JVM(TM) Tool Interface 1.0.38

I was referring to non-JVMTI solutions, but you're right, JVMTI does
provide this capability.

> Did you put them down because of the "stop the world" approach? I'd say
> that would be ok - at least it's better than not having ObjectSpace. And
> also, there would be no overhead. Question is only whether it's ok to
> invoke arbitrary byte code (which would happen during the iteration
> callback).

Is it really ok? You need to remember that JRuby opens up the
possibility of running many, many applications in the same process, as
well as asynchronous algorithms with true parallel threads. We can't
expect people to cripple all that so they can walk EVERY object in the
system. "Stop the world" is awful when you start breaking the ability to
do many things in parallel, as you can in JRuby.

But it may be that for cases where each_object is needed, this is a
reasonable thing to do.

Exactly. I think that each_object rarely has to go into production
code, but is very handy (and, to be honest, just fun, really) in
debugging/testing/experimenting. For those types of situations, I don't
really think a "stop the world" approach is so terrible. I find it
less of a disturbance than having this off-code switch.

On Oct 28, 3:39 pm, Charles Oliver Nutter <charles.nut...@sun.com> wrote:

> On 28.10.2007 17:19, Charles Oliver Nutter wrote:

I think if someone were to submit an
implementation of each_object that uses JVMTI, we would certainly accept
it :)

>> ObjectSpace is just not compatible with any GC that requires the
>> ability to move objects around in memory,

> I don't think that moving is an issue. If it were, JVMs would not work
> the way they do (object references are not pointers to memory locations).
> In other words, all programs would have the same problems #each_object
> had.

The problem is not so much that the object references move as that you
would have to lock the memory locations for some period of time to be
able to walk the object table. And I think that's *bad* especially when
we're looking at JRuby allowing folks to run dozens of apps in the same
process and memory space out of the box. We can't lock things down like
that.

- Charlie

mortee wrote:

Charles Oliver Nutter wrote:

Actually, we do that a bit already. For example, we do not track arrays
constructed during argument processing, since they are typically
transient. The problem is that we could only choose to track all Ruby
objects, for example...which would cripple other JRuby apps running in
the same process.

[...]

The problem is not so much that the object references move as that you
would have to lock the memory locations for some period of time to be
able to walk the object table. And I think that's *bad* especially when
we're looking at JRuby allowing folks to run dozens of apps in the same
process and memory space out of the box. We can't lock things down like
that.

Sorry for the extremely uninitiated and naive question - but when you're
about to enumerate each object in an application, aren't you interested
only in this application's objects anyway? So why would you have to lock
anything about the other ruby apps in the same process? Is that kind of
distinguishing objects impossible on the GC/enumeration level?

As far as I know there's no way to have JVMTI enumerate only objects created by a specific application in a given JVM. So any sort of ObjectSpace impl based on it would have to take that into consideration.

- Charlie

Hm, if you host different applications in the same JVM you probably
need separate class loaders anyway to separate changes on classes.
Maybe you can use that to partition the heap. Alternatively you could
use IterateOverObjectsReachableFromObject() and start from main. Just
a few wild guesses.

Btw, the issue with stopping the world would still not go away.
Too bad. A possible solution would be to implement the callback in a
way that it places all references in a Java collection. Only after it
finishes the Ruby land callback is invoked for each instance. The
downside is that you need more space (i.e. for the collection which
could become largish) but on the plus side is that you do not have any
overhead (other than incurred by JVMTI) during "normal" operation and
you can limit the stop the world time to just the copying phase which
might be acceptable. Charles, what do you think?
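
To make the two-phase idea concrete, here is a rough sketch in plain Java. It does not use JVMTI at all: the PausableHeap class is a hypothetical stand-in, and its lock only models the "stop the world" pause. Phase one copies the references while the pause is held; phase two runs the (Ruby land) callbacks after the pause has ended:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Hypothetical stand-in for a heap that may only be walked while paused.
class PausableHeap {
    // Holding this lock models the "stop the world" phase.
    private final ReentrantLock worldLock = new ReentrantLock();
    private final List<Object> objects = new ArrayList<>();

    void allocate(Object obj) {
        worldLock.lock();
        try {
            objects.add(obj);
        } finally {
            worldLock.unlock();
        }
    }

    boolean isPaused() {
        return worldLock.isHeldByCurrentThread();
    }

    int size() {
        worldLock.lock();
        try {
            return objects.size();
        } finally {
            worldLock.unlock();
        }
    }

    void eachObject(Consumer<Object> callback) {
        List<Object> snapshot;
        // Phase 1: the pause lasts only as long as the copy.
        worldLock.lock();
        try {
            snapshot = new ArrayList<>(objects);
        } finally {
            worldLock.unlock();
        }
        // Phase 2: arbitrary callback code runs with the world resumed,
        // so it may allocate freely without disturbing the iteration.
        for (Object obj : snapshot) {
            callback.accept(obj);
        }
    }
}
```

The cost is visible in the snapshot list: it holds an ordinary (strong) reference to every object for as long as the iteration runs.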

Kind regards

robert

2007/10/29, Charles Oliver Nutter <charles.nutter@sun.com>:

mortee wrote:
> Charles Oliver Nutter wrote:
>> Actually, we do that a bit already. For example, we do not track arrays
>> constructed during argument processing, since they are typically
>> transient. The problem is that we could only choose to track all Ruby
>> objects, for example...which would cripple other JRuby apps running in
>> the same process.
>
> [...]
>
>> The problem is not so much that the object references move as that you
>> would have to lock the memory locations for some period of time to be
>> able to walk the object table. And I think that's *bad* especially when
>> we're looking at JRuby allowing folks to run dozens of apps in the same
>> process and memory space out of the box. We can't lock things down like
>> that.
>
> Sorry for the extremely uninitiated and naive question - but when you're
> about to enumerate each object in an application, aren't you interested
> only in this application's objects anyway? So why would you have to lock
> anything about the other ruby apps in the same process? Is that kind of
> distinguishing objects impossible on the GC/enumeration level?

As far as I know there's no way to have JVMTI enumerate only objects
created by a specific application in a given JVM. So any sort of
ObjectSpace impl based on it would have to take that into consideration.

Robert Klemme wrote:

Btw, the issue with stopping the world would still not go away.
Too bad. A possible solution would be to implement the callback in a
way that it places all references in a Java collection. Only after it
finishes the Ruby land callback is invoked for each instance. The
downside is that you need more space (i.e. for the collection which
could become largish) but on the plus side is that you do not have any
overhead (other than incurred by JVMTI) during "normal" operation and
you can limit the stop the world time to just the copying phase which
might be acceptable. Charles, what do you think?

It's certainly possible to do this, but it would probably need to create a giant strong-referenced list of objects for iteration. Part of my hard rules for implementing ObjectSpace is that it MUST NOT interfere with an object's normal lifecycle.
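
A small Java sketch of that lifecycle concern (the class and method names are made up for this example). Once an object is copied into an ordinary list, it stays strongly reachable even after the application has dropped its own reference, so a GC cannot reclaim it until the list itself is released:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; illustrates the effect, not any JRuby code.
class SnapshotLifetimeDemo {
    // Returns true if the object is still alive after a GC request
    // while a strong snapshot list holds it.
    static boolean survivesWhileInSnapshot() {
        Object tracked = new Object();
        WeakReference<Object> probe = new WeakReference<>(tracked);

        List<Object> snapshot = new ArrayList<>();
        snapshot.add(tracked);   // the iteration list now holds a strong reference

        tracked = null;          // the application drops its own reference
        System.gc();             // only a hint, but it cannot clear a strongly reachable object

        boolean alive = probe.get() != null;
        // Using the snapshot here keeps it reachable across the GC request.
        return alive && snapshot.size() == 1;
    }
}
```

With only the weak probe and no snapshot, the same object would be eligible for collection as soon as the application let go of it.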

- Charlie