Why Ruby Uses Mark-and-Sweep GC?

Hi,

As I am developing my code in Ruby/C, I cannot help to ask the question,
why does Ruby use the mark-and-sweep gc, instead of reference counting
such as that used by Python?

For the majority of Ruby codes, probably the kind of gc does not matter
(just as when I first coded in Ruby), but now when I started to code in C,
it seems that the mark-and-sweep gc can really get in the way.

I think one of the problems with reference counting is the circular
reference, which cannot be handled by the gc. However, now the problems
with mark-and-sweep gc for my specific applications are:

  1. It is linear (O(N)) in the number of Ruby objects, and therefore it
    seems that it will not scale well. Yes, probably for the majority of Ruby
    applications, the cost of gc invocation is small as compared to all the
    other functions in the Ruby code, i.e.,

    cost = eps * N

where eps is a small number. However, no matter how small eps is, as N
grows large, eventually the gc cost cannot be neglected.

  1. When the Ruby objects are static, gc invocation is a waste.

Regarding 2), I think I can solve it by manually disabling and invoking
gc. However, I still cannot solve 1). Therefore,

"Is there a way to manually and explicitly delete a Ruby object (from

C), just like Python’s delete function, so that the sweep (and mark) phase
needs not be done?"

I never had a formal training on the subject of gc, and therefore I don’t
know whether my question above is simply impossible because of the
fundamental natures of mark-and-sweep gc.

Regards,

Bill

I think one of the problems with reference counting is the circular
reference, which cannot be handled by the gc.

That’s one problem. Reference counting is also more expensive over
the long run than more automatic methods. Also more error-prone.

However, now the problems
with mark-and-sweep gc for my specific applications are:

  1. It is linear (O(N)) in the number of Ruby objects, and therefore it
    seems that it will not scale well. Yes, probably for the majority of Ruby
    applications, the cost of gc invocation is small as compared to all the
    other functions in the Ruby code, i.e.,

    cost = eps * N

where eps is a small number. However, no matter how small eps is, as N
grows large, eventually the gc cost cannot be neglected.

While mark&sweep (and other forms of tracing GC) are O(N) on the
number of objects that exist when a sweep is done, reference counting
is O(N) on the number of objects that exist ever in the program.
M&S only looks at live objects.

Reference counting also tends to make poorer use of the cache. L1 and
L2 cache effects are subtle and often a tad counter-intuitive.
Basically with refcounting you use more L1 cache in mainline program
execution than with other mechanisms. While tracing GCs do need
storage to work, just as refcount schemes to, that storage is only
used when a sweep is being done, so it doesn’t dirty the cache during
normal execution and can be densely packed for good cache locality
during a sweep.

  1. When the Ruby objects are static, gc invocation is a waste.

That’s why you don’t normally trace your constant pools. :slight_smile:

I never had a formal training on the subject of gc, and therefore I don’t
know whether my question above is simply impossible because of the
fundamental natures of mark-and-sweep gc.

I’d recommend getting the book Garbage Collection by Jones and
Lins. It goes into detail on a lot of this stuff. A run through
CiteSeer might also be in order.

···

At 1:18 AM +0900 8/24/02, William Djaja Tjokroaminata wrote:

                                     Dan

--------------------------------------“it’s like this”-------------------
Dan Sugalski even samurai
dan@sidhe.org have teddy bears and even
teddy bears get drunk

I’m curious how (or if) you’ve evaluated this empirically. I’ve been
doing some IR stuff with very large numbers of objects (though I don’t
know how many offhand) and the cost of gc is (subjectively) not
affecting what I’m doing at all. Granted, this isn’t an application
with any kind of interface that operates concurrently with processing.

In my experience O(n) algorithms are thought of as a class of algorithms
that do scale fairly well.

-kyle

···

On Sat, Aug 24, 2002 at 01:18:50AM +0900, William Djaja Tjokroaminata wrote:

  1. It is linear (O(N)) in the number of Ruby objects, and therefore it
    seems that it will not scale well. Yes, probably for the majority of Ruby
    applications, the cost of gc invocation is small as compared to all the
    other functions in the Ruby code, i.e.,

    cost = eps * N

where eps is a small number. However, no matter how small eps is, as N
grows large, eventually the gc cost cannot be neglected.


moved

Keeping time, time, time,
In a sort of Runic rhyme
(Poe)

Hi,

As I am developing my code in Ruby/C, I cannot help to ask the question,
why does Ruby use the mark-and-sweep gc, instead of reference counting
such as that used by Python?

A) Reference counting breaks.
B) Even Python doesn’t use ref counting any more
C) Ref counting can be even slower in circumstances, depending on memory
allocation
D) Ref counting wouldn’t work in even a single large Ruby project I’ve
done, without me having to do metric shitloads of extra work (i.e.,
clearing references and such, to avoid ref loops)

For the majority of Ruby codes, probably the kind of gc does not matter
(just as when I first coded in Ruby), but now when I started to code in C,
it seems that the mark-and-sweep gc can really get in the way.

I think one of the problems with reference counting is the circular
reference, which cannot be handled by the gc. However, now the problems
with mark-and-sweep gc for my specific applications are:

  1. It is linear (O(N)) in the number of Ruby objects, and therefore it
    seems that it will not scale well. Yes, probably for the majority of Ruby
    applications, the cost of gc invocation is small as compared to all the
    other functions in the Ruby code, i.e.,

    cost = eps * N

where eps is a small number. However, no matter how small eps is, as N
grows large, eventually the gc cost cannot be neglected.

Admittedly, mark and sweep can suck. There are much better GC methods
out there. Dunno, maybe future Ruby will use one (I’m not authoritive
on this at all).

  1. When the Ruby objects are static, gc invocation is a waste.

Regarding 2), I think I can solve it by manually disabling and invoking
gc. However, I still cannot solve 1). Therefore,

"Is there a way to manually and explicitly delete a Ruby object (from

C), just like Python’s delete function, so that the sweep (and mark) phase
needs not be done?"

This would break. Bad. Lots of internal objects would never get
cleared, either. There is more to the language than just what you
allocate/free.

I never had a formal training on the subject of gc, and therefore I don’t
know whether my question above is simply impossible because of the
fundamental natures of mark-and-sweep gc.

Bah. You don’t need formal training. Just do some Google searches,
play with the Ruby source, etc. I’m written several memory manages for
C/C++ projects, and let me tell you - even a poorly implemented GC
system is easier and less stressful than the best of reference counting
systems. A good GC is pure bliss. ~,^

···

On Fri, 2002-08-23 at 12:18, William Djaja Tjokroaminata wrote:

Regards,

Bill

Dan Sugalski dan@sidhe.org writes:

···

At 1:18 AM +0900 8/24/02, William Djaja Tjokroaminata wrote:

I think one of the problems with reference counting is the circular
reference, which cannot be handled by the gc.

That’s one problem. Reference counting is also more expensive over
the long run than more automatic methods. Also more error-prone.

There are, of course, better gc’s than mark and sweep – compacting
collectors and generational compacting collectors come to
mind. Writing an implementation that deals with a compacting collector
can be a royal pain though – you never know when an object will get
moved out from under you.

Perry

Thanks for all the responses. So I guess we are already dealing with a
pretty reasonable gc and I just have to make effort to get around it.

I have just one thing that I would like to get clarified. Suppose the
code is as simple as

loop (in C):
    create a (simple) Ruby object
    interact with the user through this Ruby object
    when the user method returns, delete this Ruby object

If I try to modify the sweep part by deleting this Ruby object manually,
will it likely to break?

Regards,

Bill

···

============================================================================
Sean Middleditch elanthis@awesomeplay.com wrote:

"Is there a way to manually and explicitly delete a Ruby object (from

C), just like Python’s delete function, so that the sweep (and mark) phase
needs not be done?"

This would break. Bad. Lots of internal objects would never get
cleared, either. There is more to the language than just what you
allocate/free.

Sean Middleditch wrote:

B) Even Python doesn’t use ref counting any more

Sure it does. Python 2.0 did, however, add a garbage collector that
tries to detect cycles (see Garbage Collection for Python).

Believe me, I’m well aware of that pain. :frowning:

When dealing with external libraries, a pure copying collector’s not
an option–Oracle, for example, would really hate it if you screwed
around in its internals changing references to things.

The practical alternative is to have moveable and nonmoveable parts
of objects, and a mechanism to pin objects so they temporarily don’t
move. It’s what we’ve done with Parrot. Not the only solution, but an
adequate one under the circumstances.

···

At 3:18 AM +0900 8/24/02, Perry E. Metzger wrote:

Dan Sugalski dan@sidhe.org writes:

At 1:18 AM +0900 8/24/02, William Djaja Tjokroaminata wrote:

I think one of the problems with reference counting is the circular
reference, which cannot be handled by the gc.

That’s one problem. Reference counting is also more expensive over
the long run than more automatic methods. Also more error-prone.

There are, of course, better gc’s than mark and sweep – compacting
collectors and generational compacting collectors come to
mind. Writing an implementation that deals with a compacting collector
can be a royal pain though – you never know when an object will get
moved out from under you.


Dan

--------------------------------------“it’s like this”-------------------
Dan Sugalski even samurai
dan@sidhe.org have teddy bears and even
teddy bears get drunk

It certainly used to be the belief that moving collectors are better than
mark and sweep although I’m not sure if this viewpoint is generally holding
now. I read somewhere that a number of studies have shown that moving
collectors can have worse virtual memory and caching behaviour.

I’ve decided to stick with mark and sweep because I don’t like the idea of
copying lots of memory about for reasonable sized objects. Generations help
by reducing the amount of live data to scan but there’s no reason why mark
and sweep can’t be generational too. In fact, this is the approach I’ve
taken. A bit of work is needed to avoid fragmentation but it’s not been too
difficult.

···


Justin Johnson

“Perry E. Metzger” perry@piermont.com wrote in message
news:874rdlxxi5.fsf@snark.piermont.com

Dan Sugalski dan@sidhe.org writes:

At 1:18 AM +0900 8/24/02, William Djaja Tjokroaminata wrote:

I think one of the problems with reference counting is the circular
reference, which cannot be handled by the gc.

That’s one problem. Reference counting is also more expensive over
the long run than more automatic methods. Also more error-prone.

There are, of course, better gc’s than mark and sweep – compacting
collectors and generational compacting collectors come to
mind. Writing an implementation that deals with a compacting collector
can be a royal pain though – you never know when an object will get
moved out from under you.

Perry

Sean Middleditch wrote:

B) Even Python doesn’t use ref counting any more

Sure it does. Python 2.0 did, however, add a garbage collector that
tries to detect cycles (see Garbage Collection for Python).

Ah. “My bad.” I had thought Python 2 switched over to a full GC. (I
don’t keep up on Python anymore.)

···

On Fri, 2002-08-23 at 15:19, Lyle Johnson wrote:

Hi,

Probably I should ask this question instead, to people who are really
familiar with the internals of Ruby object and Ruby gc:

Will it be tricky to try to separate Ruby objects that will be involved in
the mark-and-sweep and those that won’t (from C)?

For example, the simple function “rb_global_variable” registers the VALUE
address that will be marked (and therefore will not be swept). Is it
possible to create a perfectly valid Ruby object (VALUE) that will not be
swept? In other words, the Ruby gc will not be aware of the existence of
such objects.

From theoretical point of view, a Ruby object (VALUE) is just a model of
an object. How it is gc’ed should be independent from this model. But by
just scanning functions such as “rb_newobj”, it is unclear whether we can
modify VALUE so that it will not participate in the sweep phase. Probably
there are higher-order effects that will make simple tweaking impossible?

Regards,

Bill

Another way to do this would be:

loop do
Foo.allocate do |foo|
interact_with_user(foo)
end
end

Foo.allocate creates the foo object and yields it to the block. When
the block exits, it breaks-down but does not destroy the object. For
example, if you pass a block to File.open, then the file gets closed
when the block exits. If you try to pass the file object out of the
block and then use it, then you will get an exception, because you are
trying to read or write to a closed file.

In your case, you may want to simply deallocate memory when the block
exits, and then mark the object as dead. You can use Object#extend as a
cheap way to disable methods on the object that might be dangerous if
called on a half-dead object.

This has the added benefit of being exception-safe.

Paul

···

On Sat, Aug 24, 2002 at 04:19:06AM +0900, William Djaja Tjokroaminata wrote:

Thanks for all the responses. So I guess we are already dealing with a
pretty reasonable gc and I just have to make effort to get around it.

I have just one thing that I would like to get clarified. Suppose the
code is as simple as

loop (in C):
    create a (simple) Ruby object
    interact with the user through this Ruby object
    when the user method returns, delete this Ruby object

If I try to modify the sweep part by deleting this Ruby object manually,
will it likely to break?

There’s however a problem with bringing generational GC to Ruby. If your
generations are G0 (newer), G1, …, when marking objects in G0 you have to go
through “root variables”· (frames, globals…) AND the objects in G1,
…, Gn which point to things in G0. For Ruby to be able to do that,
there would have to be a function to register an object of generation Gn
as pointing to another of generation Gi where i < n. That way it would
be able to do the marking only with the roots and the “updated” objects in
Gi+1,…,Gn. Otherwise you end up scanning G0, …, Gn, and generational
GC buys you nothing.

Ruby could take care of this automagically if the code that makes an
object point to a newer one is done in Ruby, but we’d have to do it if
it is in C. So making Ruby’s GC generational involves changing stuff in
more places than gc.c, plus breaking most of the extensions :expressionless:

···

On Sat, Aug 24, 2002 at 04:19:08AM +0900, Justin Johnson wrote:

It certainly used to be the belief that moving collectors are better than
mark and sweep although I’m not sure if this viewpoint is generally holding
now. I read somewhere that a number of studies have shown that moving
collectors can have worse virtual memory and caching behaviour.

I’ve decided to stick with mark and sweep because I don’t like the idea of
copying lots of memory about for reasonable sized objects. Generations help
by reducing the amount of live data to scan but there’s no reason why mark
and sweep can’t be generational too. In fact, this is the approach I’ve
taken. A bit of work is needed to avoid fragmentation but it’s not been too
difficult.


_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |
__ \ | | | | | (| | | | |
.__/ _,
|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Windows without the X is like making love without a partner.
– MaDsen Wikholm, mwikholm@at8.abo.fi

By using rb_global_variable (which simply calls
rb_gc_register_address(&value)), the VALUE is always marked, but the
gc is aware of its existence. It simply won’t be swept until you do
rb_gc_unregister_address() it. But the marking phase isn’t any faster.
It’s still O(R) :expressionless:

···

On Sat, Aug 24, 2002 at 04:59:15AM +0900, William Djaja Tjokroaminata wrote:

Hi,

Probably I should ask this question instead, to people who are really
familiar with the internals of Ruby object and Ruby gc:

Will it be tricky to try to separate Ruby objects that will be involved in
the mark-and-sweep and those that won’t (from C)?

For example, the simple function “rb_global_variable” registers the VALUE
address that will be marked (and therefore will not be swept). Is it
possible to create a perfectly valid Ruby object (VALUE) that will not be
swept? In other words, the Ruby gc will not be aware of the existence of
such objects.


_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |
__ \ | | | | | (| | | | |
.__/ _,
|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

<|ryan|> I don’t use deb
u poor man
netgod: heh
apt-get install task-p0rn

Is Parrot’s convoluted copying GC generational?

···

On Sat, Aug 24, 2002 at 03:27:58AM +0900, Dan Sugalski wrote:

At 3:18 AM +0900 8/24/02, Perry E. Metzger wrote:

Dan Sugalski dan@sidhe.org writes:

At 1:18 AM +0900 8/24/02, William Djaja Tjokroaminata wrote:

I think one of the problems with reference counting is the circular
reference, which cannot be handled by the gc.

That’s one problem. Reference counting is also more expensive over
the long run than more automatic methods. Also more error-prone.

There are, of course, better gc’s than mark and sweep – compacting
collectors and generational compacting collectors come to
mind. Writing an implementation that deals with a compacting collector
can be a royal pain though – you never know when an object will get
moved out from under you.

Believe me, I’m well aware of that pain. :frowning:

When dealing with external libraries, a pure copying collector’s not
an option–Oracle, for example, would really hate it if you screwed
around in its internals changing references to things.

The practical alternative is to have moveable and nonmoveable parts
of objects, and a mechanism to pin objects so they temporarily don’t
move. It’s what we’ve done with Parrot. Not the only solution, but an
adequate one under the circumstances.


_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |
__ \ | | | | | (| | | | |
.__/ _,
|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

The only “intuitive” interface is the nipple. After that, it’s all learned.
– Bruce Ediger, bediger@teal.csn.org, on X interfaces

The GC’s not particularly convoluted. At the moment, though, it’s not
generational. That may change in the future.

···

At 9:40 AM +0900 8/24/02, Mauricio Fernández wrote:

On Sat, Aug 24, 2002 at 03:27:58AM +0900, Dan Sugalski wrote:

At 3:18 AM +0900 8/24/02, Perry E. Metzger wrote:

Dan Sugalski dan@sidhe.org writes:

At 1:18 AM +0900 8/24/02, William Djaja Tjokroaminata wrote:

I think one of the problems with reference counting is the circular
reference, which cannot be handled by the gc.

That’s one problem. Reference counting is also more expensive over
the long run than more automatic methods. Also more error-prone.

There are, of course, better gc’s than mark and sweep – compacting
collectors and generational compacting collectors come to
mind. Writing an implementation that deals with a compacting collector
can be a royal pain though – you never know when an object will get
moved out from under you.

Believe me, I’m well aware of that pain. :frowning:

When dealing with external libraries, a pure copying collector’s not
an option–Oracle, for example, would really hate it if you screwed
around in its internals changing references to things.

The practical alternative is to have moveable and nonmoveable parts
of objects, and a mechanism to pin objects so they temporarily don’t
move. It’s what we’ve done with Parrot. Not the only solution, but an
adequate one under the circumstances.

Is Parrot’s convoluted copying GC generational?


Dan

--------------------------------------“it’s like this”-------------------
Dan Sugalski even samurai
dan@sidhe.org have teddy bears and even
teddy bears get drunk

Mauricio Fernández batsman.geo@yahoo.com writes:

There’s however a problem with bringing generational GC to Ruby. If your
generations are G0 (newer), G1, …, when marking objects in G0 you
have to go through “root variables”· (frames, globals…) AND the
objects in G1, …, Gn which point to things in G0.

This is no different from the generational GC problem in most
languages. There are several good monographs on garbage collection out
there that explain how one overcomes such problems in practice.

…pm

···


Perry E. Metzger perry@piermont.com

“Ask not what your country can force other people to do for you…”

Hi,

Thanks for the response. Yes, after examining the codes further, it seems
that rb_gc_register_address() and rb_gc_unregister_address() will not
accomplish what I want. Using rb_gc_unregister_address() will only result
in the object being not-marked, but not being not-swept.

It seems that a valid VALUE can only be obtained from “freelist”, which is
obtained from the “heaps”. I guess this has something to do with
“31-bit” model of Ruby objects (which I guess was designed to work
efficiently with Fixnum, Symbol, true, false, and nil). If VALUE is a
pure pointer, then I guess there will be no this kind of limitation.

Now, if only I can create a second “heaps” whose VALUE’s will not be
swept…

Regards,

Bill

···

============================================================================
Mauricio Fern?ndez batsman.geo@yahoo.com wrote:

By using rb_global_variable (which simply calls
rb_gc_register_address(&value)), the VALUE is always marked, but the
gc is aware of its existence. It simply won’t be swept until you do
rb_gc_unregister_address() it. But the marking phase isn’t any faster.
It’s still O(R) :expressionless:

But having an automatically managed remembered list (or equivalent
method) only works for code written in Ruby. The programmer must update
it manually in modules in C, which is the issue I was talking about:
the breakage of most extensions…

···

On Sat, Aug 24, 2002 at 12:39:54PM +0900, Perry E. Metzger wrote:

Mauricio Fernández batsman.geo@yahoo.com writes:

There’s however a problem with bringing generational GC to Ruby. If your
generations are G0 (newer), G1, …, when marking objects in G0 you
have to go through “root variables”· (frames, globals…) AND the
objects in G1, …, Gn which point to things in G0.

This is no different from the generational GC problem in most
languages. There are several good monographs on garbage collection out
there that explain how one overcomes such problems in practice.


_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |
__ \ | | | | | (| | | | |
.__/ _,
|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

If you really want pure ASCII, save it as text… or browse
it with your favorite browser…
– Alexandre Maret amaret@infomaniak.ch

It seems that a valid VALUE can only be obtained from "freelist", which is

rb_gc_force_recycle()

but be *VERY* carefull with this

Guy Decoux