Best GC for Ruby?

Which method of garbage collection would most suit Ruby?

I’ve implemented a mark-and-sweep GC with incremental sweep so far. The
incremental sweep algorithm makes GC times less intrusive and could possibly
be added to Ruby without too much trouble.

Some features of the collector:

  1. Memory is reserved in 4k blocks aligned on a 4k boundary.
  2. Each block stores objects of a fixed size.
  3. Allocation is either sequential when filling a new block or resuses
    linked garbage fragments within the block. Either way, allocation is very
    fast.
  4. Each block stores mark bits and live bits - this information is not
    stored with the object.
  5. Valid ptr checks are really fast using a similar technique to the one in
    the Boehm-Demers-Weiser collector.
  6. Stack is used for conservative collection although unlike the
    Boehm-Demers-Weiser collector, objects know how to collect themselves.

However, to make things really fast I think I need to make my collector
generational.

Normally, generational collectors are based on copy collection in the young
generation and mark-and-sweep in the older generations. A notable exception
is the train algorithm.

I can see that having a copy collector for the young generation is a win for
functional languages such as ML/OCaml where 90%+ of the objects don’t
survive infancy and so not much is actually copied. This may not be the
case in Ruby where more objects may survive infancy and so result in
increased copying overheads.

Therefore, I’ve decided to keep the young generation mark-and-sweep.

For the old generations I like the sound of the train algorithm because of
it’s incremental and ordering nature. The main problem is that I’d like to
avoid hardware write barriers for the remembered sets.

Anybody out there with any views on this?

Matz, did you have any ideas on the GC for Rite?

···


Justin Johnson

Hi Justin,

Thanks for all the sophisticated ideas regarding the next Ruby GC.

I am still waiting for Matz’s response regarding whether it is possible to
simply use “rb_gc_un/register_address()” (or a similar concept) for GCed
objects that are referred to from non-GCed objects. If such a concept
is indeed possible, then one feature that I would like to have in Ruby GC
is a separation of GCed objects from non-GCed objects. With the existence
of non-GCed objects, then we will have tighter memory control just like
that in ordinary C programming (and this will make Ruby more ahead than
the other scripting languages, where at least in theory, the Ruby speed
can be made arbitrarily close to the C speed).

I always think that the only reason people write Ruby in C is, beside to
interface with external C libraries, to increase execution
performance. Currently, the Ruby GC is one of the performance bottlenecks
and even worse, it does not scale well. Therefore I am really glad that
there are people like you who are considering the Ruby GC very seriously.

I am always dreaming of “The speed of C, the simplicity of
Ruby”… Hopefully it can come true…

Regards,

Bill

···

=============================================================================
Justin Johnson justinj@mobiusent.com wrote:

Which method of garbage collection would most suit Ruby?

(…deleted…)

Anybody out there with any views on this?

Matz, did you have any ideas on the GC for Rite?


Justin Johnson

Justin Johnson wrote:

The main problem is that I’d like to
avoid hardware write barriers for the remembered sets.

What is it?

Thanks, Christian

Hi,

Matz, did you have any ideas on the GC for Rite?

It will be

  • conservative (to make extension programming easy)
  • generational (hopefully, to reduce GC perfomance cost)

I’m considering either

  • deferred reference count (1 bit refcount + mark and sweep)
  • generational mark and sweep

Or I might use Boehm GC for the Rite prototype.

						matz.
···

In message “Best GC for Ruby?” on 02/08/29, “Justin Johnson” justinj@mobiusent.com writes:

It sounds like all you want is the ability to register/unregister your own
roots?

Decoupling chunks of Ruby objects from the automatic GC sounds problematic
though and I’m not sure how it could be done safely.

I think people write in C because:

  1. Very popular.
  2. Fast.
  3. Well, it’s better than writing in assembler. :wink:

Although personally, I prefer C++ even though you have to be incredibly
disciplined with it and it’s flawed in a number of ways.

···


Justin Johnson

“William Djaja Tjokroaminata” billtj@y.glue.umd.edu wrote in message
news:akldus$71f$1@grapevine.wam.umd.edu…

Hi Justin,

Thanks for all the sophisticated ideas regarding the next Ruby GC.

I am still waiting for Matz’s response regarding whether it is possible to
simply use “rb_gc_un/register_address()” (or a similar concept) for GCed
objects that are referred to from non-GCed objects. If such a concept
is indeed possible, then one feature that I would like to have in Ruby GC
is a separation of GCed objects from non-GCed objects. With the existence
of non-GCed objects, then we will have tighter memory control just like
that in ordinary C programming (and this will make Ruby more ahead than
the other scripting languages, where at least in theory, the Ruby speed
can be made arbitrarily close to the C speed).

I always think that the only reason people write Ruby in C is, beside to
interface with external C libraries, to increase execution
performance. Currently, the Ruby GC is one of the performance bottlenecks
and even worse, it does not scale well. Therefore I am really glad that
there are people like you who are considering the Ruby GC very seriously.

I am always dreaming of “The speed of C, the simplicity of
Ruby”… Hopefully it can come true…

Regards,

Bill

============================================================================

Justin Johnson justinj@mobiusent.com wrote:

Which method of garbage collection would most suit Ruby?

(…deleted…)

Anybody out there with any views on this?

Matz, did you have any ideas on the GC for Rite?


Justin Johnson

A remembered set is a set of pointer locations that record which pointers
point into a generation, usually from a younger generation.

Usually in a generational system, you want to modify the remembered set when
any of these pointers are changed (written, hence write-barrier). This is
commonly done using hardware memory protection facilities provided by the
OS.

At least that’s my take on it. :slight_smile:

···


Justin Johnson

“Christian Szegedy” szegedy@nospam.or.uni-bonn.de wrote in message
news:3D6E5682.7090502@nospam.or.uni-bonn.de

Justin Johnson wrote:

The main problem is that I’d like to
avoid hardware write barriers for the remembered sets.

What is it?

Thanks, Christian

I had a good look at the Boehm GC. The only down side that I could see is
that under Windows, there is no ability to allocate 4k aligned blocks from
the system. The code allocates larger blocks and sets a pointer inside to
get 4k alignment. This could be quite wasteful with memory. On Unix based
systems it’s not a problem.

It’s good to hear that Rite’s GC will be generational. That should take
care of any performance issues related to Ruby’s GC.

For mature generations I’m seriously considering the Train algorithm because
of it’s highly incremental and unintrusive nature. The Mjolner BETA
language has an implementation and achieved good results.

Maintaining the remembered sets without a hardware write barrier is still
posing a problem. I can’t seem to find any good information on software
write barriers. There didn’t seem to be anything in the Jones and Lins
book.

Also, I’m not sure how software write barriers could be introduced without
making extension programming more error prone and difficult.

Maybe I’m missing something?

···


Justin Johnson

“Yukihiro Matsumoto” matz@ruby-lang.org wrote in message
news:1030636759.172074.21296.nullmailer@picachu.netlab.jp…

Hi,

In message “Best GC for Ruby?” > on 02/08/29, “Justin Johnson” justinj@mobiusent.com writes:

Matz, did you have any ideas on the GC for Rite?

It will be

  • conservative (to make extension programming easy)
  • generational (hopefully, to reduce GC perfomance cost)

I’m considering either

  • deferred reference count (1 bit refcount + mark and sweep)
  • generational mark and sweep

Or I might use Boehm GC for the Rite prototype.

matz.

I had a good look at the Boehm GC. The only down side that I could see is
that under Windows, there is no ability to allocate 4k aligned blocks from
the system. The code allocates larger blocks and sets a pointer inside to
get 4k alignment. This could be quite wasteful with memory. On Unix based
systems it’s not a problem.

It’s good to hear that Rite’s GC will be generational. That should take
care of any performance issues related to Ruby’s GC.

For mature generations I’m seriously considering the Train algorithm because
of it’s highly incremental and unintrusive nature. The Mjolner BETA
language has an implementation and achieved good results.

Maintaining the remembered sets without a hardware write barrier is still
posing a problem. I can’t seem to find any good information on software
write barriers. There didn’t seem to be anything in the Jones and Lins
book.

Also, I’m not sure how software write barriers could be introduced without
making extension programming more error prone and difficult.

Maybe I’m missing something?

···


Justin Johnson

“Yukihiro Matsumoto” matz@ruby-lang.org wrote in message
news:1030636759.172074.21296.nullmailer@picachu.netlab.jp…

Hi,

In message “Best GC for Ruby?” > on 02/08/29, “Justin Johnson” justinj@mobiusent.com writes:

Matz, did you have any ideas on the GC for Rite?

It will be

  • conservative (to make extension programming easy)
  • generational (hopefully, to reduce GC perfomance cost)

I’m considering either

  • deferred reference count (1 bit refcount + mark and sweep)
  • generational mark and sweep

Or I might use Boehm GC for the Rite prototype.

matz.

FYI, check out http://www.ravenbrook.com/project/mps/

The Memory Pool System is a very general, adaptable, flexible, reliable,
and efficient memory management system. It permits the flexible combination
of memory management techniques, supporting manual and automatic memory
management, in-line allocation, finalization, weakness, and multiple
concurrent co-operating incremental generational garbage collections. It
also includes a library of memory pool classes implementing specialized
memory management policies.

···

Robert Cowham

Hi,

Maintaining the remembered sets without a hardware write barrier is still
posing a problem. I can’t seem to find any good information on software
write barriers. There didn’t seem to be anything in the Jones and Lins
book.

See “7.5 Inter-generational pointers”. In p.172, “Using bytes rather
than bits speeds up the write-barrier, reducing it to just three SPARC
instructions in addition to the actual store”. So it’s not
impossible, at least in theory.

Also, I’m not sure how software write barriers could be introduced without
making extension programming more error prone and difficult.

It can be, if all attribute modifies are done by API. Though we can’t
assume this if we are developing general purpose collector like
Boehm’s.

						matz.
···

In message “Re: Best GC for Ruby?” on 02/08/30, “Justin Johnson” justinj@mobiusent.com writes:

Hi,

Just some information that I got when I was searching how other
"scripting" languages implement their garbage collector:

http://lua-users.org/wiki/GarbageCollectionInRealTimeGames

Hopefully it has some useful information to be considered for the next
Ruby gc.

Regards,

Bill

See “7.5 Inter-generational pointers”. In p.172, “Using bytes rather
than bits speeds up the write-barrier, reducing it to just three SPARC
instructions in addition to the actual store”. So it’s not
impossible, at least in theory.

Yes, I’ve read that very chapter. If setting a pointer only results in a
byte being set then maybe that’s not too bad. I’m not sure yet.

Let me elaborate by showing some code to show how my Ruby gc works:

C++ .h

class RbNewObject : public RbObject
{
public:
RUBY_CLASS( RbNewObject, RbObject )

// Garbage collection traversal
virtual void gc();

private:
// Data members…
RbObject* pObjectA_;
RbObject* pObjectB_;
};

Then in C++ .cpp

void RbNewObject::gc()
{
gcTraverse( pObjectA_ );
gcTraverse( pObjectB_ );
}

The garbage collector uses the C/C++ stack to find roots. Roots can also be
manually registered. Any object that is to be controlled by garbage
collection must be derived from RbObject (which is in turn derived from an
object called GcObject in my GC library). When GC objects are being marked,
the objects gc() method is called to trace other objects to mark.
gcTraverse() calls into the garbage collector to do bit marking. It does
not recurse but uses an iteration technique similar to Cheney’s copying.
Using the gc() method means that the garbage collector knows more about the
objects it can collect and therefore cuts down on the number of
misidentifications that the Boehm GC experiences. It should also make
marking faster. This is a little more work for the programmer but I think
it’s worth the small effort.

So far it’s easy for the application programmer to use. They can change the
pointers simply by setting them to different values. The GC identifies
valid object ptr’s from the C/C++ stack and traverses objects accordingly.

To implement a software barrier, a simple pointer set like:

pObjectA_ = pOtherObject;

would have to become:

setPointer( pObjectA_, pOtherObject );

or, at least for C++, possibly a template smart ptr style method could be
used. The definition would be something like:

wbPtr pObjectA_;

and then setting becomes the usual:

pObjectA_ = pOtherObject;

Either way, I’m a little concerned about the overhead this is going to place
on all pointer setting. I think I’d have to implement it and take a look at
the generated assembler.

It can be, if all attribute modifies are done by API. Though we can’t
assume this if we are developing general purpose collector like
Boehm’s.

Did you mean like in the setPointer macro/function above?

···


Justin Johnson

Out of curiosity, have you given this a shot to see how performance
looks? I’ve been debating the merits of generational collection, but
I’ve been worried about the performance impact (and the code hassle)
of maintaining generation barriers.

···

At 2:42 AM +0900 9/4/02, Yukihiro Matsumoto wrote:

Hi,

In message “Re: Best GC for Ruby?” > on 02/08/30, “Justin Johnson” justinj@mobiusent.com writes:

Maintaining the remembered sets without a hardware write barrier is still
posing a problem. I can’t seem to find any good information on software
write barriers. There didn’t seem to be anything in the Jones and Lins
book.

See “7.5 Inter-generational pointers”. In p.172, “Using bytes rather
than bits speeds up the write-barrier, reducing it to just three SPARC
instructions in addition to the actual store”. So it’s not
impossible, at least in theory.

Also, I’m not sure how software write barriers could be introduced without
making extension programming more error prone and difficult.

It can be, if all attribute modifies are done by API. Though we can’t
assume this if we are developing general purpose collector like
Boehm’s.


Dan

--------------------------------------“it’s like this”-------------------
Dan Sugalski even samurai
dan@sidhe.org have teddy bears and even
teddy bears get drunk

Hi Justin,

Please excuse me on my lack of knowledge regarding the GC model. When we
talk about the “roots”, does it mean only roots for marking, or does it
mean roots for both marking and sweeping? (In the current Ruby GC, I
think the roots are used for marking, while sweeping just goes through the
whole heaps/array.) Is it possible to have separate sweeping areas?

Regards,

Bill

···

===========================================================================
Justin Johnson justinj@mobiusent.com wrote:

The garbage collector uses the C/C++ stack to find roots. Roots can also be
manually registered. Any object that is to be controlled by garbage
collection must be derived from RbObject (which is in turn derived from an
object called GcObject in my GC library). When GC objects are being marked,
the objects gc() method is called to trace other objects to mark.
gcTraverse() calls into the garbage collector to do bit marking. It does
not recurse but uses an iteration technique similar to Cheney’s copying.
Using the gc() method means that the garbage collector knows more about the
objects it can collect and therefore cuts down on the number of
misidentifications that the Boehm GC experiences. It should also make
marking faster. This is a little more work for the programmer but I think
it’s worth the small effort.

Out of curiosity, have you given this a shot to see how performance
looks? I’ve been debating the merits of generational collection, but
I’ve been worried about the performance impact (and the code hassle)
of maintaining generation barriers.

Take a look at the papers posted for ocaml garbage collection. It is quite
efficient. It is impacted by a write barrier frequently making it faster to
write new data than updating references. New data is very cheap.
There are two generations: 50% space is young generation. It is
incrementally copied and compacted. The other half is a traditional
mark/sweep for old data.
collection is triggered when enough space is wasted and a small amount is
collected.
When comparing to various complex colouring principles the OCaml GC is both
simple and efficient.

see the gc section on the page.

http://caml.inria.fr/ocaml/papers.html

On the same page it is worth reading the Zinc experiment.
It is the foundation of the efficient bytecode interpreter (which in turn is
the foundation for the efficient native code compiler). The ZINC experiment,
an economical implementation of the ML language (10 years+ old, but still
interesting).

Mikkel

Hi,

···

In message “Re: Best GC for Ruby?” on 02/09/04, Dan Sugalski dan@sidhe.org writes:

Out of curiosity, have you given this a shot to see how performance
looks? I’ve been debating the merits of generational collection, but
I’ve been worried about the performance impact (and the code hassle)
of maintaining generation barriers.

Generational GC performs quite well if there are bunch of live
objects. But when we implemented the generational GC for Ruby, the
write barrier cost wasn’t trivial. As a result, we lost performance
for avarage cases. Probably more efficient write barrier such as card
marking may help. We will try again someday.

						matz.

Hi,

···

In message “Re: Best GC for Ruby?” on 02/09/04, “Justin Johnson” justinj@mobiusent.com writes:

It can be, if all attribute modifies are done by API. Though we can’t
assume this if we are developing general purpose collector like
Boehm’s.

Did you mean like in the setPointer macro/function above?

For the current Ruby implementation, all attribute changes are done
via API; variables, references from arrays and hashes, etc. Exception
is wrapped C/C++ data value. We solved this not by using setPointer
macro, but by nailing them to the young generation.

						matz.

Out of curiosity, have you given this a shot to see how performance
looks? I’ve been debating the merits of generational collection, but
I’ve been worried about the performance impact (and the code hassle)
of maintaining generation barriers.

I think I need to allocate a bit of time for experimenting to see just what
the impact is. I don’t much like the idea of maintaining software write
barriers and hardware write barriers are no go for me.

Interestingly, in my implementation I’ve got a class called RbValue which
represents Rubys internal VALUE. There’s more to it though. It can take
the behaviour of an integer or a pointer to an object. The ‘=’ operator is
overloaded and there is significant debug only safety code. An RbValue
object can cast to an RbObject* if it’s setup as an RbObject*.

Example:

RbValue Value1 = 10; // Value is an integer (FIXNUM)
10
RbValue PtrObject = pObjectA; // Value is an object ptr.

Because the ‘=’ operator is overloaded (with an inline so it retains speed
in release builds) I have an ideal point at which to intercept the setting
of RbValues that are pointers to objects. It would also be possible (and
desirable?) to create a destructor for RbValues that deregisters them from
remembered sets. I’ll have to think about that.

For C code I don’t think you can get away from having to use macros.

The other major problem with software write barrier is that I believe that
fastest way is to use card marking with 1 byte (as Matz mentioned). This
means that you need to be able to find all data pointers within a card, or
chunk of heap. For my current system, data pointers are identified by the
programmer manually so this idea doesn’t seem compatible.

···


Justin Johnson

Marking is the process of flagging live data. When using C/C++, it’s common
to pass pointers to functions/methods and the data that the pointers point
to needs to be recognized as live. A common solution to this is to iterate
through the C/C++ stack treating each stack element as a potential pointer.
There are some good checks you can do to discount stack elements that aren’t
pointers.

The sweeping traditionally sweeps all data and reclaims any data not marked
as live. In my case, I allocate memory in pages of 4k and can do some quick
checks to see if an entire page can be reclaimed in one go. There’s a few
other tricks you can use to make this process fast.

It sounds like you might be thinking about whats classically termed
‘generations’. This is a technique employed to reduce the amount of work
done in common garbage collections. And this is where I’ve got a problem.
You need to track pointers within objects that point to data in other
generations. To do this in software means you need to adopt a method of
monitoring inter-generational pointer changes which is what the previous
posting is all about.

···


Justin Johnson

“William Djaja Tjokroaminata” billtj@y.glue.umd.edu wrote in message
news:al38aq$l43$2@grapevine.wam.umd.edu…

Hi Justin,

Please excuse me on my lack of knowledge regarding the GC model. When we
talk about the “roots”, does it mean only roots for marking, or does it
mean roots for both marking and sweeping? (In the current Ruby GC, I
think the roots are used for marking, while sweeping just goes through the
whole heaps/array.) Is it possible to have separate sweeping areas?

Regards,

Bill

Bummer–I’d hoped you had a good solution to my existing problem. :slight_smile:
The exposure of the internals, even partially, to extensions written
in C also seem to complicate things.

···

At 11:09 AM +0900 9/4/02, Yukihiro Matsumoto wrote:

Hi,

In message “Re: Best GC for Ruby?” > on 02/09/04, Dan Sugalski dan@sidhe.org writes:

Out of curiosity, have you given this a shot to see how performance
looks? I’ve been debating the merits of generational collection, but
I’ve been worried about the performance impact (and the code hassle)
of maintaining generation barriers.

Generational GC performs quite well if there are bunch of live
objects. But when we implemented the generational GC for Ruby, the
write barrier cost wasn’t trivial. As a result, we lost performance
for avarage cases. Probably more efficient write barrier such as card
marking may help. We will try again someday.


Dan

--------------------------------------“it’s like this”-------------------
Dan Sugalski even samurai
dan@sidhe.org have teddy bears and even
teddy bears get drunk