Best GC for Ruby?

Hi,

It seems that the example shows the convenience of bringing in the power
of C++. I hadn’t started on it or thought much about it, since the Pickaxe book
mentions “…including awkward marriages of Ruby and C++, …”. I know that
someone is rewriting Ruby in Java (JRuby); might this show the advantages
of implementing Ruby in C++ instead of in C?

Regards,

Bill


==========================================================================
Justin Johnson justinj@mobiusent.com wrote:

Interestingly, in my implementation I’ve got a class called RbValue which
represents Ruby’s internal VALUE. There’s more to it, though. It can take
the behaviour of an integer or of a pointer to an object. The ‘=’ operator is
overloaded, and there is significant debug-only safety code. An RbValue
object can be cast to an RbObject* if it’s set up as an RbObject*.

Example:

RbValue Value1 = 10; // Value is an integer (FIXNUM)
RbValue PtrObject = pObjectA; // Value is an object ptr.

Because the ‘=’ operator is overloaded (and inlined, so it retains its speed
in release builds), I have an ideal point at which to intercept the setting
of RbValues that are pointers to objects. It would also be possible (and
desirable?) to create a destructor for RbValues that deregisters them from
remembered sets. I’ll have to think about that.

For C code I don’t think you can get away from having to use macros.
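A minimal C++ sketch of the RbValue idea, assuming a one-word value whose low bit marks an immediate integer and a slot-based remembered set. `RbObject`, its `young` flag and `rememberedSlots()` are hypothetical stand-ins, not Justin’s actual classes. The overloaded `=` is the single place where pointer stores are seen, and the destructor deregisters the slot, as suggested above.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical heap object header; only the generation flag matters here.
// alignas(8) guarantees the low bit of an object address is 0.
struct alignas(8) RbObject {
    bool young = true;
};

class RbValue;

// Hypothetical slot-based remembered set: addresses of RbValue slots that
// may point at young objects and must be scanned during a minor GC.
inline std::unordered_set<RbValue*>& rememberedSlots() {
    static std::unordered_set<RbValue*> slots;
    return slots;
}

class RbValue {
public:
    RbValue(long n = 0) { *this = n; }
    RbValue(RbObject* p) { *this = p; }
    RbValue(const RbValue& other) { *this = other; }
    // Deregister the slot when it goes away.
    ~RbValue() { rememberedSlots().erase(this); }

    // Integer store: tag as an immediate (low bit 1); no pointer to track.
    RbValue& operator=(long n) {
        bits_ = (static_cast<std::uintptr_t>(n) << 1) | 1u;
        rememberedSlots().erase(this);
        return *this;
    }

    // Pointer store: the interception point. Remember the slot only if it
    // now refers to a young object.
    RbValue& operator=(RbObject* p) {
        bits_ = reinterpret_cast<std::uintptr_t>(p);
        assert((bits_ & 1u) == 0 && "object pointers must be aligned");
        if (p != nullptr && p->young) rememberedSlots().insert(this);
        else rememberedSlots().erase(this);
        return *this;
    }

    RbValue& operator=(const RbValue& other) {
        if (other.isFixnum()) return *this = other.toFixnum();
        return *this = other.toObject();
    }

    bool isFixnum() const { return (bits_ & 1u) != 0; }

    long toFixnum() const {
        assert(isFixnum());  // debug-only safety check (removed by NDEBUG)
        return static_cast<long>(static_cast<std::intptr_t>(bits_) >> 1);
    }

    RbObject* toObject() const {
        assert(!isFixnum());  // debug-only safety check
        return reinterpret_cast<RbObject*>(bits_);
    }

private:
    std::uintptr_t bits_ = 1;  // Fixnum 0 until assigned
};
```

Remembering every slot that points at a young object is the simplest possible policy; a real collector would likely also check the generation of the slot’s owner.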

Hi Matz,

I agree in principle with optimizing for the average case. However,
my application has so many live objects (network
components such as nodes and links are always there; they may be up or
down, but they don’t “physically” disappear) that it is far from the
average case. So I really hope that the next generation of Ruby GC
will make special provision for this “many live
objects” case, at least through the C APIs. (Perhaps even a simple
“switch” to choose between the standard GC and a generational GC?)

I think Java also has a garbage collector (also of the mark-and-sweep type, I
think), and Java has been used in large-scale enterprise
applications. Can we at least try to understand how Java deals with the
scalability issue of mark-and-sweep GC?

Regards,

Bill


=============================================================================
Yukihiro Matsumoto matz@ruby-lang.org wrote:

Generational GC performs quite well if there are a bunch of live
objects. But when we implemented generational GC for Ruby, the
write barrier cost wasn’t trivial. As a result, we lost performance
in the average case. A more efficient write barrier, such as card
marking, may help. We will try again someday.

  					matz.
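Card marking, as mentioned above, can be sketched roughly like this, assuming a flat old-generation arena divided into 512-byte cards. `CardHeap`, `writeSlot` and `scanDirtyCards` are hypothetical names. The point of the sketch is that the barrier itself is one unconditional byte store, with all filtering deferred to the collector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical old-generation arena divided into fixed-size "cards",
// with one dirty byte per card.
class CardHeap {
public:
    static constexpr std::size_t kCardShift = 9;  // 512-byte cards (assumed)
    static constexpr std::size_t kCardSize = std::size_t{1} << kCardShift;

    explicit CardHeap(std::size_t bytes)
        : arena_(bytes, 0), cards_((bytes + kCardSize - 1) / kCardSize, 0) {}

    // Write barrier: do the store, then unconditionally dirty the card that
    // holds the slot. No generation checks happen here, which is what makes
    // card marking cheaper than a filtering barrier.
    void writeSlot(std::size_t offset, std::uintptr_t value) {
        assert(offset + sizeof value <= arena_.size());
        std::memcpy(&arena_[offset], &value, sizeof value);
        cards_[offset >> kCardShift] = 1;
    }

    std::uintptr_t readSlot(std::size_t offset) const {
        std::uintptr_t value;
        std::memcpy(&value, &arena_[offset], sizeof value);
        return value;
    }

    // Minor GC: visit only dirty cards (the callback stands in for scanning
    // them for old-to-young pointers), then clean them.
    template <typename ScanFn>
    void scanDirtyCards(ScanFn scan) {
        for (std::size_t i = 0; i < cards_.size(); ++i) {
            if (!cards_[i]) continue;
            const std::size_t begin = i * kCardSize;
            scan(begin, std::min(begin + kCardSize, arena_.size()));
            cards_[i] = 0;
        }
    }

    std::size_t dirtyCount() const {
        return static_cast<std::size_t>(std::count(cards_.begin(), cards_.end(), 1));
    }

private:
    std::vector<unsigned char> arena_;
    std::vector<unsigned char> cards_;
};
```

The trade-off is that a minor GC rescans whole cards rather than exact slots, so the cost moves from the mutator to the collector.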

Hi,

Generational GC performs quite well if there are a bunch of live
objects. But when we implemented generational GC for Ruby, the
write barrier cost wasn’t trivial. As a result, we lost performance
in the average case. A more efficient write barrier, such as card
marking, may help. We will try again someday.

Bummer. I’d hoped you had a good solution to my existing problem. :-)

Life is not that easy, as always. ;-)

The exposure of the internals, even partially, to extensions written
in C also seems to complicate things.

To make things easy, we put all touched (i.e. internals-revealed) C
data into the remembered set.

						matz.
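A rough C++ model of putting “touched” C data into the remembered set, assuming a simple two-generation heap. `Obj`, `MiniHeap::store` and `MiniHeap::exposeInternals` are hypothetical names. Once C code holds a raw pointer into an object, writes through it can no longer be intercepted, so the owner is remembered unconditionally.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical two-generation object: a flag plus reference fields.
struct Obj {
    bool old = false;
    std::vector<Obj*> fields;
};

class MiniHeap {
public:
    // Barrier-protected store used by interpreter code: only an
    // old-to-young reference puts the owner into the remembered set.
    void store(Obj* owner, std::size_t i, Obj* value) {
        assert(owner != nullptr && i < owner->fields.size());
        owner->fields[i] = value;
        if (owner->old && value != nullptr && !value->old) remembered_.insert(owner);
    }

    // Called when C code asks for a raw pointer into an object's internals.
    // Writes through that pointer bypass store(), so the owner is remembered
    // unconditionally and rescanned at every minor GC. In this sketch it is
    // never removed.
    Obj** exposeInternals(Obj* owner) {
        remembered_.insert(owner);
        return owner->fields.data();
    }

    bool isRemembered(Obj* o) const { return remembered_.count(o) != 0; }

private:
    std::unordered_set<Obj*> remembered_;
};
```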

In message “Re: Best GC for Ruby?” on 02/09/04, Dan Sugalski dan@sidhe.org writes:

Hi,


In message “Re: Best GC for Ruby?” on 02/09/05, William Djaja Tjokroaminata billtj@z.glue.umd.edu writes:

I think Java also has a garbage collector (also of the mark-and-sweep type, I
think), and Java has been used in large-scale enterprise
applications. Can we at least try to understand how Java deals with the
scalability issue of mark-and-sweep GC?

It depends on the VM implementation. The JVM specification does not in fact
require GC. Some use simple mark and sweep; some use the more
sophisticated train algorithm (a variation of generational GC).

						matz.

For that matter, we don’t really need a garbage collector in Ruby
either, but I doubt anyone would use a Ruby or Java interpreter that did
not have GC.

Paul


On Thu, Sep 05, 2002 at 11:57:36PM +0900, Yukihiro Matsumoto wrote:

It depends on the VM implementation. The JVM specification does not in fact
require GC. Some use simple mark and sweep; some use the more
sophisticated train algorithm (a variation of generational GC).

Hi Matz,

This is very interesting. So, just from the GC overhead point of
view, if the JVM uses simple mark and sweep, I would guess that Ruby’s
performance is comparable to Java’s (using comparable GC parameters, of
course)… (and therefore network simulators may not perform as well in
Java as in C++, especially for large-scale networks…)

Which brings us to the second topic: the object model. In Ruby, in the
spirit of Smalltalk, everything is an object. In Java, certain data types
(such as int, float, and double) are “native” while the rest are
objects. In Ruby, I guess for optimization purposes, you have made
‘int’ an immediate object, at the cost of 1 bit. So in the
future, when 64-bit pointers are very common, will you also make ‘double’
an immediate object, and thereby probably approach the Java model?

(Actually, at least in theory, we could already make ‘float’ an
immediate object, couldn’t we? But it is probably not as simple as providing
BigFloat and FixFloat, since the mantissa also differs, not just the
exponent. Has there been any study/experiment/survey of how much having
both “immediate objects” and “pointer objects” saves compared to the
simpler model of “everything is a pointer object”? I guess having
everything as a pointer object would make the C APIs simpler.)
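The one-bit cost of immediate integers can be sketched as follows. The helper names loosely mirror Ruby’s C macros `INT2FIX`, `FIX2LONG` and `FIXNUM_P`, but these are illustrative C++ functions assuming a pointer-sized VALUE, not Ruby’s own code. Integers outside [kFixnumMin, kFixnumMax] would have to become heap-allocated Bignums.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// One machine word, as Ruby's VALUE is.
using VALUE = std::uintptr_t;

constexpr VALUE kFixnumFlag = 1;  // low bit set => immediate integer

// Range that survives giving up one bit; anything outside it would have to
// become a heap-allocated Bignum.
constexpr std::intptr_t kFixnumMax = std::numeric_limits<std::intptr_t>::max() >> 1;
constexpr std::intptr_t kFixnumMin = std::numeric_limits<std::intptr_t>::min() / 2;

inline bool fixable(std::intptr_t n) { return n >= kFixnumMin && n <= kFixnumMax; }

inline VALUE int2fix(std::intptr_t n) {
    assert(fixable(n));  // caller must fall back to a Bignum otherwise
    return (static_cast<VALUE>(n) << 1) | kFixnumFlag;
}

inline bool fixnumP(VALUE v) { return (v & kFixnumFlag) != 0; }

inline std::intptr_t fix2long(VALUE v) {
    // Arithmetic right shift restores the sign (guaranteed from C++20,
    // and what mainstream compilers do anyway).
    return static_cast<std::intptr_t>(v) >> 1;
}

// An aligned object address always has its low bit clear, so it can never
// be mistaken for a Fixnum.
inline VALUE objectToValue(const void* p) { return reinterpret_cast<VALUE>(p); }
```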

Regards,

Bill


=============================================================================
Yukihiro Matsumoto matz@ruby-lang.org wrote:

I think Java also has a garbage collector (also of the mark-and-sweep type, I
think), and Java has been used in large-scale enterprise
applications. Can we at least try to understand how Java deals with the
scalability issue of mark-and-sweep GC?

It depends on the VM implementation. The JVM specification does not in fact
require GC. Some use simple mark and sweep; some use the more
sophisticated train algorithm (a variation of generational GC).

  					matz.

‘double’ is already 64 bits, and so won’t fit, unless we do some nasty
manipulation of the mantissa. We’d instead have to switch to using
floats, which are 32 bits.

Paul
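A sketch of what switching to 32-bit floats might look like on a 64-bit VALUE: the float’s bits go in the upper half and a tag in the low bits. The tag value `0x2` and the helper names are assumptions, chosen so that the tag does not collide with 8-byte-aligned pointers (low bits `000`) or with Fixnums (low bit `1`). The round trip through `float` also shows the precision a `double` loses in this scheme.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "assumes 32-bit IEEE 754 floats");

using VALUE = std::uint64_t;  // assumes a 64-bit VALUE

// Hypothetical tag: low bits 010. 8-byte-aligned pointers end in 000 and
// Fixnums end in 1, so this pattern is free.
constexpr VALUE kFloatTag = 0x2;
constexpr VALUE kTagMask = 0x7;

inline VALUE packFloat(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return (static_cast<VALUE>(bits) << 32) | kFloatTag;  // float in the upper half
}

inline bool floatP(VALUE v) { return (v & kTagMask) == kFloatTag; }

inline float unpackFloat(VALUE v) {
    assert(floatP(v));
    const auto bits = static_cast<std::uint32_t>(v >> 32);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```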


On Fri, Sep 06, 2002 at 01:08:04AM +0900, William Djaja Tjokroaminata wrote:

Which brings us to the second topic: the object model. In Ruby, in the
spirit of Smalltalk, everything is an object. In Java, certain data types
(such as int, float, and double) are “native” while the rest are
objects. In Ruby, I guess for optimization purposes, you have made
‘int’ an immediate object, at the cost of 1 bit. So in the
future, when 64-bit pointers are very common, will you also make ‘double’
an immediate object, and thereby probably approach the Java model?

Well, currently int is usually 32 bits, most pointers are 32 bits, and
we already sacrifice one bit. Therefore, when 64-bit pointers are standard,
I guess at least in theory we can do the same thing to double. Except, of
course, we have to judge how “nasty” it is compared to the performance
that will be gained. (Hasn’t the format of ‘double’ and ‘float’
been standardized by IEEE a long time ago?)

Regards,

Bill
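Doing “the same thing to double” means giving up one bit of its 64-bit representation. A minimal sketch, assuming the lowest mantissa bit is the one sacrificed (the tag’s meaning and the helper names are hypothetical): values whose lowest bit is already 0 survive exactly, while others are truncated by one unit in the last place.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical immediate-double layout for a 64-bit VALUE: the lowest bit of
// the IEEE 754 representation is overwritten by a tag, the way Fixnums give
// up one bit. (Which tag value marks a double is left open here; a real
// design would have to avoid clashing with Fixnums.)
using VALUE = std::uint64_t;
constexpr VALUE kTagBit = 1;

inline std::uint64_t bitsOf(double d) {
    std::uint64_t b;
    std::memcpy(&b, &d, sizeof b);
    return b;
}

inline double doubleOf(std::uint64_t b) {
    double d;
    std::memcpy(&d, &b, sizeof d);
    return d;
}

inline VALUE encodeDouble(double d) { return bitsOf(d) | kTagBit; }

// Decoding clears the tag bit; whatever mantissa bit was there is gone, so
// the magnitude is truncated by at most one unit in the last place.
inline double decodeDouble(VALUE v) {
    assert((v & kTagBit) != 0);
    return doubleOf(v & ~kTagBit);
}
```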


============================================================================
Paul Brannan pbrannan@atdesk.com wrote:

On Fri, Sep 06, 2002 at 01:08:04AM +0900, William Djaja Tjokroaminata wrote:

Which brings us to the second topic: the object model. In Ruby, in the
spirit of Smalltalk, everything is an object. In Java, certain data types
(such as int, float, and double) are “native” while the rest are
objects. In Ruby, I guess for optimization purposes, you have made
‘int’ an immediate object, at the cost of 1 bit. So in the
future, when 64-bit pointers are very common, will you also make ‘double’
an immediate object, and thereby probably approach the Java model?

‘double’ is already 64 bits, and so won’t fit, unless we do some nasty
manipulation of the mantissa. We’d instead have to switch to using
floats, which are 32 bits.

Paul

There is a standard (IEEE 754), but not everyone uses it. There are
some systems out there that even use base-10 floating-point arithmetic
instead of base-2 like the rest of us.

Even if we assume that Ruby will only ever run on systems that follow
IEEE 754, manipulating the bits in a double really isn’t a
game we should play; it will probably give a lot of programmers
surprising results. It would also require writing a test suite so we
can make sure we haven’t broken anything (e.g. infinity/NaN are still
representable, addition/subtraction/etc. work for all combinations of
numbers, precision isn’t lost except the precision we expect to lose by
throwing out a bit, etc.). In short, it’s just not worth the trouble.

Paul
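A few of the checks such a test suite would need can be sketched against a hypothetical “clear the lowest bit” encoding. The checks work on raw bit patterns so that the floating-point hardware cannot quietly alter NaNs, and the helper names are assumptions. Two of the checks fail, which illustrates the point: a NaN whose only payload bit is the lowest one turns into infinity, and the smallest subnormal collapses to zero.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// The hypothetical encoding under test: clear the lowest bit of the
// IEEE 754 double representation to make room for a tag.
inline std::uint64_t encodePayload(std::uint64_t bits) { return bits & ~std::uint64_t{1}; }

inline std::uint64_t bitsOf(double d) {
    std::uint64_t b;
    std::memcpy(&b, &d, sizeof b);
    return b;
}

constexpr std::uint64_t kSignMask = 0x8000000000000000ULL;
constexpr std::uint64_t kExpMask = 0x7FF0000000000000ULL;
constexpr std::uint64_t kManMask = 0x000FFFFFFFFFFFFFULL;

// Classify raw bit patterns directly, so no FPU operation can quiet a NaN.
inline bool isNanBits(std::uint64_t b) { return (b & kExpMask) == kExpMask && (b & kManMask) != 0; }
inline bool isInfBits(std::uint64_t b) { return (b & kExpMask) == kExpMask && (b & kManMask) == 0; }

// Check 1: both infinities are unchanged (their low bit is already 0).
inline bool infinitiesSurvive() {
    const double inf = std::numeric_limits<double>::infinity();
    return encodePayload(bitsOf(inf)) == bitsOf(inf) &&
           encodePayload(bitsOf(-inf)) == bitsOf(-inf);
}

// Check 2: a NaN must stay a NaN.
inline bool staysNan(std::uint64_t nanBits) {
    assert(isNanBits(nanBits));
    return isNanBits(encodePayload(nanBits));
}

// Check 3: a nonzero value must stay nonzero.
inline bool staysNonzero(std::uint64_t bits) { return (encodePayload(bits) & ~kSignMask) != 0; }
```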


On Fri, Sep 06, 2002 at 01:48:15AM +0900, William Djaja Tjokroaminata wrote:

Well, currently int is usually 32 bits, most pointers are 32 bits, and
we already sacrifice one bit. Therefore, when 64-bit pointers are standard,
I guess at least in theory we can do the same thing to double. Except, of
course, we have to judge how “nasty” it is compared to the performance
that will be gained. (Hasn’t the format of ‘double’ and ‘float’
been standardized by IEEE a long time ago?)

Well, then it seems that I have to wait until 128-bit pointers are common,
in which case VALUE is defined as “unsigned long long long” :).

Is it then correct to say that the current Ruby will not work on a
16-bit system, where ‘long’ is only 16 bits? (Not common at all, although
I am not sure about some embedded systems. But probably no one will ever
use Ruby in an embedded system, although some people here use Java for
embedded programming.)

Regarding the test suites, is it really that much more complicated than
the corresponding test suites for Fixnum and Bignum? For a first cut,
we could probably sacrifice exponent rather than mantissa and use the same
logic as the conversion between Fixnum and Bignum, with FixFloat
(31-bit ‘float’) and BigFloat (pure, native 64-bit ‘double’). (People who
do numerical computations extensively and care about precision should be
using NArray anyway.)

But in the end, I think you are right; this complication is probably
not worthwhile. Here I am just trying to draw an analogy with Java (we don’t
have to reinvent the wheel, and there is nothing wrong with taking the good
things, right? :-) )

Regards,

Bill
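The “sacrifice exponent rather than mantissa” idea can be sketched for a 64-bit VALUE: keep the full 52-bit mantissa but only 10 exponent bits, so the payload fits in 63 bits, and fall back to a boxed BigFloat otherwise, much as Fixnum overflow falls back to Bignum. The exponent window (biased 512..1535) and the function names are assumptions; `std::nullopt` stands in for “allocate a BigFloat”. Note that 0.0 falls outside the window and would need its own special case.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical "FixFloat" window: biased exponents 512..1535 (1024 values,
// i.e. 10 bits instead of IEEE 754's 11). Sign + 10-bit exponent + 52-bit
// mantissa = 63 bits, leaving one bit of a 64-bit VALUE for a tag.
constexpr std::uint64_t kExpLow = 512;
constexpr std::uint64_t kExpHigh = 1535;
constexpr std::uint64_t kManMask = 0x000FFFFFFFFFFFFFULL;

// Returns the 63-bit payload, or nullopt when the value would have to be
// a heap-allocated "BigFloat" (zero, subnormals, very large or very small
// magnitudes, infinities, NaNs).
inline std::optional<std::uint64_t> toFixFloat(double d) {
    std::uint64_t b;
    std::memcpy(&b, &d, sizeof b);
    const std::uint64_t sign = b >> 63;
    const std::uint64_t exp = (b >> 52) & 0x7FF;
    if (exp < kExpLow || exp > kExpHigh) return std::nullopt;
    return (sign << 62) | ((exp - kExpLow) << 52) | (b & kManMask);
}

// Inverse: re-bias the exponent; the mantissa comes back bit for bit.
inline double fromFixFloat(std::uint64_t payload) {
    assert(payload < (std::uint64_t{1} << 63));
    const std::uint64_t sign = (payload >> 62) & 1;
    const std::uint64_t exp = ((payload >> 52) & 0x3FF) + kExpLow;
    const std::uint64_t b = (sign << 63) | (exp << 52) | (payload & kManMask);
    double d;
    std::memcpy(&d, &b, sizeof d);
    return d;
}
```

Unlike dropping a mantissa bit, values inside the window lose no precision at all; the cost is a range check on every conversion.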


===========================================================================
Paul Brannan pbrannan@atdesk.com wrote:

There is a standard (IEEE 754), but not everyone uses it. There are
some systems out there that even use base-10 floating-point arithmetic
instead of base-2 like the rest of us.

Even if we assume that Ruby will only ever run on systems that follow
IEEE 754, manipulating the bits in a double really isn’t a
game we should play; it will probably give a lot of programmers
surprising results. It would also require writing a test suite so we
can make sure we haven’t broken anything (e.g. infinity/NaN are still
representable, addition/subtraction/etc. work for all combinations of
numbers, precision isn’t lost except the precision we expect to lose by
throwing out a bit, etc.). In short, it’s just not worth the trouble.

Paul

Yes, it’s a LOT more complicated.

See:
http://www.netlib.org/fp/ucbtest.tgz
IEEE 754 floating-point test software

for some code that tests correct floating-point operation. There are
many, many more things that can go wrong with floating-point operations
than with integer operations; hitting all these cases is a job best left
to the experts.

Paul


On Fri, Sep 06, 2002 at 04:48:47AM +0900, William Djaja Tjokroaminata wrote:

Regarding the test suites, is it really that much more complicated than
the corresponding test suites for Fixnum and Bignum? For a first cut,
we could probably sacrifice exponent rather than mantissa and use the same
logic as the conversion between Fixnum and Bignum, with FixFloat
(31-bit ‘float’) and BigFloat (pure, native 64-bit ‘double’). (People who
do numerical computations extensively and care about precision should be
using NArray anyway.)

William Djaja Tjokroaminata wrote:

Regarding the test suites, is it really that much more complicated than
the corresponding test suites for Fixnum and Bignum? For a first cut,
we could probably sacrifice exponent rather than mantissa and use the same
logic as the conversion between Fixnum and Bignum, with FixFloat
(31-bit ‘float’) and BigFloat (pure, native 64-bit ‘double’). (People who
do numerical computations extensively and care about precision should have

I think there is a minor inaccuracy in your considerations:

In the current implementation of Ruby, there are 2^31 possible
noninteger objects and 2^31 possible integers coded by one VALUE.
If a single VALUE is also used for storing floats, then you will
have to reserve two bits instead of one for describing
the type of the object: so you will have 30-bit integers and 30-bit
floats…

Additionally, this would add an extra comparison to each object access,
which would impair the overall performance a bit.

Regards, Christian
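Christian’s two-bit layout can be sketched on a 32-bit VALUE. The tag assignments (`..00` pointer, `..01` Fixnum, `..10` float, `..11` unused) and the helper names are assumptions. The `switch` in `classify` shows the extra comparison he mentions, and `float30` shows the two mantissa bits a 30-bit float gives up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using VALUE32 = std::uint32_t;  // models a 32-bit VALUE
static_assert(sizeof(float) == sizeof(VALUE32), "assumes 32-bit floats");

// Hypothetical two-bit tag layout:
//   ..00 pointer (4-byte aligned), ..01 Fixnum, ..10 float, ..11 unused.
constexpr VALUE32 kTagMask = 0x3;
constexpr VALUE32 kFixTag = 0x1;
constexpr VALUE32 kFloatTag = 0x2;

enum class Kind { Pointer, Fixnum, Float, Unused };

// 30-bit integers: range -2^29 .. 2^29 - 1.
inline VALUE32 fix30(std::int32_t n) {
    assert(n >= -(1 << 29) && n < (1 << 29));
    return (static_cast<VALUE32>(n) << 2) | kFixTag;
}
inline std::int32_t fix30Value(VALUE32 v) { return static_cast<std::int32_t>(v) >> 2; }

// 30-bit floats: a 32-bit IEEE 754 float with its two lowest mantissa
// bits overwritten by the tag.
inline VALUE32 float30(float f) {
    VALUE32 bits;
    std::memcpy(&bits, &f, sizeof bits);
    return (bits & ~kTagMask) | kFloatTag;
}
inline float float30Value(VALUE32 v) {
    const VALUE32 bits = v & ~kTagMask;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// With one tag bit, a single test separates Fixnums from pointers; with
// two tag bits, generic code must tell apart several cases on every access.
inline Kind classify(VALUE32 v) {
    switch (v & kTagMask) {
        case 0:         return Kind::Pointer;
        case kFixTag:   return Kind::Fixnum;
        case kFloatTag: return Kind::Float;
        default:        return Kind::Unused;
    }
}
```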

Yes, you are correct. I am just playing with philosophy regarding the
Ruby object model. At least the initial Tcl had all objects as strings,
before it moved to the compound object model, where I think an
“object” (VALUE in Ruby) is a C ‘union’ of int, double, etc. I don’t
think Python (at least several years ago) had the concept of immediate
objects and pointer objects (but it has the concept of immutable and
mutable objects).

I think the closest thing to Ruby’s object model is Java’s, where
there are “native” data types and “objects”, since Java is
also interpreted. Of course, we have to make variable declarations in
Java, and Java was not written in C; so I don’t know whether Ruby can also
have the magic of Java and not just have ‘int’ (and a few others) as
immediate objects. I think this is very philosophical; for C, I am sure
Matz has thought it through thoroughly and concluded that the current object model is the best. I
don’t know whether, if we used C++, VALUE could be a “smart
pointer” instead, and whether that would make the GC implementation somewhat
easier, like the example shown by Justin Johnson…

Regards,

Bill


===========================================================================
Christian Szegedy szegedy@t-online.de wrote:

I think there is a minor inaccuracy in your considerations:

In the current implementation of Ruby, there are 2^31 possible
noninteger objects and 2^31 possible integers coded by one VALUE.
If a single VALUE is also used for storing floats, then you will
have to reserve two bits instead of one for describing
the type of the object: so you will have 30-bit integers and 30-bit
floats…

Additionally, this would add an extra comparison to each object access,
which would impair the overall performance a bit.

Regards, Christian