Ruby Specification

Joel VanderWerf <vjoel@PATH.Berkeley.EDU> wrote in message news:<40FC2CD0.7090408@path.berkeley.edu>...

Mark Sparshatt wrote:
> The problem would come if someone did release an alternate
> implementation of Ruby with reference-counting GC. Then if people who
> were used to the alternate Ruby moved to standard Ruby, they might
> expect the same behaviour, and assume it's a bug when standard Ruby
> behaves differently.

In my understanding, ruby makes no guarantees about memory management,
except that unreferenced objects will eventually get recycled. A ruby
specification should state explicitly that, aside from this guarantee,
behavior is unspecified.

Of course, if programmers don't read the spec, they can still get bitten...

Technically there is not even a guarantee that an unreferenced object
will be recycled. Data on the stack can collide with a non-immediate
VALUE, causing the GC to believe a reference still exists after the
"real" references are gone.

Of course the probability of such a collision is small, but it doesn't
hurt to know it is there. For long-running daemons using many objects
and/or ruby C code using lots of stack space it may possibly be
something to consider, though I haven't looked at the numbers yet.
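
To make the collision concrete, here is a hypothetical C-extension sketch (module and function names are invented for illustration); whether the stale slot actually survives long enough to matter depends on the compiler, the optimizer and the platform:

```c
/* stale_ref_demo.c -- hypothetical sketch of a false root.
 * Build it like any ordinary C extension; the outcome is not
 * deterministic, since it depends on what the dead stack frame
 * still contains when the mark phase runs. */
#include "ruby.h"

static void build_temporary(void)
{
    /* A String referenced only from this stack frame. */
    volatile VALUE tmp = rb_str_new2("a large temporary buffer");
    (void)tmp;
    /* On return there is no Ruby-level reference left, but the bit
     * pattern of 'tmp' may still sit in the dead stack area. */
}

static VALUE demo_collect(VALUE self)
{
    build_temporary();
    rb_gc();   /* conservative mark phase scans the raw C stack */
    /* If the dead frame has not been overwritten yet, the collector
     * can see a word that happens to equal the String's address and
     * will keep the object alive although all "real" references are
     * gone. */
    return Qnil;
}

void Init_stale_ref_demo(void)
{
    VALUE mod = rb_define_module("StaleRefDemo");
    rb_define_singleton_method(mod, "collect", demo_collect, 0);
}
```

Calling StaleRefDemo.collect may or may not leave the temporary String alive; the point is only that the conservative scan gives no guarantee either way.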

Jeff Mitchell wrote:

Joel VanderWerf <vjoel@PATH.Berkeley.EDU> wrote in message news:<40FC2CD0.7090408@path.berkeley.edu>...

Mark Sparshatt wrote:

The problem would come if someone did release an alternate implementation of Ruby with reference-counting GC. Then if people who were used to the alternate Ruby moved to standard Ruby, they might expect the same behaviour, and assume it's a bug when standard Ruby behaves differently.

In my understanding, ruby makes no guarantees about memory management, except that unreferenced objects will eventually get recycled. A ruby specification should state explicitly that, aside from this guarantee, behavior is unspecified.

Of course, if programmers don't read the spec, they can still get bitten...

Technically there is not even a guarantee that an unreferenced object
will be recycled. Data on the stack can collide with a non-immediate
VALUE, causing the GC to believe a reference still exists after the
"real" references are gone.

Of course the probability of such a collision is small, but it doesn't
hurt to know it is there. For long-running daemons using many objects
and/or ruby C code using lots of stack space it may possibly be
something to consider, though I haven't looked at the numbers yet.

True. I was forgetting that the GC is conservative. So "eventually" may mean "before the interpreter exits".

Hello Jeff,

Of course the probability of such a collision is small, but it doesn't
hurt to know it is there. For long-running daemons using many objects
and/or ruby C code using lots of stack space it may possibly be
something to consider, though I haven't looked at the numbers yet.

Arachno Ruby, being a larger program, uses the Boehm-Demers-Weiser (BDW)
GC. It considers not only all elements on the stack but also everything
in the heap as possible pointers, so there are many things that can be
wrongly treated as still-in-use data. But this does not seem to be the
problem: when I compare it with SmartEiffel's exact internal GC, the
overhead from unreleased data seems to be around 20%. The much more
serious problem with the BDW GC is the high internal heap fragmentation;
the heap only seems to stabilize after growing to about 5 times the
dataset size. From previous posts here I believe that the Ruby GC is
working well.

So if we ever change the internal memory management in Ruby (I guess we
will not), then I hope it will also include a compacting GC. In the
meantime, a lazy mark with a write barrier and a lazy sweep would be a
good thing to add. I found that the GC can give you a huge time penalty
because it simply runs too often and there is no way to customize this.
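
For comparison, a minimal standalone sketch of the knobs the Boehm collector itself exposes (illustrative only, assuming the stock <gc.h> API and linking with -lgc); this is exactly the kind of control the stock Ruby GC does not expose:

```c
/* boehm_tuning_sketch.c -- illustrative only, not taken from any real
 * program.  Shows incremental marking behind a (page-protection) write
 * barrier, plus growing the heap up front so collections trigger less
 * often.  Compile with: cc boehm_tuning_sketch.c -lgc */
#include <gc.h>
#include <stdio.h>

int main(void)
{
    GC_INIT();

    /* Incremental mode: marking is spread over many small pauses, and
     * mutations of already-marked pages are caught by a write barrier. */
    GC_enable_incremental();

    /* Pre-grow the heap by 8 MB so the collector runs less frequently
     * during an allocation-heavy phase. */
    GC_expand_hp(8 * 1024 * 1024);

    for (int i = 0; i < 100000; i++) {
        char *p = GC_MALLOC(256);   /* collected, pointer-scanned block */
        p[0] = (char)i;
    }

    printf("heap size after the loop: %lu bytes\n",
           (unsigned long)GC_get_heap_size());
    return 0;
}
```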


--
Best regards, emailto: scholz at scriptolutions dot com
Lothar Scholz http://www.ruby-ide.com
CTO Scriptolutions Ruby, PHP, Python IDEs

Lothar Scholz wrote:

Hello Jeff,

> Of course the probability of such a collision is small, but it doesn't
> hurt to know it is there. For long-running daemons using many objects
> and/or ruby C code using lots of stack space it may possibly be
> something to consider, though I haven't looked at the numbers yet.

Arachno Ruby, being a larger program, uses the Boehm-Demers-Weiser (BDW)
GC. It considers not only all elements on the stack but also everything
in the heap as possible pointers, so there are many things that can be
wrongly treated as still-in-use data. But this does not seem to be the
problem: when I compare it with SmartEiffel's exact internal GC, the
overhead from unreleased data seems to be around 20%. The much more
serious problem with the BDW GC is the high internal heap fragmentation;
the heap only seems to stabilize after growing to about 5 times the
dataset size. From previous posts here I believe that the Ruby GC is
working well.

So if we ever change the internal memory management in Ruby (I guess we
will not), then I hope it will also include a compacting GC. In the
meantime, a lazy mark with a write barrier and a lazy sweep would be a
good thing to add. I found that the GC can give you a huge time penalty
because it simply runs too often and there is no way to customize this.

Fascinating stuff. You sound MUCH smarter than Lothar of the Hill People (played by Mike Myers). And your Arachno Ruby IDE looks really cool too.

IMHO, it would be good to leverage all the research and millions already invested by Sun, IBM, et al. in improving Java's GC. I think Java provides some command-line options related to garbage collection.

No need to reinvent the wheel--unless the existing wheel isn't round. :)

I think Ruby 2.0 is very lucky to be looking into bytecode generation and GC today because of all the work done recently by others.

At Wed, 21 Jul 2004 14:05:43 +0900, Lothar Scholz wrote:

Arachno Ruby, being a larger program, uses the Boehm-Demers-Weiser (BDW)
GC. It considers not only all elements on the stack but also everything
in the heap as possible pointers, so there are many things that can be
wrongly treated as still-in-use data.

Do you use it as a drop-in replacement for "malloc"? Because, from
what I remember, this GC provides several functions to give it more
precise information about the data in your heap. I think there are
functions to tell it to allocate, for example, a chunk of memory for
binary data for which you, as the programmer, guarantee that it will
never contain pointers.
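
A minimal sketch of that kind of call (assuming the stock <gc.h> API; everything besides the GC_ names is invented for illustration):

```c
/* atomic_alloc_sketch.c -- minimal illustration of a "pointer-free"
 * allocation with the Boehm collector.  Compile with: cc ... -lgc */
#include <gc.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    GC_INIT();

    /* Ordinary collected allocation: the GC scans this block for
     * anything that looks like a pointer. */
    void **node = GC_MALLOC(2 * sizeof(void *));

    /* Pointer-free allocation: the programmer guarantees the block
     * holds only binary data, so the GC never scans it and it can
     * never retain other objects by accident. */
    unsigned char *buf = GC_MALLOC_ATOMIC(1 << 20);
    memset(buf, 0xff, 1 << 20);

    node[0] = buf;   /* fine: 'node' itself is still scanned */
    printf("scanned block %p, atomic block %p\n",
           (void *)node, (void *)buf);
    return 0;
}
```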

But this does not seem to be the problem: when I compare it with
SmartEiffel's exact internal GC, the overhead from unreleased data seems
to be around 20%. The much more serious problem with the BDW GC is the
high internal heap fragmentation; the heap only seems to stabilize after
growing to about 5 times the dataset size. From previous posts here I
believe that the Ruby GC is working well.

This behavior depends to a great degree on the behavior of your
application. For BDW, fragmentation could only be improved by having a
better allocation strategy.

So if we ever change the internal memory management in Ruby (I guess we
will not), then I hope it will also include a compacting GC. In the
meantime, a lazy mark with a write barrier and a lazy sweep would be a
good thing to add. I found that the GC can give you a huge time penalty
because it simply runs too often and there is no way to customize this.

Compacting is only possible if you know the root set and have exact
information about which cells contain a pointer and which cells don't.
AFAIK, a pure compacting collector isn't possible with Ruby because of
the C extensions... or because of the raw stack scan used to find
pointers for the root set (?)
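
A hypothetical extension fragment (function names invented) illustrating the contract that a moving collector would break:

```c
/* compact_hazard.c -- hypothetical extension fragment, for illustration
 * only.  Shows the contract a moving/compacting collector would break. */
#include "ruby.h"

static VALUE concat_twice(VALUE self, VALUE str)
{
    /* Keep a raw pointer into the String's buffer... */
    const char *p  = RSTRING_PTR(str);
    long       len = RSTRING_LEN(str);

    /* ...across a call that allocates and may therefore run the GC. */
    VALUE out = rb_str_new(p, len);

    /* With the current conservative, non-moving collector 'p' is still
     * valid here.  A compactor that relocated 'str' during rb_str_new()
     * would leave 'p' dangling; every extension (and the raw stack scan
     * that finds its roots) would have to change first. */
    rb_str_cat(out, p, len);
    return out;
}

void Init_compact_hazard(void)
{
    rb_define_global_function("concat_twice", concat_twice, 1);
}
```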

Ruben


Gully Foyle wrote:

Fascinating stuff. You sound MUCH smarter than Lothar of the Hill People (played by Mike Myers). And your Arachno Ruby IDE looks really cool too.

Haha, now, play nice. Lothar is a sharp guy. Though LOL at Mike Myers.

IMHO, it would be good to leverage all the research and millions already invested by Sun, IBM, et al. in improving Java's GC. I think Java provides some command-line options related to garbage collection.

No need to reinvent the wheel--unless the existing wheel isn't round. :)

But is it? Personally, GC is one of those areas where I'm completely
ignorant. I need to delve into SICP one of these days... it's on my
to-do list. ;)

Hal

Hello Ruben,

Arachno Ruby, being a larger program, uses the Boehm-Demers-Weiser (BDW)
GC. It considers not only all elements on the stack but also everything
in the heap as possible pointers, so there are many things that can be
wrongly treated as still-in-use data.

Do you use it as a drop-in replacement for "malloc"?

Yes, I only changed malloc and calloc.

Because, from what I remember, this GC provides several functions to
give it more precise information about the data in your heap. I think
there are functions to tell it to allocate, for example, a chunk of
memory for binary data for which you, as the programmer, guarantee that
it will never contain pointers.

Right, it can completely avoid scanning heap data if you give it a type
descriptor for allocated memory that tells the GC where it must look for
embedded pointers. But this would require a bigger hack in the
SmartEiffel compiler, and I hope that the new 2.0 release will fix the
bugs in their internal GC.
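
The typed-allocation interface in question looks roughly like this (a sketch only: the struct layout is invented, the GC_ names come from <gc/gc_typed.h>, and the include path varies between installations):

```c
/* typed_alloc_sketch.c -- illustrative use of Boehm's typed allocation.
 * The descriptor tells the collector exactly which words of an object
 * may contain pointers, so everything else is skipped during marking.
 * Compile with: cc typed_alloc_sketch.c -lgc */
#include <gc/gc_typed.h>
#include <stddef.h>
#include <stdio.h>

/* An invented object layout: one pointer field among plain data. */
struct node {
    long         id;          /* never a pointer            */
    struct node *next;        /* the only pointer field     */
    char         payload[64]; /* binary data, never scanned */
};

int main(void)
{
    GC_INIT();

    /* Build a bitmap with one bit per word of the struct; set only the
     * bit for the 'next' field. */
    GC_word bitmap[GC_BITMAP_SIZE(struct node)] = { 0 };
    GC_set_bit(bitmap, GC_WORD_OFFSET(struct node, next));
    GC_descr descr = GC_make_descriptor(bitmap, GC_WORD_LEN(struct node));

    struct node *n = GC_malloc_explicitly_typed(sizeof(struct node), descr);
    n->id = 1;
    printf("typed object at %p\n", (void *)n);
    return 0;
}
```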

But this does not seem to be the problem: when I compare it with
SmartEiffel's exact internal GC, the overhead from unreleased data seems
to be around 20%. The much more serious problem with the BDW GC is the
high internal heap fragmentation; the heap only seems to stabilize after
growing to about 5 times the dataset size. From previous posts here I
believe that the Ruby GC is working well.

This behavior depends to a great degree on the behavior of your
application. For BDW, fragmentation could only be improved by having a
better allocation strategy.

Right, I heard about the same numbers from other people on the
SmartEiffel mailing list.

Compacting is only possible if you know the root set and have exact
information about which cells contain a pointer and which cells don't.
AFAIK, a pure compacting collector isn't possible with Ruby because of
the C extensions... or because of the raw stack scan used to find
pointers for the root set (?)

Right, it is impossible with the current design; it would need a lot of
changes to external C extensions and also giving Ruby threads their own
stacks instead of mixing them with the C stack. It's just that when you
talk about a complete rewrite, like the OP's wish for a "rubycc", then
it should be discussed. Matz has already pointed out that he doesn't
care about fragmentation and will not add something like this to the
official Ruby implementation.


--
Best regards, emailto: scholz at scriptolutions dot com
Lothar Scholz http://www.ruby-ide.com
CTO Scriptolutions Ruby, PHP, Python IDEs