Rick DeNatale wrote:
Reference counting is pretty much an obsolete approach to GC. It was
probably the first approach taken for lisp back in the 1950s. Other
language implementations usually started with reference counting (e.g.
the first Smalltalk).
It's main advantage is that it's easy to understand.
I don't think reference counting is any easier to understand than pure mark-and-sweep or pure stop-and-copy. The main advantage of reference counting in my opinion is that its restrictions force you to kick some features out of your language design if you want to use it.
Mark and sweep, such as is used in the Ruby 1.8 implementation quickly
replaced reference counting as the simplest GC considered for real
use.
My recollection is that mark-and-sweep was the original, and that reference counting came later.
More modern GCs tend to use copying GCs which move live objects to new
heap blocks leaving the dead ones behind. And most use generational
scavenging which takes advantage of the observation that most objects
either die quite young, or live a long time. This approach was
pioneered by David Ungar in the Berkeley implementation of
Smalltalk-80. And this is the kind of GC typically used in JVMs
today.
Bah ... I actually found a reference a couple of days ago on this (http://portal.acm.org/citation.cfm?id=91597\). If you're not signed up for the ACM library it will cost you money to read it. But essentially "pure" mark-and-sweep was replaced by stop-and-copy, which compacts the heap. Then generational mark-and-sweep came along and "rehabilitated" mark-and-sweep. Note the publication date -- 1990. The abstract is free -- it reads:
"Stop-and-copy garbage collection has been preferred to mark-and-sweep collection in the last decade because its collection time is proportional to the size of reachable data and not to the memory size. This paper compares the CPU overhead and the memory requirements of the two collection algorithms extended with generations, and finds that mark-and-sweep collection requires at most a small amount of additional CPU overhead (3-6%) but, requires an average of 20% (and up to 40%) less memory to achieve the same page fault rate. The comparison is based on results obtained using trace-driven simulation with large Common Lisp programs."
Which particular GC approach is best for Ruby is subject to some study.
I think at least for Rails on Linux, someone (assuming funding) could collect and analyze plenty of data. I'd actually be surprised if someone *isn't* doing it, although I know *I'm* not.
Many of the usages of ruby aren't quite like those of Java, or
Smalltalk. I had dinner with a former colleague, who happens to be
the lead developer of the IBM J9 java virtual machine, and he made the
observation that Java, and Smalltalk before it have a long history of
having their VMs tuned for long running processes. On the other hand
many Ruby usages are get in and get out. These use cases mean that
it's more valuable to have rapid startup than perfect GC in the sense
that all dead objects are reclaimed quickly, not that any of the
current GCs guarantee the latter.
Well ... OK. If you want to distinguish between long running (server) and rapid startup (client), that's fine. But look at the marketplace. We have servers, we have laptop clients, we have desktop clients, we have mobile clients, and we have bazillions of non-user-programmable computers like DVD players, iPods, in-vehicle navigation systems, etc.
Now while the hard-core hackers like me wouldn't buy an iPod or a DVD player, preferring instead to add hard drive space to a real computer, Apple isn't exactly going broke making iPods and iPhones that are (for the moment, anyhow) closed to "outsiders". And I'm guessing that, while you *can* run Ruby on, say, an embedded ARM/Linux platform, most of the software in those gizmos is written in C and heavily optimized.
I've got a couple of embedded toolkits, and I've actually built Ruby for them, but when you only have 32 MB of RAM, you don't want to collect garbage -- you don't even want to *generate* garbage! So I wouldn't personally spend much time thinking about garbage collection for rapid startup. If you want rapid startup, you're going to have as much binding as possible done at compile time -- you aren't even going to compile a Ruby script to an AST when you start a process up.
So the best GC for Ruby might not be the same as would be used for a
JVM or Smalltalk VM, but I'm almost certain it would be a reference
counter.
Did you mean to say, "not be a reference counter"?