Apologies in advance for this meaty posting:
I’m currently in the process of developing a Ruby implementation
that’s more suited to embedding.
I looked at many languages (Smalltalk, Lisp, Scheme, Self, Python,
Java, C#, Lua, ElastiC), but decided that Ruby has the cleanest and
most appealing OO model and language syntax.
Here are a few ways in which my implementation differs. I wonder
if anybody here would like to comment:
- I support the Ruby core language and object model but won’t be
  implementing all of the libraries. For example, file handling and
  regexp support will be optional. Networking and net programming won’t
  be supported.
- I’ve written a generational garbage collector that should be much
  faster than Ruby’s mark-and-sweep collector. The young generation is
  implemented using a Cheney-style copying collector, which means that
  allocation is very fast and only the ‘live’ set is visited. There is
  a separate ‘large-chunk’ space for dealing with large binary
  resources.
- In the current Ruby implementation, everything is represented by
  linked nodes. Unless I’m mistaken, that means that even code is
  scanned during garbage collection. I have the concept of an atom or
  slot: a 64-bit element made up of 32 bits of flags/counts and a data
  element. Objects, even internal hash tables and arrays, are composed
  of these slots and are allocated in a uniform way.
- Methods are represented as bytecodes. The method bytecodes are stored
  in the ‘large-chunk’ manager and so are not a burden on the garbage
  collector. I’m still working out the best opcode arrangement.
- I’m developing in C++.
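To make the slot and young-generation ideas in the list above a little more concrete, here is a minimal C++ sketch of a Cheney-style semispace copy over 64-bit slots. It is an illustration under stated assumptions, not the collector described above: `Slot`, `Nursery`, `kRef` and `kNotForwarded` are hypothetical names; a slot's 32-bit data word holds either an immediate value or the index of another object (a 32-bit field cannot hold a 64-bit pointer); each object is a header slot (payload count plus forwarding index) followed by its payload slots; and there is no old generation, promotion, write barrier or large-chunk space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical 64-bit slot: 32 bits of flags/counts plus a 32-bit data word.
struct Slot {
    std::uint32_t flags;  // payload: tag bits; header: payload slot count
    std::uint32_t data;   // payload: immediate or object index; header: forwarding index
};
static_assert(sizeof(Slot) == 8, "a slot should be 64 bits");

constexpr std::uint32_t kRef = 1u;                    // payload slot holds an object index
constexpr std::uint32_t kNotForwarded = 0xFFFFFFFFu;  // header not yet copied

// Young-generation semispace collected with Cheney's algorithm.
// Objects are laid out as [header][payload 0]...[payload n-1].
class Nursery {
public:
    explicit Nursery(std::size_t capacity) : from_(capacity), to_(capacity) {}

    // Bump-pointer allocation; returns the index of the object's header slot.
    // (A real nursery would collect or promote when full; this sketch asserts.)
    std::uint32_t allocate(std::uint32_t payloadSlots) {
        assert(top_ + 1 + payloadSlots <= from_.size());
        auto obj = static_cast<std::uint32_t>(top_);
        from_[top_] = Slot{payloadSlots, kNotForwarded};
        for (std::uint32_t i = 1; i <= payloadSlots; ++i) from_[top_ + i] = Slot{0, 0};
        top_ += 1 + payloadSlots;
        return obj;
    }

    Slot& field(std::uint32_t obj, std::uint32_t i) { return from_[obj + 1 + i]; }
    std::size_t used() const { return top_; }

    // Copies everything reachable from `roots` into to-space and updates the roots.
    void collect(std::vector<std::uint32_t>& roots) {
        std::size_t freeIdx = 0;
        for (std::uint32_t& r : roots) r = forward(r, freeIdx);
        // Cheney scan: walk the copied objects breadth-first, forwarding their refs.
        std::size_t scan = 0;
        while (scan < freeIdx) {
            std::uint32_t n = to_[scan].flags;
            for (std::uint32_t i = 1; i <= n; ++i) {
                Slot& s = to_[scan + i];
                if (s.flags & kRef) s.data = forward(s.data, freeIdx);
            }
            scan += 1 + n;
        }
        std::swap(from_, to_);  // survivors now form the allocation space
        top_ = freeIdx;
    }

private:
    // Copies `obj` to to-space once; later calls return the forwarding index.
    std::uint32_t forward(std::uint32_t obj, std::size_t& freeIdx) {
        Slot& header = from_[obj];
        if (header.data != kNotForwarded) return header.data;
        std::uint32_t n = header.flags;
        auto dest = static_cast<std::uint32_t>(freeIdx);
        to_[freeIdx] = Slot{n, kNotForwarded};
        for (std::uint32_t i = 1; i <= n; ++i) to_[freeIdx + i] = from_[obj + i];
        freeIdx += 1 + n;
        header.data = dest;  // leave a forwarding index behind in from-space
        return dest;
    }

    std::vector<Slot> from_, to_;
    std::size_t top_ = 0;
};
```

Allocation is just a bump of `top_`, and `collect()` only ever touches objects reachable from the roots: the `scan` index chases the `freeIdx` index through to-space, so an unreferenced object allocated earlier is simply left behind when the spaces are swapped.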
So far, I have written the garbage collector and the Ruby class and
object code; methods and class/instance variables can be accessed.
Mixins via include are fully supported, and I have the initial Ruby
metaclasses and class hierarchy initialized. I have the beginnings of
the lexer and parser.
After looking at the original Ruby source code, I noticed a few
things:
- The internal symbol function is called rb_intern(). It generates a
  unique number with an embedded symbol type, which is used as a
  selector for method and variable lookups. But a hash must be
  computed from this number each time, which is a little
  time-consuming. Perhaps a speed improvement would be to have the
  intern function return a pointer to a unique data structure that
  contains the precalculated hash value:

      intern_symbol* pSymbol = rb_intern( … );
      int hash = pSymbol->hash;
      int type = pSymbol->type;

- The code implements a method cache to accelerate method lookup. How
  about a similar cache for object variable lookup? This would
  accelerate class variable lookup.
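The two suggestions above can be sketched together in C++. This is a hypothetical illustration, not Ruby's actual API: `InternSymbol`, `SymbolTable`, `SymType` and `VarCache` are invented names, the symbol type is derived from a simplified version of Ruby's naming conventions (`@@`, `@`, `$`, capitalized constants), and the cache maps a (class, symbol) pair to a made-up slot index. Because each name interns to one unique record whose hash is computed once, the cache can index with `sym->hash` and compare symbols by address.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Simplified symbol kinds following Ruby's naming conventions.
enum class SymType { Local, Instance, ClassVar, Global, Const };

// Hypothetical interned symbol: one unique record per name, hash computed once.
struct InternSymbol {
    std::string name;
    std::size_t hash;
    SymType type;
};

class SymbolTable {
public:
    // Equal names always yield the same pointer, so symbols compare by address.
    const InternSymbol* intern(const std::string& name) {
        auto it = table_.find(name);
        if (it != table_.end()) return it->second.get();
        auto sym = std::make_unique<InternSymbol>(
            InternSymbol{name, std::hash<std::string>{}(name), classify(name)});
        const InternSymbol* p = sym.get();  // stable: the map owns it via unique_ptr
        table_.emplace(name, std::move(sym));
        return p;
    }

private:
    static SymType classify(const std::string& n) {
        if (n.rfind("@@", 0) == 0) return SymType::ClassVar;
        if (!n.empty() && n[0] == '@') return SymType::Instance;
        if (!n.empty() && n[0] == '$') return SymType::Global;
        if (!n.empty() && n[0] >= 'A' && n[0] <= 'Z') return SymType::Const;
        return SymType::Local;
    }
    std::unordered_map<std::string, std::unique_ptr<InternSymbol>> table_;
};

// Hypothetical direct-mapped cache from (class, variable symbol) to a slot
// index, modelled on a method cache. A miss returns -1.
class VarCache {
public:
    static constexpr std::size_t kSize = 256;  // must be a power of two

    int lookup(const void* klass, const InternSymbol* sym) const {
        const Entry& e = entries_[index(klass, sym)];
        return (e.klass == klass && e.sym == sym) ? e.slot : -1;
    }
    void fill(const void* klass, const InternSymbol* sym, int slot) {
        entries_[index(klass, sym)] = Entry{klass, sym, slot};
    }
    // Invalidate everything, e.g. when a class gains or loses a variable.
    void clear() { entries_.fill(Entry{}); }

private:
    struct Entry {
        const void* klass = nullptr;
        const InternSymbol* sym = nullptr;
        int slot = -1;
    };
    static std::size_t index(const void* klass, const InternSymbol* sym) {
        auto k = static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(klass) >> 3);
        return (k ^ sym->hash) & (kSize - 1);  // reuse the precomputed hash
    }
    std::array<Entry, kSize> entries_{};
};
```

As with a method cache, entries would have to be invalidated whenever a class's variables change; the crude `clear()` above stands in for whatever finer-grained scheme a real implementation would use.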
As a final point, I hear a lot of talk about finalization and its
pitfalls. It seems to me that finalization happens too late to be
truly useful. What would the theoretical implications be of an
optional destroy mechanism? For example:
a = MyClass.new
a.show => "I live!"
a.destroy
a.show => exception! 'a' does not exist!
The ‘destroy’ method would have the effect of broadcasting the
‘destroy’ message to all variables belonging to ‘a’. ‘a’ would then be
marked as destroyed, and any pointers to ‘a’ would be treated like
weak pointers: if they pointed to a destroyed object, they could
become ‘nil’. The garbage collector could easily be made aware of
this.
This would allow the programmer to override the ‘destroy’ method and
do useful resource deallocation, file closing and all the other
stuff that people complain about.
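A minimal C++ sketch of the ‘destroy’ idea, under stated assumptions: `Object`, `Ref` and `destroyHook` are hypothetical names, instance variables are held as `Ref`s, and instead of having the garbage collector rewrite pointers, every read through a `Ref` checks the target's destroyed flag and lazily turns into nil (`nullptr` here), which is one way to get the weak-pointer behaviour described above. The object is marked before the broadcast so that reference cycles terminate.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Object;

// Hypothetical reference: once the target is destroyed, reads yield nil
// (nullptr here), much like a cleared weak pointer.
class Ref {
public:
    Ref() = default;
    explicit Ref(Object* target) : target_(target) {}
    Object* get();  // defined after Object is complete

private:
    Object* target_ = nullptr;
};

struct Object {
    bool destroyed = false;
    std::vector<Ref> ivars;                    // instance variables
    std::function<void(Object&)> destroyHook;  // user-overridable 'destroy' body

    void destroy() {
        if (destroyed) return;  // destroying twice is a no-op
        destroyed = true;       // mark first so reference cycles terminate
        if (destroyHook) destroyHook(*this);  // e.g. close files, free resources
        for (Ref& r : ivars)    // broadcast 'destroy' to everything this object holds
            if (Object* child = r.get()) child->destroy();
    }
};

inline Object* Ref::get() {
    if (target_ != nullptr && target_->destroyed) target_ = nullptr;  // becomes nil
    return target_;
}
```

Note that in this sketch the broadcast also destroys any child object that is still shared with other live objects.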
What issues might this raise? Not all classes should define ‘destroy’,
for obvious reasons…
Lastly, my gratitude to matz for developing such an elegant, powerful
and simple language.
Justin Johnson
justinj@mobiusent.com