evanwebb@gmail.com wrote:
Here is a short breakdown of how symbols (and other immediates) are
implemented:
...
By using a symbol, you
basically allocate a string once and then refer to it by the index it
occupies in a special symbol table. For example, the first time the
symbol :evan is seen, a string containing "evan" is created and stored
in the symbol table at, say, index 9232. The variable that was assigned
:evan gets assigned
((9232 << 8) | 0x0e). The next time :evan is seen, "evan" is looked up
in the symbol table to obtain 9232 again.
So, to review:
a = :evan
a.to_i # => 9232 (the index in the symbol table)
a.object_id # => 2363406 (the index << 8 | 0x0e)
a.to_s # => the reference to the string object located at
index 9232 in the symbol table
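The arithmetic in the review above can be checked with a few lines of plain Ruby. This is only a sketch of the encoding as described in this post: the helper names `symbol_value_for` and `symbol_index_from` are made up for illustration, and 9232 is the example index from the text, not a value a real interpreter guarantees.

```ruby
# Tag that marks a value as a symbol, per the description above.
SYMBOL_TAG = 0x0e

# Hypothetical helper: pack a symbol-table index into an immediate value.
def symbol_value_for(index)
  (index << 8) | SYMBOL_TAG
end

# Hypothetical helper: recover the symbol-table index from the value.
def symbol_index_from(value)
  value >> 8
end

value = symbol_value_for(9232)
puts value                      # 2363406, matching a.object_id above
puts symbol_index_from(value)   # 9232, matching a.to_i above
```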
The Ruby runtime takes the integer value and applies these rules to
determine what it means. If it's odd, it's a Fixnum immediate value. If
it's 0, 2, 4, or 6, it's a "core" immediate value. If its least
significant byte is 0x0e, it's a symbol. Otherwise, it's a pointer to a
memory address that holds the information about the object.
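Those rules can be written down as a small decision function. The sketch below only works on plain integers and does not look at real interpreter state; `classify_value` is a hypothetical name, and the order of the checks is an assumption based on the order in which the rules are listed.

```ruby
# Hypothetical classifier following the rules described above.
def classify_value(value)
  if value.odd?
    :fixnum            # odd: a Fixnum immediate
  elsif [0, 2, 4, 6].include?(value)
    :core_immediate    # one of the small "core" values
  elsif (value & 0xff) == 0x0e
    :symbol            # low byte is 0x0e
  else
    :pointer           # anything else is treated as an address
  end
end

puts classify_value(3)        # fixnum
puts classify_value(4)        # core_immediate
puts classify_value(2363406)  # symbol
puts classify_value(0x1000)   # pointer
```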
That's the end of today's lesson! Hope it helps!
Evan Webb // evan@fallingsnow.net
Thanks Evan, much appreciated!
This is how I understand it now:
1. When the interpreter sees :<string> it looks in the "symbol table"
2. if it finds the value, it returns the int index (or the computed
object_id?) of it otherwise creates a new entry
3. :somestringofchars.object_id returns something which is a function (in
the mathematical sense) of the index of somestringofchars in the symbol
table (i.e., (indexofsymbolstring << 8) | 0x0e).
4. ':' is just a way of giving the interpreter the heads up that a
symbol is coming up in the token stream (ie we think we know what we're
talking about, can you please look it up in the symbol table? nice
interpreter.. nice interpreter)
5. some object_id values are computed differently, for example in the
session below (I don't know why, it's a hole in my understanding of how
object_ids are assigned):
irb(main):054:0> false.object_id
=> 0
irb(main):055:0> 0.object_id
=> 1
irb(main):056:0> true.object_id
=> 2
irb(main):057:0> 1.object_id
=> 3
irb(main):058:0> nil.object_id
=> 4
irb(main):059:0> def.object_id # not sure why
irb(main):060:1> undef.object_id # possibly cuz it's strictly internal
6. Whether a token is valid or not, it gets added to the symbol table
and an object_id _can be_ computed from the symbol based on what type
of symbol it is (only if it's a valid object, methinks). Otherwise an
error is thrown.
8. there is a separate table which holds the variables. I'm not sure if
this is true from what I've seen in irb, it looks like a variable's
symbol gets stored in the symbol table as well or at least a symbol
which may point to its value location.
8.1 Every time we refer to a variable var, the interpreter uses the
:var thingy
so if we did xx="Hi there", there would be an :xx created, but I don't
know how to get to "Hi there" from :xx (:xx.to_s just gives me "xx")
9. a symbol is just an atomic representation (as far as the user is
concerned) of a token added to the symbol table and exposed via the
Symbol class, so we can use it if we want to instead of creating new
string objects for referring to things like methods etc. and incurring
needless overhead (however small it might be).
10. I'm guessing the Ruby interpreter needs symbols for its own
housekeeping (obviously) but the implementers were just being nice and
allowed end users to use them too for certain specific situations (I
can't think of a good example).
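Two of the open questions in the list above can be poked at from Ruby itself. In the irb session, 0.object_id is 1 and 1.object_id is 3, which fits the "odd means Fixnum" rule if a small integer n is stored as (n << 1) | 1; that formula is an inference from the session, not something stated in Evan's post. For point 8.1, later Ruby versions (2.1 and up) provide Binding#local_variable_get, which goes from a symbol such as :xx back to the variable's value. The helper names below are made up for illustration.

```ruby
# Hypothetical helper: the value a small integer n would be stored as,
# assuming the (n << 1) | 1 tagging suggested by the irb session.
def tagged_fixnum(n)
  (n << 1) | 1
end

# Look up a local variable's value starting from its symbol.
# Binding#local_variable_get needs Ruby 2.1 or newer.
def value_of_local_xx
  xx = "Hi there"
  binding.local_variable_get(:xx)
end

puts tagged_fixnum(0)    # 1, same as 0.object_id in the session
puts tagged_fixnum(1)    # 3, same as 1.object_id in the session
puts value_of_local_xx   # Hi there
```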
So, basically, the first thing the interpreter does is, it takes the
token and stuffs it in the symbol table, then it figures out what to do
with it (steps 1..n) . And since we have access to the symbol for a
given reference, why not use that instead of referring to it via a
string object which gets created anew every time we refer to it. Even
though the end result is the same.
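The difference between a symbol and a freshly created string can be seen with equal?, which compares object identity rather than content. This is a minimal sketch; the method names are made up, and String.new is used so that the two strings are certain to be separate objects regardless of how a given Ruby version treats string literals.

```ruby
# Two separately built strings with the same characters are equal in
# content but are different objects.
def distinct_strings?
  a = String.new("evan")
  b = String.new("evan")
  a == b && !a.equal?(b)
end

# Every mention of :evan refers to the same object.
def same_symbol?
  :evan.equal?(:evan) && "evan".to_sym.equal?(:evan)
end

# A typical everyday use of a symbol: naming a method to call.
def shout(text)
  text.send(:upcase)
end

puts distinct_strings?  # true
puts same_symbol?       # true
puts shout("evan")      # EVAN
```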
thanks,
-A
--
Posted via http://www.ruby-forum.com/.