On Symbols

If I may make one correction/clarification to an otherwise *excellent*
explanation of the implementation...

For symbols, the least significant byte is 0x0e and the upper 3 bytes
are an integer. The integer is a value uniquely assigned to a string.
Think about it as the table index for a string. By using a symbol, you
basically allocate a string once and then refer to it by the index it
occupies in a special symbol table. For example, the first time the
symbol :evan is seen, a string containing "evan" is created and stored

Clarification: a *C* string containing "evan" is created...

in the symbol table at, say, index 9232. The variable that was assigned
:evan gets assigned
((9232 << 8) | 0x0e). The next time :evan is seen, "evan" is looked up
in the symbol table to obtain 9232 again.
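The interning scheme just described can be modeled in a few lines of Ruby. This is a toy sketch, not MRI's code: `ToySymbolTable`, `intern`, `name_of` and `tagged_value` are hypothetical names, the counter is started at 9232 only to reproduce the numbers in the example, and the `(index << 8) | 0x0e` tag follows Evan's description of Ruby 1.8 on a 32-bit build.

```ruby
# Toy model of the symbol table described above (hypothetical class, not MRI's
# implementation): each distinct name is stored once and then referred to only
# by its index.
class ToySymbolTable
  SYMBOL_TAG = 0x0e # low byte that marks a symbol, per the description above

  def initialize(first_index = 0)
    @index_of   = {}  # name  => index
    @name_at    = {}  # index => stored name
    @next_index = first_index
  end

  # Return the existing index for +name+, or store it under a new index.
  def intern(name)
    existing = @index_of[name]
    return existing if existing

    index = @next_index
    @next_index += 1
    stored = name.dup.freeze
    @index_of[stored] = index
    @name_at[index] = stored
    index
  end

  # Reverse lookup: index => name.
  def name_of(index)
    @name_at.fetch(index)
  end

  # The immediate value a variable would hold: the index shifted into the
  # upper bytes, the tag in the low byte.
  def self.tagged_value(index)
    (index << 8) | SYMBOL_TAG
  end
end

table  = ToySymbolTable.new(9232)         # start numbering at the example's index
first  = table.intern("evan")
second = table.intern("evan")             # found again, no new entry
puts first                                # 9232
puts second                               # 9232
puts ToySymbolTable.tagged_value(first)   # 2363406
puts table.name_of(first)                 # evan
```

Looking the name up before inserting it is what makes the second `intern("evan")` return the original index instead of allocating a new entry.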

So, to review:

a = :evan

a.to_i # => 9232 (the index in the symbol table)
a.object_id # => 2363406 (the index << 8 | 0x0e)
a.to_s # => the reference to the string object located at
index 9232 in the symbol table

Correction: a.to_s returns a reference to a new String object
containing the same sequence of characters as the C string in the
symbol table. This is visible when you compare the result of
#object_id on subsequent calls to a.to_s:

a = :evan # => :evan
a.to_s.object_id # => 1657424
a.to_s.object_id # => 1653204
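For readers who want to repeat this experiment, here is a short script. It is a sketch assuming MRI, and it compares identity with `equal?` rather than printing object_ids, since the ids themselves differ between runs and Ruby versions.

```ruby
a = :evan

# Every occurrence of :evan is the very same object.
puts a.equal?(:evan)       # true

# Each call to to_s builds a new String with the same characters.
s1 = a.to_s
s2 = a.to_s
puts s1 == s2              # true  (same characters)
puts s1.equal?(s2)         # false (two different objects)
```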

The Ruby runtime takes the integer value and applies these rules to
determine what it means. If it's odd, it's a Fixnum immediate value. If
it's 0, 2, 4, or 6, it's a "core" immediate value. If its least
significant byte is 0x0e, it's a symbol. Otherwise, it's a pointer to a
memory address that holds the information about the object.
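These rules amount to a small decision procedure. Below is a toy Ruby model of it, using the values given in this thread for Ruby 1.8 on a 32-bit build; `classify_value` and the constant names are hypothetical, Fixnums are decoded by dropping the single tag bit, and newer or 64-bit Rubies use different tag values, so this is not how MRI's own macros are written.

```ruby
# Tag values for the special constants, as listed in this thread (Ruby 1.8).
SPECIAL_CONSTANTS = { 0 => :false, 2 => :true, 4 => :nil, 6 => :undef }.freeze
SYMBOL_LOW_BYTE = 0x0e

# Interpret a raw integer the way the runtime rules above describe.
def classify_value(v)
  if v.odd?
    [:fixnum, v >> 1]                 # drop the tag bit to recover the number
  elsif SPECIAL_CONSTANTS.key?(v)
    [:special, SPECIAL_CONSTANTS[v]]
  elsif (v & 0xff) == SYMBOL_LOW_BYTE
    [:symbol, v >> 8]                 # upper bytes hold the symbol table index
  else
    [:pointer, v]                     # address of a heap object
  end
end

p classify_value(9)         # [:fixnum, 4]
p classify_value(4)         # [:special, :nil]
p classify_value(2363406)   # [:symbol, 9232]
p classify_value(4096)      # [:pointer, 4096]
```

The odd check comes first because the Fixnum tag is a single bit; the special constants and the 0x0e symbol tag are all even, so they never collide with it.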

That's the end of today's lesson! Hope it helps!

Thanks, Evan! Aside from those minor, pedantic corrections, it was
indeed an excellent lesson.

Jacob Fugal


On 1/5/06, evanwebb@gmail.com <evanwebb@gmail.com> wrote:

evanwebb@gmail.com wrote:

Here is a short breakdown of how symbols (and other immediates) are
implemented:

...

By using a symbol, you
basically allocate a string once and then refer to it by the index it
occupies in a special symbol table. For example, the first time the
symbol :evan is seen, a string containing "evan" is created and stored
in the symbol table at, say, index 9232. The variable that was assigned
:evan gets assigned
((9232 << 8) | 0x0e). The next time :evan is seen, "evan" is looked up
in the symbol table to obtain 9232 again.

So, to review:

a = :evan

a.to_i # => 9232 (the index in the symbol table)
a.object_id # => 2363406 (the index << 8 | 0x0e)
a.to_s # => the reference to the string object located at
index 9232 in the symbol table

The Ruby runtime takes the integer value and applies these rules to
determine what it means. If it's odd, it's a Fixnum immediate value. If
it's 0, 2, 4, or 6, it's a "core" immediate value. If its least
significant byte is 0x0e, it's a symbol. Otherwise, it's a pointer to a
memory address that holds the information about the object.

That's the end of today's lesson! Hope it helps!

Evan Webb // evan@fallingsnow.net

Thanks Evan, much appreciated!

This is how I understand it now:

1. When the interpreter sees :<string>, it looks in the "symbol table".

2. If it finds the string, it returns its integer index (or the computed
object_id?); otherwise it creates a new entry.

3. :somestringofchars.object_id returns something which is a function (in
the mathematical sense) of the index of somestringofchars in the symbol
table (i.e., indexofsymbolstring << 8 | 0x0e).

4. ':' is just a way of giving the interpreter the heads-up that a
symbol is coming up in the token stream (i.e., we think we know what we're
talking about, can you please look it up in the symbol table? nice
interpreter... nice interpreter)

5. Some object_id values are computed differently, as in the
session below (I don't know why; it's a hole in my understanding of how
object_ids are assigned):

irb(main):054:0> false.object_id
=> 0
irb(main):055:0> 0.object_id
=> 1
irb(main):056:0> true.object_id
=> 2
irb(main):057:0> 1.object_id
=> 3
irb(main):058:0> nil.object_id
=> 4
irb(main):059:0> def.object_id # not sure why
irb(main):060:1> undef.object_id # possibly cuz it's strictly internal

6. Whether a token is valid or not, it gets added to the symbol table,
and an object_id _can be_ computed from the symbol based on what type
of symbol it is (only if it's a valid object, methinks). Otherwise an
error is thrown.

7. There is a separate table which holds the variables. I'm not sure if
this is true; from what I've seen in irb, it looks like a variable's
symbol gets stored in the symbol table as well, or at least a symbol
which may point to its value's location.

7.1 Every time we refer to a variable var, the interpreter uses the
:var thingy, so if we did xx = "Hi there", there would be an :xx created,
but I don't know how to get to "Hi there" from :xx (:xx.to_s just gives
me "xx").

8. A symbol is just an atomic representation (as far as the user is
concerned) of a token added to the symbol table and exposed via the
Symbol class, so we can use it, if we want to, instead of creating new
string objects for referring to things like methods, and avoid
needless overhead (however small it might be).

9. I'm guessing the Ruby interpreter needs symbols for its own
housekeeping (obviously), but the implementers were just being nice and
allowed end users to use them too for certain specific situations (I
can't think of a good example).
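On the question of getting from :xx back to "Hi there": Ruby does report variable names as symbols, but the symbol only carries the name, and the value is found through the binding or the object. A sketch; `Binding#local_variable_get` exists only in Ruby 2.1 and later, so it was not available when this thread was written, while `instance_variable_get` did exist then. `demo` and `Greeter` are hypothetical names for illustration.

```ruby
# A local variable's name is available as a symbol, but the symbol holds
# only the name; the value is found through the binding.
def demo
  xx = "Hi there"
  names = local_variables                  # the method's local names, e.g. :xx
  value = binding.local_variable_get(:xx)  # Ruby 2.1 and later
  [names, value]
end

names, value = demo
p names.include?(:xx)   # true
p value                 # "Hi there"
p :xx.to_s              # "xx": the symbol only knows its own name

# Instance variables work the same way via instance_variable_get.
class Greeter
  def initialize
    @greeting = "Hi there"
  end
end

p Greeter.new.instance_variable_get(:@greeting)   # "Hi there"
```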

So, basically, the first thing the interpreter does is take the
token and stuff it in the symbol table; then it figures out what to do
with it (steps 1..n). And since we have access to the symbol for a
given reference, why not use that instead of referring to it via a
string object, which gets created anew every time we refer to it, even
though the end result is the same?

thanks,

-A


--
Posted via http://www.ruby-forum.com/.

Because there seems to be so much confusion, I would like to
point out some minor flaws in your post so nobody else stumbles over them. (Please correct me if the error is on my side.)

evanwebb@gmail.com wrote:

Here is a short breakdown of how symbols (and other immediates) are
implemented:

A variable holds a value. That value is an integer. The value of the
integer determines what it means. For example:

If the integer is odd, then the remaining bits of the integer are a
Fixnum value.

This means that if you do

a = 0

the interpreter stores in the local variable table the value
0x00000001. If you had assigned 4 to a, then the value would be
0x00000041. This allows for all Fixnums to not require additional

No, I think it would store 0x00000009, because only the lowest bit
is reserved, not the lowest nibble (4 << 1 | 1).

memory to represent. The same goes for true, false, nil, and symbols.
For the first 3, they are:

Name    Backend integer value

false   0
true    2
nil     4
undef   6   (This isn't accessible from native Ruby code, but is used
            internally.)
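Simon's correction (0x00000009 rather than 0x00000041 for 4) can be checked with a couple of helper methods. A sketch assuming the Ruby 1.8 layout described here, where a Fixnum n is stored as (n << 1) | 1; `int2fix` and `fix2int` are plain Ruby stand-ins named after MRI's C macros INT2FIX and FIX2INT, not the macros themselves.

```ruby
# Encode: shift left one bit and set the low bit as the "this is a Fixnum" tag.
def int2fix(n)
  (n << 1) | 1
end

# Decode: an arithmetic right shift drops the tag bit and keeps the sign.
def fix2int(v)
  v >> 1
end

printf("0x%08x\n", int2fix(0))   # 0x00000001
printf("0x%08x\n", int2fix(4))   # 0x00000009, not 0x00000041
p fix2int(int2fix(4))            # 4
p fix2int(int2fix(-3))           # -3
```

Only one bit is spent on the tag, which is why a 32-bit Fixnum tops out at 2**30 - 1 (one bit for the tag, one for the sign).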

For symbols, the least significant byte is 0x0e and the upper 3 bytes
are an integer. The integer is a value uniquely assigned to a string.
Think about it as the table index for a string. By using a symbol, you
basically allocate a string once and then refer to it by the index it
occupies in a special symbol table. For example, the first time the
symbol :evan is seen, a string containing "evan" is created and stored
in the symbol table at, say, index 9232. The variable that was assigned
:evan gets assigned
((9223 << 8 ) | 0x0e). The next time :evan is seen, "evan" is looked up

This should obviously be ((9232 << 8) | 0x0e).

in the symbol table to obtain 9232 again.

So, to review:

a = :evan

a.to_i # => 9232 (the index in the symbol table)
a.object_id # => 2363406 (the index << 8 | 0x0e)
a.to_s # => the reference to the string object located at
index 9232 in the symbol table

This seems to create a copy each time (at least if there is no Ruby
string around).

The Ruby runtime takes the integer value and applies these rules to
determine what it means. If it's odd, it's a Fixnum immediate value. If
it's 0, 2, 4, or 6, it's a "core" immediate value. If its least
significant byte is 0x0e, it's a symbol. Otherwise, it's a pointer to a
memory address that holds the information about the object.

That's the end of today's lesson! Hope it helps!

Evan Webb // evan@fallingsnow.net

Thanks, this may be really helpful for those who want to understand symbols and have a decent idea of how interpreters work.

cheers

Simon

I disagree. I was using them quite often, but didn't understand them at all.

Now, thanks to this 99-post thread, I believe I understand them at an
intuitive level, and I understand them enough to correctly code with them
every time.

As we speak, I'm writing symbol documentation for the Ruby Nuby who has no
concept (so far) of Ruby internals or Ruby customs.

And nobody even gave me two cents :-)

SteveT

Steve Litt

slitt@troubleshooters.com


On Friday 30 December 2005 01:13 pm, Gregory Brown wrote:

On 12/30/05, Christian Neukirchen <chneukirchen@gmail.com> wrote:
> Ryan Leavengood <leavengood@gmail.com> writes:
> > Zen and the Art of Ruby Programming: Symbols just are.
>
> +1

+1 more. People will understand them if they just use them.

Austin Ziegler wrote:

Probably an implementation detail. That doesn't mean you can't ignore
it, but it means that it's generally a good idea to use Symbols in a
lot of places. Just don't pretend that they're immutable Strings.
They're not. If you want that, freeze the String.

I think the big difference is if you want to use symbols
for values that come from external sources for long-running
processes like servers.

If symbols are garbage collected in Ruby 2, the answer is
yes: you could use them in all the places that you'd
use interned strings in Java or symbols in Lisp.

If not, the answer is no: then you probably only want to
use symbols on values that the author of the program is
totally in control of, or in short-lived processes.
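One way to follow that advice is to never call `to_sym` on untrusted input and instead map it onto a fixed set of symbols the program already knows. A sketch; `ALLOWED_ACTIONS`, `ACTION_BY_NAME` and `action_for` are hypothetical names. (Ruby 2.2 did later begin garbage-collecting symbols created at runtime, e.g. by `String#to_sym`, which eases but does not remove this concern on older runtimes.)

```ruby
# The fixed set of symbols the program's author controls (hypothetical actions).
ALLOWED_ACTIONS = %i[show edit delete].freeze

# String => Symbol lookup table, built once from the known symbols.
ACTION_BY_NAME = Hash[ALLOWED_ACTIONS.map { |sym| [sym.to_s, sym] }].freeze

# Map external input onto an existing symbol, or nil if it is unknown.
# The input is only used as a hash key, so no new symbol is created for it.
def action_for(input)
  ACTION_BY_NAME[input]
end

p action_for("edit")         # :edit
p action_for("drop_tables")  # nil
```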

Quoting dblack@wobblini.net:

Fixnums, Symbols, true, false, and nil get assigned directly to
variables.

It may be helpful to think of them as objects which are addressed
directly by their values.

-mental
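A quick way to see what "addressed directly by their values" means in practice: immediates compare identical with `equal?` however they are produced, while two separately created Strings with the same contents do not. A sketch assuming MRI; `equal?` tests object identity, not content.

```ruby
# Immediates: the same value always gives the same object.
p 5.equal?(2 + 3)         # true
p true.equal?(1 < 2)      # true
p nil.equal?([].first)    # true

# Heap objects: equal contents, but two distinct objects.
a = String.new("evan")
b = String.new("evan")
p a == b                  # true
p a.equal?(b)             # false
```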

DAB wrote:

Fixnums, Symbols, true, false, and nil get assigned directly to
variables. For other objects, variables get a reference to the
object. References, like variables, are not themselves objects.
They're part of a kind of language substratum on which the object
system floats.

True, but ...

[...] In fact, in a sense there's even
less to it, since Ruby handles any necessary de-referencing for you,
so you don't have to make any explicit distinction in your code.

I think that this second statement is much more important than the
first. The fact that 1073741823 is direct and 1073741824 is a reference
makes very little difference 99.99% of the time.

Understanding that Fixnums, Symbols and the like are direct is good for
grokking implementation details, but provides little insight into using
Ruby. The fact that we can pretend that everything is a reference is a
marvelous feature of the Ruby object model.
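Jim's point can be illustrated with the very numbers he mentions. On a 32-bit Ruby 1.8, 1073741823 (2**30 - 1) was the largest Fixnum and 1073741824 was a Bignum on the heap; on 64-bit builds the boundary is much higher, and Ruby 2.4 merged both classes into Integer, but either way the calling code looks identical. A sketch; `double_all` is a hypothetical helper.

```ruby
# Works the same whether each element is an immediate or a heap reference.
def double_all(numbers)
  numbers.map { |n| n * 2 }
end

direct    = 1073741823   # largest Fixnum on 32-bit Ruby 1.8
reference = direct + 1   # a Bignum (heap object) on that same build
p double_all([direct, reference])   # [2147483646, 2147483648]
p reference - direct                # 1
```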

Just MHO.


--
-- Jim Weirich


Jacob,

Correct. The symbol table holds pointers to C strings and new String
objects are created with each call to Symbol#to_s.

I should note that Symbols are native ruby access to the ruby runtime
ID type. It's this reason that C strings are stored in the symbol
table, because the C functions for using ID's (rb_intern and
rb_id2name) use/return char * and ID.
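At the Ruby level the same two directions are available as `String#intern` (or `to_sym`) and `Symbol#id2name` (or `to_s`). Their names mirror `rb_intern` and `rb_id2name`, but this sketch only calls the Ruby methods, not the C functions.

```ruby
sym = "evan".intern        # string -> symbol, the Ruby-level counterpart of rb_intern
p sym                      # :evan
p sym.equal?(:evan)        # true: interning the same characters finds the same entry

name = sym.id2name         # symbol -> name, the Ruby-level counterpart of rb_id2name
p name                     # "evan"
```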

I think you'll get nickels, dimes, quarters, and other such change if
you can put an end to this seemingly bottomless topic.


On 12/30/05, Steve Litt <slitt@earthlink.net> wrote:

As we speak, I'm writing symbol documentation for the Ruby Nuby who has no
concept (so far) of Ruby internals or Ruby customs.

And nobody even gave me two cents :-)

Hmm, sounds like that could be used as an exploit to kill a rails server.
Don't they use a lot of symbol <=> string conversion for their SQL queries
or URL accesses?


On 12/30/05, Ron M <rm_rails@cheapcomplexdevices.com> wrote:

I think the big difference is if you want to use symbols
for values that come from external sources for long-running
processes like servers.

If not, the answer is no: then you probably only want to
use symbols on values that the author of the program is
totally in control of, or in short-lived processes.

--
Jim Freeze

Steve Litt wrote:

And nobody even gave me two cents :-)

Ask for more, try three.

Funny new year!

Hi --

DAB wrote:

Fixnums, Symbols, true, false, and nil get assigned directly to
variables. For other objects, variables get a reference to the
object. References, like variables, are not themselves objects.
They're part of a kind of language substratum on which the object
system floats.

True, but ...

[...] In fact, in a sense there's even
less to it, since Ruby handles any necessary de-referencing for you,
so you don't have to make any explicit distinction in your code.

I think that this second statement is much more important than the
first. The fact that 1073741823 is direct and 1073741824 is a reference
makes very little difference 99.99% of the time.

Understanding that Fixnums, Symbols and the like are direct is good for
grokking implementation details, but provides little insight into using
Ruby. The fact that we can pretend that everything is a reference is a
marvelous feature of the Ruby object model.

It is indeed. I think one reason I'm so immediate-value-aware is the
history of discussions of ++, the absence of which makes sense in
light of the immediate value thing (with numbers). And I guess I'd
put that at a middle level -- not something you really need to know to
use Ruby, but not quite an implementation detail either; more of a
language design thing.
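The link between immediate values and the missing `++` shows up in a few lines: `+=` rebinds the variable to a different Integer object rather than changing the number in place, so there is nothing for a mutating `++` to act on. A sketch assuming MRI.

```ruby
a = 5
b = a        # b refers to the same immediate value 5
a += 1       # shorthand for a = a + 1: a is rebound to a different Integer

p a          # 6
p b          # 5 (the object 5 itself was not changed)
```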

David


On Thu, 5 Jan 2006, Jim Weirich wrote:

--
David A. Black
dblack@wobblini.net

"Ruby for Rails", from Manning Publications, coming April 2006!

+1

In fact I think of Fixnum and Symbol literals as
just special variables pre-bound to references to particular
objects in the way that uninitialized instance variables
are pre-bound to the nil object, but unlike variables you
can't vary the binding. I'd call them constants but then
everyone would get confused with that whiny group of variables
that start with an uppercase letter and then complain every time
something changes.

Gary Wright


On Jan 4, 2006, at 7:49 PM, Jim Weirich wrote:

The fact we can pretend that everything is a reference is a
marvelous feature of the Ruby object model.

Gregory Brown <gregory.t.brown@gmail.com> writes:


On 12/30/05, Steve Litt <slitt@earthlink.net> wrote:

As we speak, I'm writing symbol documentation for the Ruby Nuby who has no
concept (so far) of Ruby internals or Ruby customs.

And nobody even gave me two cents :-)

I think you'll get nickels, dimes, quarters, and other such change if
you can put an end to this seemingly bottomless topic.

I think this topic will be terminated by the coming year-end holiday
more so than by any consensus or documentation.

YS.