Question about symbols

Hi everyone...

I've just read the Pragmatic Programming Ruby book, but I don't understand one thing: why should I use symbols (colon sign before the variable name), e.g. :name?

Regards
Jacek

Because they are immutable, fast and less memory consuming. They are typically used for a limited and known set of keys.
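
A minimal sketch of those properties, assuming a reasonably recent Ruby. The `person` hash and its values are made up purely for illustration:

```ruby
# The same symbol literal always refers to one and the same object.
same_object = :name.equal?(:name)

# Symbols offer no in-place modification methods such as String#upcase!.
has_mutator = :name.respond_to?(:upcase!)

# Typical use: a small, known set of hash keys (hypothetical data).
person = { :name => "Jacek", :language => "Ruby" }

puts same_object        # true
puts has_mutator        # false
puts person[:name]      # Jacek
```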

Kind regards

    robert

PS: Added to http://www.rubygarden.org/ruby?RubyTalkPermaThreads

···

Jacek Olszak <jacekolszak@o2.pl> wrote:

Hi everyone...

I've just read the Pragmatic Programming Ruby book. But I don't
understand one thing : why should I use symbols (colon sign before
the variable name) ie. :name ?

Regards
Jacek
http://jacekolszak.blogspot.com

Hi --

Hi everyone...

I've just read the Pragmatic Programming Ruby book. But I don't understand one thing : why should I use symbols (colon sign before the variable name) ie. :name ?

See the other answers for details and interesting stuff.... I just
wanted to clarify the point that you're not putting the colon before a
variable name, but before the string of which this is the equivalent
symbol. You can even do:

   :"This is a symbol with whitespace"

or even the String#intern method:

   "So is this".intern
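
A short sketch of both forms side by side. String#to_sym is included as well, since it performs the same operation as String#intern; the variable names are only for illustration:

```ruby
quoted   = :"This is a symbol with whitespace"
interned = "So is this".intern
via_sym  = "So is this".to_sym   # same operation as String#intern

puts quoted.inspect               # the symbol is shown in its quoted form
puts interned.equal?(via_sym)     # true: both calls yield the same symbol
puts interned.to_s                # So is this
```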

David

···

On Wed, 9 Nov 2005, Jacek Olszak wrote:

--
David A. Black
dblack@wobblini.net

Quoting Robert Klemme <bob.news@gmx.net>:

Because they are immutable, fast and less memory consuming. They
are typically used for a limited and known set of keys.

Well, they consume memory differently -- not necessarily less.

Symbols are never garbage-collected, so you should not use them for
situations where you could have an unbounded number of unique
symbol values.
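
A rough way to watch the symbol table grow, offered as a sketch rather than a measurement. Whether entries are ever reclaimed depends on the Ruby version: the 1.8 interpreters discussed here keep every symbol, while Ruby 2.2 and later can collect dynamically created symbols that are no longer referenced. The "sketch_word_" prefix is an arbitrary, made-up name:

```ruby
before = Symbol.all_symbols.size

# Intern 1,000 symbols whose names are generated at run time.
# Holding them in an array keeps them referenced on Rubies that can
# collect unused dynamic symbols.
created = (1..1000).map { |i| "sketch_word_#{i}".to_sym }

after = Symbol.all_symbols.size

puts "symbols before: #{before}"
puts "symbols after:  #{after}"
puts "newly interned: #{created.size}"
```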

-mental

"David A. Black" <dblack@wobblini.net> writes:

Hi --

Hi everyone...

I've just read the Pragmatic Programming Ruby book. But I don't
understand one thing : why should I use symbols (colon sign before
the variable name) ie. :name ?

See the other answers for details and interesting stuff.... I just
wanted to clarify the point that you're not putting the colon before a
variable name, but before the string of which this is the equivalent
symbol. You can even do:

   :"This is a symbol with whitespace"

or even the String#intern method:

   "So is this".intern

Just note that you can't intern strings with null-byte.

···

On Wed, 9 Nov 2005, Jacek Olszak wrote:

David

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Shouldn't this be "So is this".to_sym ? :wink:

-austin

···

On 11/9/05, David A. Black <dblack@wobblini.net> wrote:

   :"This is a symbol with whitespace"

or even the String#intern method:

   "So is this".intern

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

mental@rydia.net wrote:

Quoting Robert Klemme <bob.news@gmx.net>:

Because they are immutable, fast and less memory consuming. They
are typically used for a limited and known set of keys.

Well, they consume memory differently -- not necessarily less.

There are two reasons why I wrote about higher memory consumption for
strings:

1. If you have a string constant in your code like "foo" or 'bar' then the
constant takes up memory and each evaluation creates a new String object.
Although they share the internal char buffer there is some memory
overhead. You can easily try it:

5.times { puts "foo".object_id }

135178740
135178716
135178620
135178488
135178464
=> 5

2. If stored in multiple places then there's still only one symbol but
multiple strings. Even worse, if those strings stem from different
sources they won't even share their char buffer.
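
Both points can be sketched in a few lines. String.new is used here instead of a bare literal so that the result does not depend on any frozen-string-literal setting; that substitution is my assumption, not part of the original experiment:

```ruby
# Five separately created strings with equal contents are five objects.
strings    = Array.new(5) { String.new("foo") }
string_ids = strings.map { |s| s.object_id }

# Five evaluations of the same symbol literal give one object.
symbol_ids = Array.new(5) { :foo.object_id }

puts "distinct String objects: #{string_ids.uniq.size}"   # 5
puts "distinct Symbol objects: #{symbol_ids.uniq.size}"   # 1
```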

Symbols are never garbage-collected, so you should not use them for
situations where you could have an unbounded number of unique
symbol values.

That's why I recommended to use them with a limited set of values only.

Hope that makes things a bit clearer.

Kind regards

    robert

mental@rydia.net wrote:

...
Symbols are never garbage-collected, so you should not use them for
situations where you could have an unbounded number of unique
symbol values.

Is the non-garbage-collection of symbols just a matter of the
current garbage collector, or something inherent in the ruby
language?

It seems Lisps typically do garbage collect symbols (*note 1),
and since Java 1.2, "interned" strings (*2) are also garbage
collected; and Java symbols like method names are cleaned
up through their class unloaders.

I think the answer to this is pretty relevant to the guys
in this thread who were thinking of using large numbers of
dynamically generated symbols.

If Yarv2.0 will garbage collect symbols, I think the answer
becomes "yeah, use symbols for that if your immediate memory
needs can handle it"; but if there's some fundamental reason
why symbols won't be collected down the road, I think the
answer becomes the much uglier "emulate your own symbols with
a map of weak references".

Notes:
*1 http://community.schemewiki.org/?scheme-faq-language "Most Schemes
   do perform garbage-collection of symbols, since otherwise programs
   using string->symbol to dynamically create symbols would consume
   ever increasing amounts of memory even if the created symbols are
   no longer being used."

*2 interned Strings : Java Glossary "In the early JDKs, any
   string you interned could never be garbage collected because the
   JVM had to keep a reference to it in its Hashtable so it could
   check each incoming string to see if it already had it in the
   pool. With JDK 1.2 came weak references. Now unused interned
   strings will be garbage collected."

Thanks everyone for the replies. I've got it :smiley: Now I know why I should use symbols.

Regards
Jacek

···

On Wed, 09 Nov 2005 09:32:12 +0100, Robert Klemme <bob.news@gmx.net> wrote:



I wonder if I could trouble the list a bit further on this one: I've got a collection of newswire articles, and I was thinking of using symbols to represent the words in each. I'd likely only ever be using a half-dozen articles at any one time, but I'd be using on the order of tens of thousands of articles over the course of the program. My hunch is that to avoid stuffing memory full of unused symbols, I should deal with each subset of articles in a forked process - I'm assuming that since threads run in the interpreter, they'll share the same symbol ids as the parent.

This raises a couple of quick questions: firstly, how big is a "limited set", in a hand-waving, give or take an order-of-magnitude sort of estimate?

Secondly, I note that the FAQ says that fork "is slow". Am I right to think that's simply the overhead of starting up the thread, and then things run smoothly thereafter, or is there some other penalty of which I'm not aware?

thanks in advance & all enlightenment appreciated.

matthew smillie.

···

On Nov 9, 2005, at 8:32, Robert Klemme wrote:

Symbols are never garbage-collected, so you should not use them for
situations where you could have an unbounded number of unique
symbol values.

That's why I recommended to use them with a limited set of values only.

Hope that makes things a bit clearer.

Matthew Smillie wrote:

I wonder if I could trouble the list a bit further on this one: I've
got a collection of newswire articles, and I was thinking of using
symbols to represent the words in each.

This doesn't feel like a "use symbols" application.

This raises a couple of quick questions: firstly, how big is a
"limited set", in a hand-waving, give or take an order-of-magnitude
sort of estimate?

I'll wave my hands (in a completely non-scientific way) at 300..500 max.
(Table look-up isn't hot.)

Await the hoots of derision :wink:

daz

Matthew Smillie wrote:

I wonder if I could trouble the list a bit further on this one: I've
got a collection of newswire articles, and I was thinking of using
symbols to represent the words in each. I'd likely only ever be
using a half-dozen articles at any one time, but I'd be using on the
order of tens of thousands of articles over the course of the
program. My hunch is that to avoid stuffing memory full of unused
symbols, I should deal with each subset of articles in a forked
process - I'm assuming that since threads run in the interpreter,
they'll share the same symbol ids as the parent.

I would not use Symbols for that and I'd also not change the application
architecture (i.e. using fork) because forking essentially would try to
fix a problem introduced by using Symbols.
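
One possible alternative, sketched under assumptions: keep one frozen String per distinct word in an ordinary Hash, and drop the whole pool when a batch of articles is done so that the garbage collector can reclaim it. `WordPool` and the sample sentence are hypothetical, not something taken from this thread:

```ruby
# Hypothetical helper: one shared, frozen String per distinct word.
class WordPool
  def initialize
    @pool = {}
  end

  # Returns the pooled copy of +word+, adding it on first sight.
  def intern(word)
    @pool[word] ||= word.dup.freeze
  end

  def size
    @pool.size
  end

  # Forget every pooled word; the strings become eligible for collection
  # once nothing else references them.
  def clear
    @pool.clear
  end
end

pool  = WordPool.new
words = "the cat saw the other cat".split.map { |w| pool.intern(w) }

puts pool.size                       # 4
puts words[0].equal?(words[3])       # true
pool.clear
puts pool.size                       # 0
```

Unlike symbols on the interpreters discussed here, nothing in such a pool outlives the pool itself.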

Also, this smells a bit like premature optimization. Do you actually need
to have those words separately? What exactly are you doing with your
articles?

This raises a couple of quick questions: firstly, how big is a
"limited set", in a hand-waving, give or take an order-of-magnitude
sort of estimate?

IMHO typically these are only a few per use case.

Secondly, I note that the FAQ says that fork "is slow". Am I right
to think that's simply the overhead of starting up the thread, and
then things run smoothly thereafter, or is there some other penalty
of which I'm not aware?

fork doesn't start a thread but it creates a new process. This means it
increases resource consumption on the system (i.e. there's a slight
management overhead for the additional process and you need more memory -
even with copy on write).
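
A small sketch of that difference, assuming a platform where Kernel#fork is available (it is not on Windows). The child reports its own process id back through a pipe; the variable names are illustrative only:

```ruby
parent_pid = Process.pid
reader, writer = IO.pipe

# Kernel#fork returns the child's pid in the parent; the block runs only in the child.
child_pid = fork do
  reader.close
  writer.puts Process.pid   # report the child's own process id
  writer.close
  exit!(0)                  # leave the child without running at_exit handlers
end

writer.close
reported_pid = reader.gets.to_i
reader.close
Process.wait(child_pid)

puts "parent pid: #{parent_pid}"
puts "child pid:  #{reported_pid}"
puts "separate processes: #{parent_pid != reported_pid}"
```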

Kind regards

    robert

Lots.

It all depends on how much RAM you have available.

···

On Wednesday 09 November 2005 3:04 am, Matthew Smillie wrote:

This raises a couple of quick questions: firstly, how big is a
"limited set", in a hand-waving, give or take an order-of-magnitude
sort of estimate?

-----
a = 'a'
1.upto(2000000) do
  a.to_sym
  a.next!
end

puts "#{a} -- sleeping now"
sleep 30
-----

Run that. It creates two million symbols from :a to :ditob (the string has moved on to "ditoc" by the time it is printed), then sleeps so
that you can go look at memory usage before it exits.

On my box a ps shows something like this:

herbie 22539 74.4 6.3 131944 130320 pts/1 S 06:21 0:07 ruby /tmp/a.rb

It shows no significant variation between Ruby 1.8.0, 1.8.1, 1.8.2, or 1.8.3.

So, if you have the RAM, you can use a considerable number of symbols.

Secondly, I note that the FAQ says that fork "is slow". Am I right
to think that's simply the overhead of starting up the thread, and
then things run smoothly thereafter, or is there some other penalty
of which I'm not aware?

The expense of a fork is in the overhead required to create a new process.
That expense disappears into the background noise if the process that is
being forked persists for more than a short period of time, though.

If you have multiple processes that need to interact, then you have to deal
with interprocess communication. There are many ways to do this, but one
very convenient way with Ruby is to make use of DRb + Rinda + TupleSpace.
They provide a simple-to-set-up and effective way of passing messages around.
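
A minimal sketch of the tuple space idea, with loud assumptions: it runs in a single process and uses a thread as a stand-in for a second process, so no DRb server is started; the :request and :reply tags are made up for illustration. It also assumes the rinda library is installed (it ships with older Rubies and is a separate gem on newer ones):

```ruby
require 'rinda/tuplespace'

space = Rinda::TupleSpace.new

# Stand-in for a worker process: wait for a request, post a reply.
worker = Thread.new do
  _tag, word = space.take([:request, nil])   # nil matches any value
  space.write([:reply, word, word.length])
end

space.write([:request, "symbol"])
_tag, word, length = space.take([:reply, nil, nil])
worker.join

puts "#{word} has #{length} characters"
```

Across real processes the same write and take calls would go through a TupleSpace object shared over DRb.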

Kirk Haines

Matthew Smillie wrote:

···

On Nov 9, 2005, at 8:32, Robert Klemme wrote:

Symbols are never garbage-collected, so you should not use them for
situations where you could have an unbounded number of unique
symbol values.

That's why I recommended to use them with a limited set of values only.

Hope that makes things a bit clearer.

I wonder if I could trouble the list a bit further on this one: I've got a collection of newswire articles, and I was thinking of using symbols to represent the words in each.

Why don't you use a proper compression method like gzip?

IIUC, he does not want to compress the articles, but to do linguistic
analysis on the text.

brian

···

On 10/11/05, Andreas S. <usenet@andreas-s.net> wrote:


--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/