Symbol vs String

Hi,

Just a dumb question: what is the real difference between { :aKey => "aValue" } and { "aKey" => "aValue" }? I know the first key is a symbol and the latter is a string. I like string keys, so why should I use symbols? Why are symbols worth using as keys?
Thanks,

Gábor

Sebestyén Gábor:

I like string keys, so why should I use symbols?

Because symbols
  - are faster and
  - save you one byte in your rb file.
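A rough way to check the speed claim on your own interpreter is to time hash lookups with each kind of key. The sketch below uses the standard library's Benchmark module; the hash contents and the iteration count are arbitrary choices for illustration, and the numbers will vary with the Ruby version and the machine.

```ruby
require 'benchmark'

iterations = 200_000 # arbitrary; raise it for steadier timings

symbol_hash = { :aKey => 'aValue' }
string_hash = { 'aKey' => 'aValue' }

# Time lookups with a Symbol key.
symbol_time = Benchmark.realtime do
  iterations.times { symbol_hash[:aKey] }
end

# Time the same number of lookups with a String key,
# whose hash value is derived from its characters.
string_time = Benchmark.realtime do
  iterations.times { string_hash['aKey'] }
end

puts format('symbol keys: %.4fs', symbol_time)
puts format('string keys: %.4fs', string_time)
```

Treat the output as a measurement of one setup, not as proof either way.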

Malte

* Sebestyén Gábor (Mar 16, 2005 21:40):

Just a dumb question: what is the real difference between { :aKey =>
"aValue" } and { "aKey" => "aValue" } ? I know the first key is a
symbol the latter is a string. I like string keys why should I use
symbols? Why symbols worth to use as keys? Thanks,

Always use symbols for situations like these. The reason is that a
symbol is immutable and also that no new string needs to be created for
it if used more than once. Also, using strings as symbols and then
having the string altered will force a rehash of the table. It's all
about memory savings and execution speed,
  nikolai

--
::: name: Nikolai Weibull :: aliases: pcp / lone-star / aka :::
::: born: Chicago, IL USA :: loc atm: Gothenburg, Sweden :::
::: page: www.pcppopper.org :: fun atm: gf,lps,ruby,lisp,war3 :::
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}

Symbols take up less memory space (only allocated once for the same Symbol) and have a faster #hash function (#object_id, not computed).

'x' == 'x' # => true
'x'.object_id == 'x'.object_id # => false

:x == :x # => true
:x.object_id == :x.object_id # => true
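One practical consequence worth spelling out: a Symbol and a String with the same characters are different hash keys, so the two hashes from the original question are not interchangeable. A small sketch (the variable names are only illustrative):

```ruby
by_symbol = { :aKey => 'aValue' }
by_string = { 'aKey' => 'aValue' }

by_symbol[:aKey]   # => "aValue"
by_symbol['aKey']  # => nil, because 'aKey' and :aKey are different keys
by_string['aKey']  # => "aValue"; equal content is enough for String keys

# Converting at the boundary lets one kind of key find the other.
by_symbol['aKey'.to_sym] # => "aValue"
by_string[:aKey.to_s]    # => "aValue"
```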

On 16 Mar 2005, at 12:37, Sebestyén Gábor wrote:

Hi,

Just a dumb question: what is the real difference between { :aKey => "aValue" } and { "aKey" => "aValue" } ? I know the first key is a symbol the latter is a string. I like string keys why should I use symbols? Why symbols worth to use as keys?

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Why?

Personally I think :symbols are great. They make it much clearer when you
are reading code that you are representing something else, rather than
storing a piece of data. And you can use them without having to define
them as constants beforehand. Great :)

Faster to type too.

Douglas

On Thu, 17 Mar 2005 05:37:12 +0900, Sebestyén Gábor <segabor@chello.hu> wrote:

I like string keys

Use Strings for their content. Use Symbols for their arbitrary uniqueness.

On Wednesday 16 March 2005 03:37 pm, Sebestyén Gábor wrote:

Just a dumb question: what is the real difference between { :aKey =>
"aValue" } and { "aKey" => "aValue" } ? I know the first key is a
symbol the latter is a string. I like string keys why should I use
symbols? Why symbols worth to use as keys?

--
-- Jim Weirich jim@weirichhouse.org http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)

You mean this?

key = 'foo'

hash = {}

hash[key] = 5

key.gsub!(/foo/, 'bar')

In this case, hash.rehash does not need to be called because Ruby copies String hash keys:

hash.keys.first.object_id == key.object_id # => false

Also, String keys are frozen, so you can't modify them:

hash.keys.first.gsub!(/foo/, 'bar') # => raises TypeError (RuntimeError on Ruby 1.9 through 2.4, FrozenError from 2.5)
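Put together as one runnable script, the behaviour described above looks roughly like this. It is a sketch for a current Ruby; `String.new` is used only to make sure the key starts out mutable, and the rescue is deliberately broad because the exception class differs between Ruby versions.

```ruby
key = String.new('foo') # explicitly mutable String
hash = {}
hash[key] = 5

stored = hash.keys.first
puts stored.equal?(key) # false: the Hash kept its own copy of the key
puts stored.frozen?     # true: and that copy is frozen

key.gsub!(/foo/, 'bar') # mutate the original String
puts hash['foo']          # 5: the stored key is untouched, no rehash needed
puts hash['bar'].inspect  # nil

begin
  stored.gsub!(/foo/, 'bar') # attempt to modify the frozen copy
rescue StandardError => e
  error_class = e.class      # exact class depends on the Ruby version
end
puts error_class
```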

On 16 Mar 2005, at 13:00, Nikolai Weibull wrote:

Also, using strings as symbols and then having the string altered will force a rehash of the table.

--
Eric Hodel - drbrain@segment7.net - http://segment7.net

"Nikolai Weibull" <mailing-lists.ruby-talk@rawuncut.elitemail.org> wrote in message news:20050316210032.GE5638@puritan.pcp.ath.cx...

* Sebestyén Gábor (Mar 16, 2005 21:40):

Just a dumb question: what is the real difference between { :aKey =>
"aValue" } and { "aKey" => "aValue" } ? I know the first key is a
symbol the latter is a string. I like string keys why should I use
symbols? Why symbols worth to use as keys? Thanks,

Always use symbols for situations like these. The reason is that a
symbol is immutable and also that no new string needs to be created for
it if used more than once. Also, using strings as symbols and then
having the string altered will force a rehash of the table. It's all
about memory savings and execution speed,

I'd rather make the distinction on the semantic level: for example, if you write an initializer for a class that accepts a hash to set any number of instance fields, I'd prefer to use symbols there. The same goes when only a certain fixed set of values is allowed. I use strings if they are read from some source and I don't know beforehand what they might be.

Incidentally, key-like things typically occur rather often, which fits nicely with the memory and speed savings offered by symbols.
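As a sketch of that distinction: the class name `Server`, its option names, and the sample header line below are all made up for illustration. The initializer accepts a fixed set of symbol keys, while data read from outside stays in strings.

```ruby
class Server
  # The fixed, known-in-advance set of options: a natural fit for symbols.
  ALLOWED = [:host, :port, :timeout].freeze

  attr_reader :host, :port, :timeout

  def initialize(options = {})
    unknown = options.keys - ALLOWED
    raise ArgumentError, "unknown option(s): #{unknown.inspect}" unless unknown.empty?

    # Set one instance variable per supplied option.
    options.each { |name, value| instance_variable_set("@#{name}", value) }
  end
end

server = Server.new(:host => 'localhost', :port => 8080)

# Keys that come from some source and are not known beforehand stay Strings.
headers = {}
name, value = 'Content-Type: text/plain'.split(': ', 2)
headers[name] = value
```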

Kind regards

    robert

Quoting jim@weirichhouse.org, on Thu, Mar 17, 2005 at 11:08:59AM +0900:

> Just a dumb question: what is the real difference between { :aKey =>
> "aValue" } and { "aKey" => "aValue" } ? I know the first key is a
> symbol the latter is a string. I like string keys why should I use
> symbols? Why symbols worth to use as keys?

Use Strings for their content. Use Symbols for their arbitrary uniqueness.

I used to do this, but ran into problems.

Symbols are great for things related to Ruby because the :bar form for symbol
literals accepts the same kind of characters as Ruby identifiers. I use them by
preference when interacting with Ruby's meta-programming APIs.

They start to fall down outside of this. For example, I tried to use them
with MIME types:

  :text
      ==>:text
  :video
      ==>:video
  :octet-stream
  NameError: undefined local variable or method `stream' for main:Object
          from (irb):3
  'octet-stream'.intern
      ==>:"octet-stream"

You CAN use them for things outside of the domain of Ruby names, but it gets
painful if the names of those things are arbitrarily unique but have "-"
characters in them: you first have to create a String!

You can get around this by creating constants:

  OCTETSTREAM = 'octet-stream'.intern
  TEXT = :text

etc., but that might not fit your API goals very well.

Anyhow, I moved back to using strings instead of symbols. The need to create a
string and intern it for things that are logically symbols but have a "-" in
them was too painful.

That was my experience, anyhow.

Cheers,
Sam

On Wednesday 16 March 2005 03:37 pm, Sebestyén Gábor wrote:

* Eric Hodel (Mar 16, 2005 22:20):

> Also, using strings as symbols and then having the string altered
> will force a rehash of the table.

[basically saying that this isn't so]

OK, so this strengthens the argument for using symbols even further, as
String keys will be copied. Thanks for pointing this out,
  nikolai


But why do Strings not behave like Symbols? I mean, why aren't all Strings immutable? Is this because Symbols will never get garbage collected (to make sure they can be used over and over again) and normal Strings will? Which might mean that in some cases (lots of text processing) immutable Strings would fill up memory?

Regards,

Peter

Sam Roberts wrote:

[...]
Anyhow, I moved back to using strings instead of symbols. The need to create a
string and intern it for things that are logically symbols but have a "-" in
them was too painful.

That was my experience, anyhow.

I believe you can do things like :"octet-stream" -- but I grant
that is not much better.

Hal

On Wednesday 16 March 2005 03:37 pm, Sebestyén Gábor wrote:

Quoting jim@weirichhouse.org, on Thu, Mar 17, 2005 at 11:08:59AM +0900:
> Use Strings for their content. Use Symbols for their arbitrary
> uniqueness.

I used to do this, but ran into problems.

[...]

  :octet-stream

  NameError: undefined local variable or method `stream' for main:Object
          from (irb):3
  'octet-stream'.intern
      ==>:"octet-stream"

Why couldn't you do :octet_stream ? If your answer is because the dash comes
from outside ruby, then I would suggest that the content ("_" vs "-") is
important ... indicating that you should use strings.

On Wednesday 16 March 2005 09:29 pm, Sam Roberts wrote:

--
-- Jim Weirich jim@weirichhouse.org http://onestepback.org

* Peter C. Verhage (Mar 16, 2005 23:40):

But why do Strings not behave like Symbols? I mean, why aren't all
Strings immutable? Is this because Symbols will never get garbage
collected (to make sure they can be used over and over again) and
normal Strings will? Which might mean that in some cases (lots of text
processing) immutable Strings would fill up memory?

Oh, no...not immutable vs. mutable strings again...

Well, if strings were immutable, then that would mean that strings could
share contents, and thus immutable strings wouldn't fill up memory. I
have suggested on the ruby-core list that Ruby should provide a second
data structure that acts like a string, namely the _rope_, and that it
be implemented in a way that allows for it to be used for tasks where
immutable "strings" are desired.

A rope is basically a string represented by a tree. Leaves of the tree
point to the subsequences of the whole string. These subsequences can
be shared with other ropes and can be generated lazily, i.e., from IO or
other generators. All that is needed is the length of the subsequence.
Every internal node keeps track of its own size and the size of its left
child. Thus, the offset of a node in the tree is the size of its left
child plus its ancestors. Ropes can be used to represent long strings
efficiently and many operations on ropes are O(1) where they are O(n) on
a string. This is offset by the fact that lookup in a rope is O(lg n)
versus O(1) for a string, but in many cases this isn't a problem.
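To make the idea concrete, here is a minimal rope sketch. It is not Boehm's implementation and not a proposal for Ruby's core; the class names `Leaf` and `Concat` are invented for this example, and there is no rebalancing, so lookup cost follows the depth of the tree rather than a guaranteed O(lg n).

```ruby
# A leaf holds a frozen piece of text that many ropes can share.
class Leaf
  attr_reader :length

  def initialize(text)
    @text = text.dup.freeze
    @length = @text.length
  end

  def [](index)
    return nil if index < 0 || index >= @length
    @text[index]
  end

  def to_s
    @text
  end
end

# An internal node stores its total length; the left child's length
# decides which side of the tree an index falls on.
class Concat
  attr_reader :length

  def initialize(left, right)
    @left = left
    @right = right
    @length = left.length + right.length
  end

  def [](index)
    return nil if index < 0 || index >= @length
    if index < @left.length
      @left[index]
    else
      @right[index - @left.length]
    end
  end

  def to_s
    @left.to_s + @right.to_s
  end
end

hello = Leaf.new('Hello, ')
world = Leaf.new('world')
greeting = Concat.new(hello, world)            # no characters are copied
shouting = Concat.new(greeting, Leaf.new('!')) # shares greeting's pieces

puts shouting.to_s # Hello, world!
puts greeting[7]   # w
```

Concatenation only allocates one node, which is where the O(1) operations come from; the price is the tree walk in `[]`.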

Anyway, the rope data structure is further described in [1]. Boehm has
actually implemented this in C for his garbage collector, so see that
package for an example implementation (note, though, that it uses a lot of
C hacks, which make it undesirable to use as-is). There's also a rope
data structure in STL, but it's limited to only using ropes and strings,
not IO,
  nikolai (the rope and piece table lover)

[1] Hans-J Boehm, "Ropes: an Alternative to Strings", Software--Practice
and Experience, vol. 25(12), 1315--1330, Dec. 1995. Available at
http://rubyurl.com/2FRbO.


Peter C. Verhage wrote:

But why do Strings not behave like Symbols? I mean, why aren't all Strings immutable? Is this because Symbols will never get garbage collected (to make sure they can be used over and over again) and normal Strings will? Which might mean that in some cases (lots of text processing) immutable Strings would fill up memory?

Some people (such as Guido) dislike mutable strings.
Others (such as Matz, and incidentally me) like them.

Personally, my limited Java experience juggling String and
StringBuffer was enough to convince me that strings should
be mutable.

Hal

Hal Fulton wrote:

Sam Roberts wrote:

[...]
Anyhow, I moved back to using strings instead of symbols. The need to create a
string and intern it for things that are logically symbols but have a "-" in
them was too painful.

That was my experience, anyhow.

I believe you can do things like :"octet-stream" -- but I grant
that is not much better.

And there's also the %s(octet-stream) family.
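For comparison, the spellings mentioned so far all produce the same Symbol object. A short sketch (the variable names are only for illustration):

```ruby
interned = 'octet-stream'.intern # String#intern, as in the earlier posts
quoted   = :"octet-stream"       # quoted symbol literal
percent  = %s(octet-stream)      # %s literal
via_sym  = 'octet-stream'.to_sym # String#to_sym

all_same = [quoted, percent, via_sym].all? { |sym| sym.equal?(interned) }
puts all_same         # true
puts interned.inspect # :"octet-stream"
```

The two literal forms avoid writing out a String and converting it by hand.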

Quoting hal9000@hypermetrics.com, on Thu, Mar 17, 2005 at 12:02:19PM +0900:

Sam Roberts wrote:
> OCTETSTREAM = 'octet-stream'.intern
> TEXT = :text

I believe you can do things like :"octet-stream" -- but I grant
that is not much better.

But a little better, I didn't know that, thanks.

Sam

Quoting jim@weirichhouse.org, on Thu, Mar 17, 2005 at 01:36:39PM +0900:

[...]

Why couldn't you do :octet_stream ? If your answer is because the dash comes
from outside ruby, then I would suggest that the content ("_" vs "-") is
important ... indicating that you should use strings.

Maybe I don't know what you mean by "arbitrarily unique".

"_" vs "-" is no more (or less) important than "a" vs. "z".

Cheers,
Sam

Sam Roberts said:

[...]

Maybe I don't know what you mean by "arbitrarily unique".

"_" vs "-" is no more (or less) important than "a" vs. "z".

If the choice of symbol names is arbitrary, then I can change the name of
the symbol everywhere that references it without changing the semantics of
the program.

For example, if any of the following choices are equally valid:
:octetstream, :OctetStream, :octet_stream, :stream_of_octets, :octets,
:fido, then the choice of name is arbitrary. Of course, some choices are
more transparent and convey meaning better, but the program will still
work even if we call the symbol :xyzzy. That's what it means to be
arbitrary.

If the choice of letters is constrained by some outside force, then it is
not arbitrary. For example, it might come to you as an attribute in an
XML message. Or perhaps you need to write it to a file, and other
programs expect that exact sequence of strings. In all these cases, the
content (sequence of letters) is important and cannot be changed without
breaking the program. When the content of the item is important, use a
string.
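A small sketch of the two cases; the method names, the states, and the attribute name are invented for illustration and do not come from any real API.

```ruby
# Arbitrary: :open and :closed could be renamed everywhere (to :xyzzy and
# :plugh, say) and the program would behave exactly the same.
def next_state(state)
  state == :open ? :closed : :open
end

# Constrained: attribute names are dictated by whoever reads the output,
# so their exact characters matter and they stay Strings.
def attribute_text(attributes)
  attributes.map { |name, value| %(#{name}="#{value}") }.join(' ')
end

puts next_state(:open)                               # closed
puts attribute_text('content-type' => 'text/plain')  # content-type="text/plain"
```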

On Wednesday 16 March 2005 09:29 pm, Sam Roberts wrote:

--
-- Jim Weirich jim@weirichhouse.org http://onestepback.org

Quoting jim@weirichhouse.org, on Sat, Mar 19, 2005 at 12:50:38AM +0900:

[...]

If the choice of symbol names is arbitrary, then I can change the name of
the symbol everywhere that references it without changing the semantics of
the program.

For example, if any of the following choices are equally valid:
:octetstream, :OctetStream, :octet_stream, :stream_of_octets, :octets,
:fido, then the choice of name is arbitrary. Of course, some choices are
more transparent and convey meaning better, but the program will still
work even if we call the symbol :xyzzy. That's what it means to be
arbitrary.

Ah. Then, no, it's not really arbitrary. More specifically, I can make it
arbitrary, but then I might be forced to make it more and more
arbitrary! If I map:

  x-mailer => :xmailer

Then somebody decides to make a header

  xmailer

I have to map:

  xmailer => :zz_xmailer

etc. I guess I could make a mapping table, hashing strings to
symbols, but at this point symbols aren't making my code clearer or
easier to use.
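The collision can be sketched as follows. The helper `naive_symbol` and the table contents are hypothetical, made up to mirror the mappings listed in this post.

```ruby
# A naive rule: drop the dashes so the result fits the plain :name literal form.
def naive_symbol(header_name)
  header_name.delete('-').to_sym
end

puts naive_symbol('x-mailer').inspect # :xmailer
puts naive_symbol('xmailer').inspect  # :xmailer, the same symbol: a collision

# The alternative is an explicit table, which must be maintained by hand.
header_symbols = {
  'x-mailer' => :xmailer,
  'xmailer'  => :zz_xmailer
}

puts header_symbols['xmailer'].inspect # :zz_xmailer
```

With the table, the strings are still needed as keys, so the symbols add a layer without removing one.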

In the example of MIME types, I probably could use arbitrary symbols.
Anybody who decides to make a new MIME type called application/octet_stream,
given that application/octet-stream is a standard name, deserves to be
publicly humiliated. So I could use
:octetstream, arbitrarily.

I just wanted to use symbols for the efficiency, and to emphasize their
uniqueness in terms of case-sensitivity. It seemed to fit, but for
several reasons I discovered it didn't.

Cheers,
Sam
