Why was the "Symbol is a String"-idea dropped?

Hello everybody,

Although there was not a lot of input from the Ruby-Core specialists,
I have still learned a lot from the discussion.
I am trying to build a conceptual picture now.

Some say Strings and Symbols are conceptually very different;
some say they are quite close.

I view it like this:
Symbols essentially are names, Strings essentially are data,
while they both appear as sequences of characters.

Names/Symbols are just atomic, constant, unrelated entities,
while Strings, as data, have a rich life: they can be related in
many ways, they can be analysed, and they can even be modified.

That's a clean distinction and I think it is very well-represented
in the current Ruby implementation.

In this light, it seems nonsensical to make one the subclass of the other.
(A common superclass would be OK, though.)

Now, in practice, the situation gets more complex:
1. Names sometimes turn into data (option names, method names, table names...),
    especially when things get highly dynamic.
2. Sometimes, programmers use the conceptually "wrong" class, maybe
    as a kind of optimization, for the sake of beauty or out of laziness ... :-)

One could argue that it is good that Symbol and String are well-separated,
because it educates programmers to decide for the "correct" class to use.

On the other hand, the following situation occurs very often:
You need to transfer a sequence of characters -- which format do you use:
always Symbols, always Strings, or should it allow both? (Or even a fancy object?)

First, you could argue that when you use duck-typing, the interface can be kept open.
But still, many situations remain where this question arises.

This choice can be a burden, especially if you think of inter-operability or optimisation.

And that is an argument for some sort of unification of Symbol and String.

Subclassing alone would not be enough to solve the problem above.
String#== and Symbol#== would also have to be defined such that "a" == :a,
and #hash would have to be defined accordingly.

Then you would still have the two different kinds of objects ("a" and :a)
but they would behave quite the same except for modifying methods.
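
To make this concrete, here is a minimal sketch of how things stand today and of what "behaving the same" would involve. UnifiedKey is a hypothetical wrapper invented for this illustration, not anything that exists in Ruby; it only shows that ==, eql? and hash all have to agree before "a" and :a can stand in for each other as Hash keys.

```ruby
# Today: same characters, but neither equal nor interchangeable as keys.
puts "a" == :a                 # false
puts({ "a" => 1 }[:a].inspect) # nil

# Hypothetical wrapper (an assumption for this sketch): equality and
# hashing are both defined on the character sequence alone.
class UnifiedKey
  attr_reader :text

  def initialize(value)
    @text = value.to_s.freeze
  end

  def ==(other)
    other.is_a?(UnifiedKey) && text == other.text
  end
  alias eql? ==

  # Must agree with eql?, otherwise Hash lookups would still miss.
  def hash
    text.hash
  end
end

table = { UnifiedKey.new("a") => 1 }
puts table[UnifiedKey.new(:a)] # 1
```

Defining == without a matching hash would make the two objects compare equal and still land in different Hash buckets, which is why both have to change together.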

Now, as I am writing this, I doubt that the advantages
of the unification are really worth the effort...

It depends on factors not known to me.

But now, I think I can understand the core-team's decision better.

Bye
Sven

Brian Candler wrote:

···

On Wed, May 16, 2007 at 12:23:09AM +0900, Gary Wright wrote:

On May 15, 2007, at 10:53 AM, Brian Candler wrote:

Yes, but it's not a singleton.
     

You've stated or implied a couple of times in this discussion that
symbols are 'singletons', but I thought the conventional definition
of 'singleton' was of a class with only a single instance, where the
instance is called a singleton. That doesn't describe Ruby's symbols.

I think what you are getting at is the idea that identity and
equality are one and the same for symbols.
   

No, that's not exactly what I meant, but sorry for not being more precise.
What I meant was: there is only ever one symbol object in existence for a
particular sequence of characters. :foo.object_id in one part of the program
is always the same as :foo.object_id elsewhere.

If it were Symbol.new("foo") always returning the same object then I guess
it would probably be called the multiton pattern.

Isn't the term "immediate value" used for that? Like:
   :abc is an immediate value, and so is 12, and so is nil
   "abc" is a reference value, and so is [1, 2] and also {} and even 12.0

Subclassing alone would not be enough to solve the problem above.
String#== and Symbol#== would also have to be defined such that "a" == :a,
and #hash would have to be defined accordingly.

Then you would still have the two different kinds of objects ("a" and :a)
but they would behave quite the same except for modifying methods.

While I think Symbol probably could use at least a few of String's
manipulation methods, putting that aside, I wonder how it would affect
things just to make :a == "a" ?

Now, as I am writing this, I doubt that the advantages
of the unification are really worth the effort...

It depends on factors not known to me.

But now, I think I can understand the core-team's decision better.

Thanks for this excellent summary.

T.

···

On May 16, 5:44 am, "Sven Suska (enduro)" <sven71...@suska.org> wrote:

Well, there is precedent: 2 == 2.0 and so on.
On the other hand, what should happen in case statements? Maybe it
would actually be better to make :a === 'a' but not :a == 'a'

···

On 5/16/07, Trans <transfire@gmail.com> wrote:

On May 16, 5:44 am, "Sven Suska (enduro)" <sven71...@suska.org> wrote:
> Subclassing alone would not be enough, to solve the problem above,
> also, String#== and Symbol#== would have to be defined such that "a" == :a
> And also #hash would have to be defined accordingly.
>
> Then you would still have the two different kinds of objects ("a" and :a)
> but they would behave quite the same except for modifying methods.

While I think Symbol probably could use at least a few of String's
manipulation methods, putting that aside, I wonder how it would affect
things just to make :a == "a" ?

Honestly I prefer to write

case s.to_s
   when 'a'

instead of
case s
     when 'a'

but the most explicit way to do this is maybe the most readable

case s
    when :a, 'a'
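
For reference, here are the three variants written out as complete methods. The method names and the :matched / :unmatched return values are made up for this illustration; the point is only which inputs each form accepts under the current String/Symbol separation.

```ruby
# Variant 1: normalise to a String first, then compare.
def match_via_to_s(s)
  case s.to_s
  when 'a' then :matched
  else :unmatched
  end
end

# Variant 2: compare as-is; a Symbol never matches a String pattern.
def match_plain(s)
  case s
  when 'a' then :matched
  else :unmatched
  end
end

# Variant 3: list both spellings explicitly.
def match_explicit(s)
  case s
  when :a, 'a' then :matched
  else :unmatched
  end
end

p match_via_to_s(:a)    # :matched
p match_plain(:a)       # :unmatched
p match_explicit(:a)    # :matched
p match_explicit('a')   # :matched
```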

Cheers
Robert

P.S.
Tom is right, that was an excellent résumé.
R

···

On 5/16/07, Logan Capaldo <logancapaldo@gmail.com> wrote:

On 5/16/07, Trans <transfire@gmail.com> wrote:
>
> On May 16, 5:44 am, "Sven Suska (enduro)" <sven71...@suska.org> wrote:
> > Subclassing alone would not be enough, to solve the problem above,
> > also, String#== and Symbol#== would have to be defined such that "a" == :a
> > And also #hash would have to be defined accordingly.
> >
> > Then you would still have the two different kinds of objects ("a" and :a)
> > but they would behave quite the same except for modifying methods.
>
> While I think Symbol probably could use at least a few of String's
> manipulation methods, putting that aside, I wonder how it would affect
> things just to make :a == "a" ?
>
Well, there is precedent: 2 == 2.0 and so on.
On the other hand, what should happen in case statements? Maybe it
would actually be better to make :a === 'a' but not :a == 'a'

--
You see things; and you say Why?
But I dream things that never were; and I say Why not?
-- George Bernard Shaw

Hi --

> Subclassing alone would not be enough, to solve the problem above,
> also, String#== and Symbol#== would have to be defined such that "a" == :a
> And also #hash would have to be defined accordingly.
>
> Then you would still have the two different kinds of objects ("a" and :a)
> but they would behave quite the same except for modifying methods.

While I think Symbol probably could use at least a few of String's
manipulation methods, putting that aside, I wonder how it would affect
things just to make :a == "a" ?

Well, there is precedent: 2 == 2.0 and so on.

With symbols being as integer-like as they are string-like, though,
it's really equally similar to:

   2 == :"2"

On the other hand, what should happen in case statements? Maybe it
would actually be better to make :a === 'a' but not :a == 'a'

I guess as long as :a === :a was still true, that might be a good way
to express the fact that "this is the string of which this symbol is a
case", or something like that.
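
A rough sketch of that split, without touching Symbol itself: NameMatcher below is a hypothetical helper class (an assumption for this example, not part of Ruby) whose == stays strict while its === also accepts the String spelling, so the looseness only shows up in case statements.

```ruby
class NameMatcher
  attr_reader :name

  def initialize(name)
    @name = name.to_sym
  end

  # Strict equality: only another matcher for the same name is equal.
  def ==(other)
    other.is_a?(NameMatcher) && other.name == name
  end

  # Case equality: the Symbol itself, or a String spelling the same name.
  def ===(other)
    case other
    when Symbol then other == name
    when String then other == name.to_s
    else false
    end
  end
end

def kind(value)
  case value
  when NameMatcher.new(:a) then "a-like"
  else "other"
  end
end

p kind(:a)                     # "a-like"
p kind("a")                    # "a-like"
p kind("b")                    # "other"
p NameMatcher.new(:a) == "a"   # false
```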

David

···

On Wed, 16 May 2007, Logan Capaldo wrote:

On 5/16/07, Trans <transfire@gmail.com> wrote:

On May 16, 5:44 am, "Sven Suska (enduro)" <sven71...@suska.org> wrote:

--
Q. What is THE Ruby book for Rails developers?
A. RUBY FOR RAILS by David A. Black (http://www.manning.com/black)
    (See what readers are saying! http://www.rubypal.com/r4rrevs.pdf)
Q. Where can I get Ruby/Rails on-site training, consulting, coaching?
A. Ruby Power and Light, LLC (http://www.rubypal.com)

Hi --

>>
>> > Subclassing alone would not be enough, to solve the problem above,
>> > also, String#== and Symbol#== would have to be defined such that "a" ==
>> :a
>> > And also #hash would have to be defined accordingly.
>> >
>> > Then you would still have the two different kinds of objects ("a" and :a)
>> > but they would behave quite the same except for modifying methods.
>>
>> While I think Symbol probably could use at least a few of String's
>> manipulation methods, putting that aside, I wonder how it would affect
>> things just to make :a == "a" ?
>>
> Well, there is precedent: 2 == 2.0 and so on.

With symbols being as integer-like as they are string-like, though,
it's really equally similar to:

   2 == :"2"

I don't think symbols are integer-like (I don't know that they are
especially string-like either), but I'd be willing to bet a lot more
code in the wild would be broken if you removed Symbol#to_s than if
you removed Symbol#to_i.

Your example really ought to be

2 == :whatever_symbol_whose_to_i_results_in_2

···

On 5/16/07, dblack@wobblini.net <dblack@wobblini.net> wrote:

On Wed, 16 May 2007, Logan Capaldo wrote:
> On 5/16/07, Trans <transfire@gmail.com> wrote:
>> On May 16, 5:44 am, "Sven Suska (enduro)" <sven71...@suska.org> wrote:

David


This is the 'equivalence is defined by identity' idea again. I think
this is what David means by 'integer-like'. It is this property that
both fixnums and symbols share but that is *not* shared by strings.

Making '==' work with mixed operands of symbols and strings breaks that
idea and leads to the strange example that David gave (2 == :"2").

Gary Wright

···

On May 16, 2007, at 11:17 AM, Logan Capaldo wrote:

On 5/16/07, dblack@wobblini.net <dblack@wobblini.net> wrote:

With symbols being as integer-like as they are string-like, though,
it's really equally similar to:

   2 == :"2"

I don't think symbols are integer-like.

Hi --

···

On Fri, 18 May 2007, Gary Wright wrote:

On May 16, 2007, at 11:17 AM, Logan Capaldo wrote:

On 5/16/07, dblack@wobblini.net <dblack@wobblini.net> wrote:

With symbols being as integer-like as they are string-like, though,
it's really equally similar to:

   2 == :"2"

I don't think symbols are integer-like.

This is the 'equivalence is defined by identity' idea again. I think
this is what David means by 'integer-like'. It is this property that
both fixnums and symbols share but that is *not* shared by strings.

Yes, it's the immutable/immediate thing that symbols have in common
with fixnums and that neither has in common with strings.

David


Frozen strings are immutable.

Paul

···

On Fri, May 18, 2007 at 03:17:01AM +0900, dblack@wobblini.net wrote:

Yes, it's the immutable/immediate thing that symbols have in common
with fixnums and that neither has in common with strings.

But not immediate.

···

On 5/18/07, Paul Brannan <pbrannan@atdesk.com> wrote:

On Fri, May 18, 2007 at 03:17:01AM +0900, dblack@wobblini.net wrote:
> Yes, it's the immutable/immediate thing that symbols have in common
> with fixnums and that neither has in common with strings.

Frozen strings are immutable.

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

> > Yes, it's the immutable/immediate thing that symbols have in common
> > with fixnums and that neither has in common with strings.
>
> Frozen strings are immutable.

But not immediate.

What about

%f{This is sooo cooooold} << "!"

TypeError: can't modify frozen string
Just an idea.

Robert

···

On 5/18/07, Rick DeNatale <rick.denatale@gmail.com> wrote:

On 5/18/07, Paul Brannan <pbrannan@atdesk.com> wrote:
> On Fri, May 18, 2007 at 03:17:01AM +0900, dblack@wobblini.net wrote:

That's the immutable part, but

a = "abc".freeze
b = "abc".freeze
c = :abc
d = :abc
a.object_id => -606341628
b.object_id => -606347008
c.object_id => 343218
d.object_id => 343218

The key difference is that there's only one instance of a symbol with
a given string representation.
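
The same check can be written with equal?, which avoids reading object_id values by eye. This is only a sketch: String.new is used here (my choice, not part of the session above) so that each string is a freshly allocated object regardless of any literal optimisation the interpreter may perform.

```ruby
a = String.new("abc").freeze
b = String.new("abc").freeze
c = :abc
d = "abc".to_sym

puts a == b        # true:  same characters
puts a.equal?(b)   # false: two distinct frozen String objects
puts c == d        # true
puts c.equal?(d)   # true:  one Symbol per character sequence
```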

The shorthand way of saying this is that symbols, like fixnums, are
immediate. That is a sufficient but not necessary condition; it
crosses the line a bit by describing both the identity relationship
requirement AND the implementation.

Most normal objects are referenced at the C level by an internal value
which is a pointer to the object's state representation in memory.
Since objects are aligned at least on a word boundary, all normal
object pointers will have the 2 least significant bits as zero. They
will also be non-zero.

A few objects are immediate, which means that they are referenced at
the C level by a representation whose value is not a pointer. Fixnums
are represented by shifting the C representation left one bit and
turning on the low-order bit. False is represented by 0, True by 2,
and Nil by 4.

Ruby symbols are represented by a value computed by shifting the
symbol's integer representation left 8 bits and setting the low-order
byte to 0xFF.

As I said, it's not essential that symbols be immediate. For example,
interning a string could create a Symbol instance which was frozen and
registered in a global symbol table, i.e. the multiton pattern. But
the current implementation no doubt has some advantages in either
low-level mechanism performance, supporting some niche in Ruby legacy,
or both.
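
A minimal sketch of that non-immediate alternative, assuming a hypothetical InternedName class (the name, the intern method and the table are all invented for this illustration and say nothing about how Ruby's real symbol table works): instances are ordinary frozen objects, and a global table guarantees one instance per character sequence.

```ruby
class InternedName
  @table = {}
  @lock  = Mutex.new

  class << self
    # Return the one registered instance for this character sequence,
    # creating and registering it on first use.
    def intern(text)
      key = text.to_s
      @lock.synchronize { @table[key] ||= new(key) }
    end
  end

  # Force all construction through intern.
  private_class_method :new

  attr_reader :text

  def initialize(text)
    @text = text.dup.freeze
    freeze
  end
end

x = InternedName.intern("foo")
y = InternedName.intern(:foo)
puts x.equal?(y)   # true: equality and identity coincide
puts x.frozen?     # true
```

Here identity comes from the table lookup rather than from the bit pattern of the reference, which is the distinction being drawn above.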

···

On 5/18/07, Robert Dober <robert.dober@gmail.com> wrote:

On 5/18/07, Rick DeNatale <rick.denatale@gmail.com> wrote:
> On 5/18/07, Paul Brannan <pbrannan@atdesk.com> wrote:
> > On Fri, May 18, 2007 at 03:17:01AM +0900, dblack@wobblini.net wrote:
> > > Yes, it's the immutable/immediate thing that symbols have in common
> > > with fixnums and that neither has in common with strings.
> >
> > Frozen strings are immutable.
>
> But not immediate.
>

What about

%f{This is sooo cooooold} << "!"

TypeError: can't modify frozen string
Just an idea.

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

IPMS/USA Region 12 Coordinator
http://ipmsr12.denhaven2.com/

Visit the Project Mercury Wiki Site
http://www.mercuryspacecraft.com/

> > > > Yes, it's the immutable/immediate thing that symbols have in common
> > > > with fixnums and that neither has in common with strings.
> > >
> > > Frozen strings are immutable.
> >
> > But not immediate.
> >
>
> What about
>
> %f{This is sooo cooooold} << "!"
>
> TypeError: can't modify frozen string
> Just an idea.

That's the immutable part, but

a = "abc".freeze
b = "abc".freeze
c = :abc
d = :abc
a.object_id => -606341628
b.object_id => -606347008
c.object_id => 343218
d.object_id => 343218

The key difference is that there's only one instance of a symbol with
a given string representation.

Ah I see, I got confused, I did not understand the meaning of
immediate immediately ;).
Although theoretically the interpreter could create an immediate value for
%f{...}, we would probably run out of address space :(
<snip>

Cheers
Robert

···

On 5/18/07, Rick DeNatale <rick.denatale@gmail.com> wrote:

On 5/18/07, Robert Dober <robert.dober@gmail.com> wrote:
> On 5/18/07, Rick DeNatale <rick.denatale@gmail.com> wrote:
> > On 5/18/07, Paul Brannan <pbrannan@atdesk.com> wrote:
> > > On Fri, May 18, 2007 at 03:17:01AM +0900, dblack@wobblini.net wrote:


Perhaps it varies based on the Ruby version you're running; it's not like
that for me.

irb(main):006:0> :foo.object_id.to_s(16)
=> "39490e"
irb(main):007:0> RUBY_VERSION
=> "1.8.4"

I think a weaker requirement than 'immediate' is needed. A symbol can quite
happily be a regular object; we just need to ensure that there is always
only one symbol for a particular symbol character sequence.

Regards,

Brian.

···

On Sat, May 19, 2007 at 08:28:24AM +0900, Rick DeNatale wrote:

Most normal objects are referenced at the C level by an internal value
which is a pointer to the object's state representation in memory.
Since objects are aligned at least on a word boundary, all normal
object pointers will have the 2 least significant bits as zero. They
will also be non-zero.

A few objects are immediate, which means that they are referenced at
the C level by a representation whose value is not a pointer. Fixnums
are represented by shifting the C representation left one bit and
turning on the low-order bit. False is represented by 0, True by 2,
and Nil by 4.

Ruby symbols are represented by a value computed by shifting the
symbol's integer representation left 8 bits and setting the low-order
byte to 0xFF.

> Ruby symbols are represented by a value computed by shifting the
> symbol's integer representation left 8 bits and setting the low-order
> byte to 0xFF.

Perhaps it varies based on the Ruby version you're running; it's not like
that for me.

irb(main):006:0> :foo.object_id.to_s(16)
=> "39490e"
irb(main):007:0> RUBY_VERSION
=> "1.8.4"

You can't really see the internal bit representations from ruby, since
they get manipulated before you see them. Much like the class of an
object reported by ruby isn't the same as the object pointed to by its
klass pointer at the C level.

And even if you could, I was talking about the integer representation
of the symbol, not the object_id.

Not to say that this doesn't change between versions of ruby. Which
is why it's carefully hidden from ruby code.

I think a weaker requirement than 'immediate' is needed. A symbol can quite
happily be a regular object; we just need to ensure that there is always
only one symbol for a particular symbol character sequence.

Yes, I said that, but the key issue for the subject of the current
thread is that Symbols aren't strings. They might have both a string
representation and an integer representation, but then so do integers.
And unlike Strings, they have an essential requirement that equality
implies identity, which is only an accidental property of integers in
the range of Fixnum.

···

On 5/19/07, Brian Candler <B.Candler@pobox.com> wrote:

On Sat, May 19, 2007 at 08:28:24AM +0900, Rick DeNatale wrote:


AFAIK, the object_id is the in-memory pointer to the structure of the object
(if it's a material object), or is one of the special values:

- 0, 2 or 4 for false, true or nil

- (n<<1) | 1 for Fixnums

None of these is valid as a pointer to a memory location, so they can be
recognised immediately as special.

So in the above, :foo's object ID looks like a memory pointer to me. It
might not be, but then you'd need to guarantee that 39490e could not
possibly be a valid memory pointer for some regular object (and also be able
to recognise this by inspection, i.e. by looking at the bit pattern)

Regards,

Brian.

···

On Sun, May 20, 2007 at 10:18:22AM +0900, Rick DeNatale wrote:

On 5/19/07, Brian Candler <B.Candler@pobox.com> wrote:
>On Sat, May 19, 2007 at 08:28:24AM +0900, Rick DeNatale wrote:

>> Ruby symbols are represented by a value computed by shifting the
>> symbol's integer representation left 8 bits and setting the low-order
>> byte to 0xFF.
>
>Perhaps it varies based on the Ruby version you're running; it's not like
>that for me.
>
>irb(main):006:0> :foo.object_id.to_s(16)
>=> "39490e"
>irb(main):007:0> RUBY_VERSION
>=> "1.8.4"

You can't really see the internal bit representations from ruby, since
they get manipulated before you see them. Much like the class of an
object reported by ruby isn't the same as the object pointed to by its
klass pointer at the C level.

And even if you could, I was talking about the integer representation
of the symbol, not the object_id.

On 5/20/07, Brian Candler <B.Candler@pobox.com> wrote:

> AFAIK, the object_id is the in-memory pointer to the structure of the object
> (if it's a material object), or is one of the special values:
>
> - 0, 2 or 4 for false, true or nil
>
> - (n<<1) | 1 for Fixnums

Not starting with 1.8.5:

VALUE
rb_obj_id(VALUE obj)
{
    /*
     *                32-bit VALUE space
     *          MSB ------------------------ LSB
     *  false   00000000000000000000000000000000
     *  true    00000000000000000000000000000010
     *  nil     00000000000000000000000000000100
     *  undef   00000000000000000000000000000110
     *  symbol  ssssssssssssssssssssssss00001110
     *  object  oooooooooooooooooooooooooooooo00        = 0 (mod sizeof(RVALUE))
     *  fixnum  fffffffffffffffffffffffffffffff1
     *
     *                    object_id space
     *                                       LSB
     *  false   00000000000000000000000000000000
     *  true    00000000000000000000000000000010
     *  nil     00000000000000000000000000000100
     *  undef   00000000000000000000000000000110
     *  symbol   000SSSSSSSSSSSSSSSSSSSSSSSSSSS0        S...S % A = 4 (S...S = s...s * A + 4)
     *  object   oooooooooooooooooooooooooooooo0        o...o % A = 0
     *  fixnum  fffffffffffffffffffffffffffffff1        bignum if required
     *
     *  where A = sizeof(RVALUE)/4
     *
     *  sizeof(RVALUE) is
     *  20 if 32-bit, double is 4-byte aligned
     *  24 if 32-bit, double is 8-byte aligned
     *  40 if 64-bit
     */
    if (TYPE(obj) == T_SYMBOL) {
        return (SYM2ID(obj) * sizeof(RVALUE) + (4 << 2)) | FIXNUM_FLAG;
    }
    if (SPECIAL_CONST_P(obj)) {
        return LONG2NUM((long)obj);
    }
    return (VALUE)((long)obj|FIXNUM_FLAG);
}

1.8.6 and 1.9 have the same code.
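
For anyone who wants to play with the numbers, here is a rough model of the 32-bit VALUE-space table from the comment above, written as plain Ruby integer arithmetic. It does not inspect the running interpreter. The helper names are invented for this sketch, the symbol ID 34321 is a hypothetical value picked because it reproduces the 343218 shown earlier in the thread, and sizeof(RVALUE) == 20 is assumed.

```ruby
FIXNUM_FLAG = 0x01
SYMBOL_FLAG = 0x0e
SPECIALS = { 0 => :false, 2 => :true, 4 => :nil, 6 => :undef }.freeze

# Fixnum n is stored as (n << 1) | 1.
def fixnum_to_value(n)
  (n << 1) | FIXNUM_FLAG
end

# Symbol ID is stored as (id << 8) | 0x0e, per the table in the comment.
def symbol_id_to_value(id)
  (id << 8) | SYMBOL_FLAG
end

# Decide what a raw VALUE bit pattern stands for, by inspection only.
def classify(value)
  return [:fixnum, value >> 1] if (value & FIXNUM_FLAG) == FIXNUM_FLAG
  return [SPECIALS[value], nil] if SPECIALS.key?(value)
  return [:symbol, value >> 8] if (value & 0xff) == SYMBOL_FLAG
  return [:object, value] if (value & 0b11).zero?
  [:unknown, value]
end

# The symbol branch of rb_obj_id: the C code returns a Fixnum-tagged
# VALUE, so the integer visible from Ruby is that value shifted right.
def symbol_object_id(id, sizeof_rvalue)
  tagged = (id * sizeof_rvalue + (4 << 2)) | FIXNUM_FLAG
  tagged >> 1
end

p classify(fixnum_to_value(12))   # [:fixnum, 12]
p classify(0x39490e)              # [:symbol, 14665]
p symbol_object_id(34321, 20)     # 343218
```

Read this way, the 39490e reported under 1.8.4 has the symbol tag 0e in its low byte, while the 343218 from the earlier session fits the newer formula.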
