Get the real object in a Hash key

Hi, consider this simple code, in which I add instance variables
to String instances and use those String objects as Hash keys:

------------------------------------------------
h = {}

k1 = "aaa"
k1.instance_variable_set :@name, "Aaa-011"

k2 = "bbb"
k2.instance_variable_set :@name, "Bbb-268"

h[k1] = "Hello"
h[k2] = "Bye"
------------------------------------------------

Now I want to look up in the hash the element whose key matches "aaa"
(using String#eql?):

  h["aaa"]
  => "Hello"

But I don't want to get just the value associated with the key ("Hello"),
but also the key object itself (not the "aaa" I passed but the k1 object) so
I can check its @name attribute. And I need it in a very efficient way.

However, I've just realized that this is not possible. The hash does not
store the given key as a reference to the original object:

-------------------------------------------
puts k1.object_id
=> 18140060

puts k2.object_id
=> 16245980

h.keys.each {|k| puts k.object_id}
=> 16182220
=> 20359940
-------------------------------------------

I realized this while writing this mail, so forget the previous
question. Now I have another question:

--------------------
myobject = MyCustomClass.new

@h = {}

@h[myobject] = "lalalala"
--------------------

In this case, will the Ruby GC collect myobject? Or will it stay alive
because it has been used as a key of a hash (assuming the hash itself is
never GC'd)?

Thanks a lot.
--
Iñaki Baz Castillo
<ibc@aliax.net>

However I've realized right now that it's not possible. The hash key
doesn't store the given key as a reference to such object:

This is a special optimization for unfrozen Strings used as Hash keys:
the Hash stores a frozen copy of the String instead of the object you
passed in.
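
A minimal sketch of what that optimization looks like from the outside, assuming MRI (the variable names are only illustrative). The stored key compares equal to the original String, but it is a separate, frozen object:

```ruby
# "aaa".dup guarantees an unfrozen String even if string literals
# are frozen by a magic comment.
k1 = "aaa".dup
k1.instance_variable_set :@name, "Aaa-011"

h = {}
h[k1] = "Hello"

stored = h.keys.first

puts stored == k1        # true: same content, so h["aaa"] still works
puts stored.equal?(k1)   # false: the Hash keeps its own copy
puts stored.frozen?      # true: the copy is frozen
puts k1.frozen?          # false: the original is left untouched
```

This is why the object_ids printed for h.keys differ from those of k1 and k2.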

Now I have another question:

--------------------
myobject = MyCustomClass.new

@h = {}

@h[myobject] = "lalalala"
--------------------

In this case, will Ruby GC delete myobject? or will it remain alive as
it has been used as a key of a hash (which is not GC'd in a supposed
code)?

The key stays alive at least as long as the Hash instance.
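
A small sketch of that statement, assuming MRI. MyCustomClass is just an empty placeholder class here. The Hash holds a normal (strong) reference to the key, so the key survives a GC run even when no variable refers to it:

```ruby
class MyCustomClass; end

h = {}
h[MyCustomClass.new] = "lalalala"   # nothing but the Hash refers to the key

GC.start                            # force a collection

key = h.keys.first
puts key.class                      # MyCustomClass: still reachable through h
puts h[key]                         # lalalala
```

Once the Hash itself becomes unreachable (or the entry is deleted), the key can be collected like any other object.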

Cheers

robert

···

On Fri, Apr 15, 2011 at 2:50 PM, Iñaki Baz Castillo <ibc@aliax.net> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

hi Iñaki,

  i may well not understand exactly what you need to do, and so be
oversimplifying, but could you do something similar to what Robert
suggested (but a bit simpler,) and just use an array as each key's
value? the header's original name could be added as the first element
of the array - something like this:

request = Hash.new { |hash, key| hash[key] = [] }

request["FROM"] = ["fRoM", "sip:alice@example.org"]

p request["FROM"][0]

#=> "fRoM"

- j

···

--
Posted via http://www.ruby-forum.com/.

Oops, if I freeze the string before inserting it as a Hash key, this
doesn't happen (I get the same object_id) :)
The key is also kept as-is if I use a class inheriting from String. Good to know!
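
A minimal sketch of the frozen case, assuming MRI. Note that the instance variable has to be set before freezing, since a frozen String can no longer be modified:

```ruby
k1 = "aaa".dup
k1.instance_variable_set :@name, "Aaa-011"   # must happen before freezing
k1.freeze

h = {}
h[k1] = "HELLO"

stored = h.keys.first

puts stored.equal?(k1)                       # true: no copy was made
puts stored.instance_variable_get(:@name)    # Aaa-011
puts h["aaa"]                                # HELLO
```

Here h.keys.first is used only to inspect the stored key; it is not an efficient way to find one particular key in a large Hash.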

Then I come back to my original question:

···

2011/4/15 Robert Klemme <shortcutter@googlemail.com>:

On Fri, Apr 15, 2011 at 2:50 PM, Iñaki Baz Castillo <ibc@aliax.net> wrote:

However I've realized right now that it's not possible. The hash key
doesn't store the given key as a reference to such object:

This is a special optimization for unfrozen Strings as Hash keys.

----------------
k1 = "aaa"
k1.freeze

h = {}

h[k1] = "HELLO"
----------------

Given a string "aaa", how can I get the object k1 from the hash? (I
mean without comparing String#eql? each key with the string "aaa")
Unfortunately I think the Hash class does not provide a method for it.

Thanks a lot.

--
Iñaki Baz Castillo
<ibc@aliax.net>

Exactly. And you don't want to do it. A Hash is an associative
storage which associates the value with your key. If you need to
stuff in more information - you need to add it to the value and not
the key. The simplest would be to define a Struct, e.g.

Value = Struct.new :name, :val

Then put this into the Hash as values

h[k1] = Value["a name", "HELLO"]
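
Spelled out as a runnable sketch (the field contents are taken from the original example; the rest is only illustrative), one ordinary lookup then returns both pieces of information:

```ruby
Value = Struct.new(:name, :val)

h = {}
h["aaa"] = Value["Aaa-011", "Hello"]
h["bbb"] = Value["Bbb-268", "Bye"]

entry = h["aaa"]     # a single hash lookup
puts entry.name      # Aaa-011
puts entry.val       # Hello
```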

Kind regards

robert

···

On Fri, Apr 15, 2011 at 3:14 PM, Iñaki Baz Castillo <ibc@aliax.net> wrote:

2011/4/15 Robert Klemme <shortcutter@googlemail.com>:

On Fri, Apr 15, 2011 at 2:50 PM, Iñaki Baz Castillo <ibc@aliax.net> wrote:

However I've realized right now that it's not possible. The hash key
doesn't store the given key as a reference to such object:

This is a special optimization for unfrozen Strings as Hash keys.

Oops, if I freeze the string before inserting it as a Hash key, this
doesn't happen (I get the same object_id) :)
The key is also kept as-is if I use a class inheriting from String. Good to know!

Then I come back to my original question:

----------------
k1 = "aaa"
k1.freeze

h = {}

h[k1] = "HELLO"
----------------

Given a string "aaa", how can I get the object k1 from the hash? (I
mean without comparing String#eql? each key with the string "aaa")
Unfortunately I think the Hash class does not provide a method for it.


Yes, that seems a good solution.

Thanks.

···

2011/4/15 Robert Klemme <shortcutter@googlemail.com>:

Given a string "aaa", how can I get the object k1 from the hash? (I
mean without comparing String#eql? each key with the string "aaa")
Unfortunately I think the Hash class does not provide a method for it.

Exactly. And you don't want to do it. A Hash is an associative
storage which associates the value with your key. If you need to
stuff in more information - you need to add it to the value and not
the key. The simplest would be to define a Struct, e.g.

Value = Struct.new :name, :val

Then put this into the Hash as values

h[k1] = Value["a name", "HELLO"]

--
Iñaki Baz Castillo
<ibc@aliax.net>

Robert K. wrote in post #993000:

Given a string "aaa", how can I get the object k1 from the hash? (I
mean without comparing String#eql? each key with the string "aaa")
Unfortunately I think the Hash class does not provide a method for it.

Exactly. And you don't want to do it. A Hash is an associative
storage which associates the value with your key. If you need to
stuff in more information - you need to add it to the value and not
the key....

Well you may want to do it -- that's why Hash#assoc exists. Hash keys
can be objects of any sort, and there are use cases for storing
nonsimple keys.

The reason there's no constant-time equivalent of Hash#assoc is
because hashing, by its very nature, cannot be reversed. There's no
method for it because one cannot possibly exist. It's not because one
should never be interested in the key object. Hash#assoc is there for
a reason.

Lispers will recognize assoc as relating to the Lisp function of the
same name which has exactly that use case: key/value pairs where the
key and the value matter as objects in their own right, apart from the
hashing function result.

···

On Fri, Apr 15, 2011 at 3:14 PM, Iñaki Baz Castillo <ibc@aliax.net>


I did not argue against complex keys. The issue is with *mutable*
keys. And since adding data to the key object is also associating
(which is done with the value as well) the most natural way would be
to place that additional information there. Not to mention the
questionable approach to stuff something into what is usually
considered a simple value (String).

Kind regards

robert

···

On Fri, Apr 15, 2011 at 4:47 PM, Kevin Mahler <kevin.mahler@yahoo.com> wrote:

Robert K. wrote in post #993000:

On Fri, Apr 15, 2011 at 3:14 PM, Iñaki Baz Castillo <ibc@aliax.net>

Given a string "aaa", how can I get the object k1 from the hash? (I
mean without comparing String#eql? each key with the string "aaa")
Unfortunately I think the Hash class does not provide a method for it.

Exactly. And you don't want to do it. A Hash is an associative
storage which associates the value with your key. If you need to
stuff in more information - you need to add it to the value and not
the key....

Well you may want to do it -- that's why Hash#assoc exists. Hash keys
can be objects of any sort, and there are use cases for storing
nonsimple keys.


Robert K. wrote in post #993026:

I did not argue against complex keys. The issue is with *mutable*
keys. And since adding data to the key object is also associating
(which is done with the value as well) the most natural way would be
to place that additional information there. Not to mention the
questionable approach to stuff something into what is usually
considered a simple value (String).

You said "And you don't want to do it." In fact doing it has its uses.
Mutable keys or not is totally irrelevant, especially when the data
was there before the hash was introduced, as in the original example.

*Of course* making repeated calls to Hash#assoc in order to update
stuff in the key would be stupid. That goes without saying. What would
the purpose of the hash be? If that was your only point then we agree,
although it was a vacuous point.

Also do you realize that an example tends to stand for something which
is not literally the example itself? He has a key. It contains some
data. It's not necessarily true that he should duplicate that data in
the mapped-to values. Mutable or not is beside the point.

I notice this phenomenon a lot: undergeneralization. The String stands
for something. It's his key data. If it were a simple value then the
example wouldn't make sense in the first place. Gee, thanks for
telling us that we shouldn't stuff random shit into a simple value and
then use that as a hash key, whereupon we can't look up stuff in the
hash directly but must use Hash#assoc instead. Again, if that was your
point then we agree, albeit in the obvious and nearly information-free
sense. I'm sure we would also agree that cats would be a poor building
material for helicopters.

···


Robert K. wrote in post #993026:

I did not argue against complex keys. The issue is with *mutable*
keys. And since adding data to the key object is also associating
(which is done with the value as well) the most natural way would be
to place that additional information there. Not to mention the
questionable approach to stuff something into what is usually
considered a simple value (String).

You said "And you don't want to do it." In fact doing it has its uses.

Please do not quote out of context: that was referring to the example with a String instance used as a Hash key and stuffed with additional instance variables.

Mutable keys or not is totally irrelevant, especially when the data
was there before the hash was introduced, as in the original example.

The topic of key mutability is especially relevant for keys stored in a Hash. Of course mutations before storing are irrelevant. But if you change fields of an object which are part of the key (i.e. included in #hash and #eql?) you need to rehash in order for the Hash to do lookups properly.
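
A short sketch of the rehash issue, using an Array as the mutable key (Array#hash and Array#eql? depend on the contents). The variable names are only illustrative:

```ruby
key = ["a", "b"]
h = { key => 100 }

before = h[key]      # 100

key[0] = "z"         # the key's content, and so its #hash value, changes
stale = h[key]       # normally nil: the entry is still filed under the old hash code

h.rehash             # re-index the entries using the keys' current hash codes
after = h[key]       # 100

p before, stale, after
```

Until #rehash is called the Hash is in an inconsistent state, which is the main reason to avoid mutating key properties of objects that are already stored as keys.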

Basically you can have two types of fields in an object used as a Hash key:

1. key properties (used in #hash and #eql?)

2. non key properties (neither used in #hash nor #eql?)

Type 1 properties must of course be part of the key, and of course you need to know them to make any lookups.

Type 2 properties are irrelevant for lookups; you can merely consider them as being "associated with the key". This leads to a situation where you have one instance (per key) with the associated data and potentially many other instances which might or might not have these properties. If they are actually defined as properties (either through attr_accessor or manually), you end up carrying around baggage which is not used most of the time.

Type 2 properties should rather go into another instance which is stored as the value. This also makes it much clearer what's going on. Splitting up associated data into properties of key objects and an instance stored in the Hash doesn't really make sense. Then we could just as well store everything in the key instance and would not need the Hash at all.
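
To make the two types concrete, here is a sketch with a hypothetical key class (HeaderName and its attribute names are made up for illustration). Only canonical takes part in #hash and #eql?, so original is a type 2 property:

```ruby
# Hypothetical key class, only for illustration.
class HeaderName
  attr_reader :canonical   # type 1: key property
  attr_reader :original    # type 2: non-key property

  def initialize(original)
    @original  = original
    @canonical = original.upcase
  end

  def hash
    canonical.hash
  end

  def eql?(other)
    other.is_a?(HeaderName) && canonical == other.canonical
  end
  alias == eql?
end

h = { HeaderName.new("frOM") => ["sip:alice@example.org"] }

probe = HeaderName.new("From")      # different spelling, same key property
puts h[probe].inspect               # ["sip:alice@example.org"]
puts probe.original                 # From (the probe's own baggage, not the stored key's)
puts h.key?(HeaderName.new("To"))   # false
```

The lookup succeeds, but the original spelling that comes back with the probe is the probe's own, which is the baggage problem described above.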

*Of course* making repeated calls to Hash#assoc in order to update
stuff in the key would be stupid. That goes without saying. What would
the purpose of the hash be? If that was your only point then we agree,
although it was a vacuous point.

Why is the point vacuous? Apparently OP has / had some questions about these topics and what may look obvious to you might not to others.

Also do you realize that an example tends to stand for something which
is not literally the example itself? He has a key. It contains some
data. It's not necessarily true that he should duplicate that data in
the mapped-to values. Mutable or not is beside the point.

Well, but we cannot read other people's minds. We have to take the example at face value. Stuffing additional data into a String is not a good idea and I am not sure whether that occurred to OP or not. So this might really be what he is attempting. In this case "stuffing the data into the key" was part of the example and it was nowhere expressed that this is a fact that could not be changed.

And btw, I did not recommend duplicating that data in the mapped-to value. I specifically suggested placing it there exclusively.

I notice this phenomenon a lot: undergeneralization. The String stands
for something. It's his key data. If it were a simple value then the
example wouldn't make sense in the first place. Gee, thanks for
telling us that we shouldn't stuff random shit into a simple value and
then use that as a hash key, whereupon we can't look up stuff in the
hash directly but must use Hash#assoc instead. Again, if that was your
point then we agree, albeit in the obvious and nearly information-free
sense. I'm sure we would also agree that cats would be a poor building
material for helicopters.

As is rudeness for a community.

  robert

···

On 15.04.2011 19:39, Kevin Mahler wrote:


To clarify, my exact case is the following:

I've coded a parser for SIP (similar to HTTP). The parser generates a
Request object which inherits from Hash, and each SIP request header
(i.e. "From: sip:alice@example.org") becomes an entry of the hash
(Request object) as follows:

- The key is "FROM" (capitalized).
- The value is an Array of strings (a header can have multiple values).

I need to store the key capitalized for fastest lookup, but I also
want to store the original header name (which could be "from", "From",
"frOM" and so on).
So my parser adds an instance variable @real_name within the header
name string ("FROM").

When I look up a header in the Request object, I would also like
to retrieve the key's @real_name, but I've already understood
that this is only possible if I taint the key string before inserting it
in the hash and use Hash#assoc. This solution is not good for
performance.

The solution suggested by Robert is adding such information (the
header original name) as a field in the hash entry value, so instead
of having:

  request["FROM"]
  => [ "sip:alice@example.org" ]

I would end up with something like:

  request["FROM"]
  => Struct ( "From", [ "sip:alice@example.org" ] )

The problem this last suggestion introduces is that it breaks the
existing API and makes it more complex for a developer to handle the
Request class (which should be as easy as handling a Hash).

Thanks to both for your comments.

···

2011/4/15 Kevin Mahler <kevin.mahler@yahoo.com>:

He has a key. It contains some
data. It's not necessarily true that he should duplicate that data in
the mapped-to values.

--
Iñaki Baz Castillo
<ibc@aliax.net>

You don't have to have a hash to implement a hash interface. How about
simply creating your own class that supports the interface you want, but
also the functionality you want. Something like this:

class Request

  Header = Struct.new :key , :value

  def self.parse(headers)
    request = Request.new
    headers.each_line do |header|
      key, value = header.split ": "
      request.add_header key , value.chomp
    end
    request
  end

  def initialize
    @headers = Hash.new
  end

  def add_header(key, value)
    @headers[key.upcase] = Header[key,value]
  end

  def [](key)
    @headers[key][:value]
  end

  def original(key)
    @headers[key][:key]
  end

end

headers = <<HEADER
frOM: sip:alice@example.org
To: sip:brad@example.org
HEADER

request = Request.parse headers

request["FROM"] # => "sip:alice@example.org"
request.original "FROM" # => "frOM"

request["TO"] # => "sip:brad@example.org"
request.original "TO" # => "To"

···

On Sat, Apr 16, 2011 at 9:51 AM, Iñaki Baz Castillo <ibc@aliax.net> wrote:

2011/4/15 Kevin Mahler <kevin.mahler@yahoo.com>:

frOM: sip:alice@example.org
To: sip:brad@example.org

He has a key. It contains some
data. It's not necessarily true that he should duplicate that data in
the mapped-to values.

To clarify, my exact case is the following:

Now it gets interesting. :)

I've coded a parser for SIP (similar to HTTP). The parser generates a
Request object which inherits from Hash,

Usually it's better to use composition instead of inheritance to achieve this. Now your SipRequest inherits *all* methods from Hash including some that you might not want users to be able to invoke.
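
A minimal sketch of that composition idea using Forwardable from the standard library. The class name ReadOnlyRequest and the choice of delegated methods are assumptions for illustration only:

```ruby
require "forwardable"

# Hypothetical wrapper; which methods to expose is a design decision.
class ReadOnlyRequest
  extend Forwardable
  include Enumerable

  # Expose only the read-only part of the Hash interface.
  def_delegators :@headers, :[], :key?, :each, :size

  def initialize(headers = {})
    @headers = headers
  end
end

request = ReadOnlyRequest.new("FROM" => ["sip:alice@example.org"])

puts request["FROM"].inspect                  # ["sip:alice@example.org"]
puts request.key?("TO")                       # false
puts request.respond_to?(:delete)             # false: not part of the public API
puts request.map { |name, _| name }.inspect   # ["FROM"]
```

Users still get Hash-like access and all Enumerable methods, but mutating methods such as delete or clear are simply not there.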

and each SIP request header
(i.e. "From: sip:alice@example.org") becomes an entry of the hash
(Request object) as follows:

- The key is "FROM" (capitalized).
- The value is an Array of strings (a header can have multiple values).

I need to store the key capitalized for fastest lookup, but I also
want to store the original header name (which could be "from", "From",
"frOM" and so on).

So, to sum it up: you want a class for SIP requests which allows (efficient) header field access using the header name in any case spelling.

So my parser adds an instance variable @real_name within the header
name string ("FROM").

When I look up a header in the Request object, I would also like
to retrieve the key's @real_name, but I've already understood
that this is only possible if I taint the key string before inserting it
in the hash and use Hash#assoc. This solution is not good for
performance.

The solution suggested by Robert is adding such information (the
header original name) as a field in the hash entry value, so instead
of having:

   request["FROM"]
   => [ "sip:alice@example.org" ]

I would end up with something like:

   request["FROM"]
   => Struct ( "From", [ "sip:alice@example.org" ] )

The problem this last suggestion introduces is that it breaks the
existing API and makes it more complex for a developer to handle the
Request class (which should be as easy as handling a Hash).

Here's how I'd do it. First, I would start with the interface, maybe something like this

module SIP
   class Request
     def self.parse(io)
       # ...
     end

     # get a header field by symbol
     def [](header_name_sym)
     end

     # return the real name used
     def header_name(header_name_sym)
     end
   end
end

Then I'd think how I could make that API work properly. For example two variants, error and default value:

module SIP
   class Request
     HdrInfo = Struct.new :name, :values
     DUMMY = HdrInfo[nil, [].freeze].freeze
     LT = "\r\n".freeze

     def self.parse(io)
       hdr = {}

       io.each_line LT do |l|
         case l
         when /^([^:]+):\s*(.*)$/
           # too simplistic parsing!
           hdr[$1] = $2.split(/,/).each(&:strip!)
         when /^\r?$/
           break
         else
           raise "Not a header line: %p" % l
         end
       end

       new(hdr)
     end

     def initialize(headers)
       @hdr = {}

       # assume hdr is String and values is parsed
       headers.each do |hdr, values|
         @hdr[normalize(hdr)] = HdrInfo[hdr, values]
       end
     end

     # get a header field by symbol
     def [](header_name_sym)
       @hdr.fetch(normalize(header_name_sym)) do |k|
         DUMMY
       end.values
     end

     # return the real name used
     def header_name(header_name_sym)
       @hdr.fetch(normalize(header_name_sym)) do |k|
         raise ArgumentError,
           "Header not found %p" % header_name_sym
       end.name
     end

   private
     def normalize(h)
       (/[A-Z]/ =~ h ? h.downcase : h).to_sym
     end
   end
end

Of course we could build the internal hash straight away during parsing. The main focus of the example was how to use the header once parsed.

Thanks to both for your comments.

You're welcome.

Kind regards

  robert

···

On 16.04.2011 16:51, Iñaki Baz Castillo wrote:

2011/4/15 Kevin Mahler<kevin.mahler@yahoo.com>:


Thanks to both. However, the SIP parser is already done. I've coded it
in C as a Ruby extension (similar to the Mongrel HTTP parser, which
returns a Hash instance). I can change it to generate a Hash object
rather than a custom SipRequest object, and then do as both of you
suggest:

  class SipRequest
    def initialize(headers={})
      @headers = headers
    end
  end
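
One way this wrapper could keep the existing API (request["FROM"] still returns an Array of Strings) while also remembering the original spelling is a second internal Hash. This is only a sketch: add_header, real_name and the (original_name, values) input shape are assumptions, not the interface of the existing C parser:

```ruby
# Hypothetical wrapper, for illustration only.
class SipRequest
  def initialize
    @headers    = {}   # "FROM" => ["sip:alice@example.org"]
    @real_names = {}   # "FROM" => "frOM"
  end

  # Called once per parsed header line.
  def add_header(original_name, values)
    key = original_name.upcase
    @real_names[key] ||= original_name       # remember the first spelling seen
    (@headers[key] ||= []).concat(values)
  end

  # Same result as before: an Array of Strings, or nil if absent.
  def [](name)
    @headers[name.upcase]
  end

  # Original spelling of the header name as it was received.
  def real_name(name)
    @real_names[name.upcase]
  end
end

request = SipRequest.new
request.add_header("frOM", ["sip:alice@example.org"])

puts request["FROM"].inspect     # ["sip:alice@example.org"]
puts request["from"].inspect     # ["sip:alice@example.org"]
puts request.real_name("FROM")   # frOM
```

Both lookups are plain hash accesses, so no Hash#assoc scan and no instance variables on the key strings are needed.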

I will consider it and also the suggested methods to handle header
names and values.

Thanks a lot.

···

2011/4/16 Robert Klemme <shortcutter@googlemail.com>:

I've coded a parser for SIP (similar to HTTP). The parser generates a
Request object which inherits from Hash,

Usually it's better to use composition instead of inheritance to achieve
this. Now your SipRequest inherits *all* methods from Hash including some
that you might not want users to be able to invoke.

--
Iñaki Baz Castillo
<ibc@aliax.net>