Faster datastructure for lookups wanted

m94asr · 7 September 2006 21:55

Hi all,

maybe somebody can recommend me the right datastructure or
any other advice would be a big help.

My code spends most of its execution time doing lookups from
a hashtable with about 1M keys. The keys are strings and the values
are arrays of integers. Most of the time only of length 1.

I do not care how long the construction of the datastructure takes,
but the lookup should be as fast as possible.

xs.each{|x|
   if found = hash[x]
     #do sth.
   end
}

Thanks a lot!
-Armin

Matt_Todd · 7 September 2006 22:36

My code spends most of its execution time doing lookups from
a hashtable with about 1M keys. The keys are strings and the values
are arrays of integers. Most of the time only of length 1.

I do not care how long the construction of the datastructure takes,
but the lookup should be as fast as possible.

Would putting it into a database be a possible solution? I may be wrong, but
I'm pretty sure RDBMSes should be pretty quick to parse through and get
data. SQLite, PostgreSQL, or MySQL would be options.

Also, I'm thinking there may be a better method for finding the right
elements, though I'm not exactly sure what you're doing.

M.T.

Mauricio_Fernandez · 7 September 2006 22:43

It hardly gets faster than a Hash in Ruby.
You can also try a trie (Patricia tree if you have long keys and care about
space), or Judy arrays (http://rjudy.sourceforge.net), but I wouldn't expect
major performance gains (Judy::JudySL, being more specialized than Hash,
might have a chance)...

···

On Fri, Sep 08, 2006 at 06:55:12AM +0900, m94asr@gmail.com wrote:

maybe somebody can recommend me the right datastructure or
any other advice would be a big help.

My code spends most of its execution time doing lookups from
a hashtable with about 1M keys. The keys are strings and the values
are arrays of integers. Most of the time only of length 1.

I do not care how long the construction of the datastructure takes,
but the lookup should be as fast as possible.

--
Mauricio Fernandez - http://eigenclass.org - singular Ruby

Paul_Lutus · 8 September 2006 01:07

m94asr@gmail.com wrote:

Hi all,

maybe somebody can recommend me the right datastructure or
any other advice would be a big help.

My code spends most of its execution time doing lookups from
a hashtable with about 1M keys. The keys are strings and the values
are arrays of integers. Most of the time only of length 1.

I do not care how long the construction of the datastructure takes,
but the lookup should be as fast as possible.

Any way to segment the data? IOW are 1*10^6 keys really needed in the same
hash? A multi-tiered hash might save a lot of lookup time if the data can
be broken into groups smaller than 1*10^6.

Example. If the strings are consistent about content, you could break the
search into 26 (or 52) sections by simply hashing the first letter of the
key strings, then hashing on the remainder of each string in the second
tier.

···

--
Paul Lutus
http://www.arachnoid.com

Robert_K1 · 8 September 2006 06:10

m94asr@gmail.com wrote:

Hi all,

  maybe somebody can recommend me the right datastructure or
any other advice would be a big help.

My code spends most of its execution time doing lookups from
a hashtable with about 1M keys. The keys are strings and the values
are arrays of integers. Most of the time only of length 1.

I do not care how long the construction of the datastructure takes,
but the lookup should be as fast as possible.

xs.each{|x|
   if found = hash
     #do sth. end
}

As others said already, a Hash is pretty much the fastest for the general case. How do your string keys look like? Maybe it is worth trying symbols instead of strings?

If you unveil a bit more about your application we might be able to come up with more suggestions.

Kind regards

robert

E_S · 7 September 2006 23:28

Mauricio Fernandez wrote:

maybe somebody can recommend me the right datastructure or
any other advice would be a big help.

My code spends most of its execution time doing lookups from
a hashtable with about 1M keys. The keys are strings and the values
are arrays of integers. Most of the time only of length 1.

I do not care how long the construction of the datastructure takes,
but the lookup should be as fast as possible.

It hardly gets faster than a Hash in Ruby.
You can also try a trie (Patricia tree if you have long keys and care
about
space),

A Trie optimised by cutting off unambiguous traversal would
be a definite possibility.

···

On Fri, Sep 08, 2006 at 06:55:12AM +0900, m94asr@gmail.com wrote:

or Judy arrays (http://rjudy.sourceforge.net), but I wouldn't
expect
major performance gains (Judy::JudySL, being more specialized than Hash,
might have a chance)...

--
Posted via http://www.ruby-forum.com/\.

Gregory_Seidman · 8 September 2006 01:19

} Mauricio Fernandez wrote:

···

On Fri, Sep 08, 2006 at 08:28:33AM +0900, Eero Saynatkari wrote:
} > On Fri, Sep 08, 2006 at 06:55:12AM +0900, m94asr@gmail.com wrote:
} >> maybe somebody can recommend me the right datastructure or
} >> any other advice would be a big help.
} >>
} >> My code spends most of its execution time doing lookups from
} >> a hashtable with about 1M keys. The keys are strings and the values
} >> are arrays of integers. Most of the time only of length 1.
} >>
} >> I do not care how long the construction of the datastructure takes,
} >> but the lookup should be as fast as possible.
} >
} > It hardly gets faster than a Hash in Ruby.
} > You can also try a trie (Patricia tree if you have long keys and care
} > about
} > space),
}
} A Trie optimised by cutting off unambiguous traversal would
} be a definite possibility.

There is a trie gem that implements a Patricia Trie.

http://gemjack.com/gems/trie-0.0.1/classes/Trie.html

Of course, a Patricia Trie assumes no a priori knowledge of your string
inputs. If you know something about your keys, you may be able to do better
with a hash of hashes (to however many layers is appropriate), splitting as
appropriate for your key space. For example, if you know that your keys are
IPv4 addresses that come in dotted quad notation (e.g. 127.0.0.1), you
could do better with (note: untested):

class SplittableHash
  def initialize(split)
    @split = split
    @root = {}
  end

  def [](key)
    key.split(@split).inject(@root) { |h,k| h[k] if h }
  end

  def []=(key, val)
    path = key.split(@split)
    key = path.pop
    path.inject(@root) { |h,k| h[k] ||= {} }[key] = val
  end

end

--Greg

Topic		Replies	Views
Quicker finding strings? Alternative for array, hash, set? ruby-talk	9	183	3 February 2009
Insertion and removal time for ruby datastructures ruby-talk	0	116	6 August 2009
Hash vs. array (dictionary lookup) ruby-talk	4	130	18 December 2002
Performance of ruby hashes ruby-talk	12	75	19 October 2006
Are Hash speeds documented? ruby-talk	4	98	26 February 2011

Faster datastructure for lookups wanted

Related topics