Hi Algorithmists,
I am writing a spell checker in ruby. The first step
is to load the dictionary into memory. After some
experiments, I found that the following simple code
worked quite nicely:
f = File.new("dict.txt")
text = f.sysread(File.size("dict.txt"))
lexicon = text.split(/\n/)
On my 1.6GHz Pentium 4 running ruby 1.6.7, the total
time to load the dictionary is less than 0.3
seconds. (The dictionary file is about 1MB with 90K
records.)
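For anyone who wants to reproduce the measurement, a self-contained sketch using the Benchmark stdlib; the `dict.txt` written here is a generated stand-in for the real word list, not the poster's file:

```ruby
# Hedged sketch: time the load step with the Benchmark stdlib.
# "dict.txt" is generated below as a stand-in so the example runs
# on its own; File.read replaces the File.new/sysread dance.
require 'benchmark'

File.open("dict.txt", "w") do |f|
  90_000.times { |i| f.puts "word#{i}" }
end

lexicon = nil
elapsed = Benchmark.realtime do
  text = File.read("dict.txt")
  lexicon = text.split(/\n/)
end
puts "#{lexicon.size} words in #{'%.3f' % elapsed}s"
```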
However, since the dictionary lookup operation will
be quite heavy, I am thinking of using a hash instead
of an array. I tried the following code:
f = File.new("dict.txt")
text = f.sysread(File.size("dict.txt"))
words = text.split(/\n/)
lexicon = {}
words.each do |word|
  lexicon[word] = 0
end
Disaster! It took about 15 seconds to load the
dictionary. The problem is that the #each method took too
much time.
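On a modern interpreter the per-element loop is no longer a bottleneck, but the intermediate word array can be skipped entirely by building the hash while streaming the file. A minimal sketch; the three-word file written here is only a stand-in for the real dict.txt:

```ruby
# Sketch: build the lexicon hash directly from the file, one line
# at a time, with no intermediate word array.
File.open("dict.txt", "w") do |f|
  %w[apple banana cherry].each { |w| f.puts w }
end

lexicon = {}
File.foreach("dict.txt") do |line|
  lexicon[line.chomp] = 0
end
puts lexicon.size
```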
I have 2 questions:

- Is it worth using a hash instead of an array with binary
search?

- If I want to use a hash, how do I minimize the dictionary
load time? BTW, I tried to convert the dict.txt file into
a ruby command, i.e., dictionary={'first'=>0,'second'=>0,
...}, but the system hangs when I try to eval(text) after
sysread...
Thanks a lot!
Shannon
···
However, since the dictionary lookup operation will
be quite heavy, I am thinking of using a hash instead
of an array. I tried the following code:
f = File.new("dict.txt")
text = f.sysread(File.size("dict.txt"))
words = text.split(/\n/)
lexicon = {}
words.each do |word|
  lexicon[word] = 0
end
Disaster! It took about 15 seconds to load the
dictionary. The problem is that the #each method took too
much time.
This code took 3 or 4 seconds to complete on my 900MHz PIII with 256MB RAM.
h = {}
x = ["abd123"] * 90000
x.each_with_index do |y, i|
  h[y + i.to_s] = 1
end
puts h.size
I have 2 questions:

- Is it worth using a hash instead of an array with binary
search?
I’d say so. Hashes are extremely fast in general, and I presume the same in
Ruby. Looking up a string in a hash is very fast [O(1) on average], and faster
than a binary search [O(log n)].
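For scale, here is a sketch of the two lookups side by side. Array#bsearch requires Ruby 2.0+, far newer than the 1.6.7 in the thread, so treat this as a modern illustration of the same trade-off rather than code the original posters could have run:

```ruby
# Compare hash lookup (average O(1)) with binary search (O(log n))
# over a sorted word list. Array#bsearch needs Ruby >= 2.0.
words = (0...10_000).map { |i| format("word%05d", i) }  # already sorted
lexicon = {}
words.each { |w| lexicon[w] = true }

target = "word04242"
found_by_hash    = lexicon.include?(target)
# bsearch find-any mode: the block returns target <=> element,
# steering the search left or right until it hits 0.
found_by_bsearch = !words.bsearch { |w| target <=> w }.nil?
puts found_by_hash && found_by_bsearch
```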
- If I want to use a hash, how do I minimize the dictionary
load time? BTW, I tried to convert the dict.txt file into
a ruby command, i.e., dictionary={'first'=>0,'second'=>0,
...}, but the system hangs when I try to eval(text) after
sysread...
I wouldn’t bother with evaling like that. Prefer an elegant solution to a fast
solution (which I highly doubt eval is) until your program is complete enough
for you to start worrying about execution times.
Gavin
···
From: "Shannon Fang" xrfang@hotmail.com
Hello Shannon,
Wednesday, December 18, 2002, 5:19:50 PM, you wrote:
f = File.new("dict.txt")
text = f.sysread(File.size("dict.txt"))
lexicon = text.split(/\n/)
if lexicon.size % 2 == 0
  hash = Hash[ *lexicon ]
  hash.update Hash[ *(lexicon[1..-1] + [0]) ]
else
  hash = Hash[ *(lexicon + [0]) ]
  hash.update Hash[ *lexicon[1..-1] ]
end
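The even/odd dance exists because Hash[] consumes its arguments as key, value pairs, so interleaving a shifted copy of the list makes every word end up as a key. A sketch of a more direct construction; each_with_object is Ruby 1.9+, so this was not available on the thread's 1.6.7:

```ruby
# Direct construction: map every word to 0 in one pass, with no
# interleaving trick needed.
lexicon = %w[apple banana cherry]
hash = lexicon.each_with_object({}) { |word, h| h[word] = 0 }
puts hash.size
```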
as an alternative, see the Marshal class:
- run this program one time:

... # your code for reading lexicon from text file
File.open("posterity", "w+") do |f|
  Marshal.dump(lexicon, f)
end
- use this code to restore lexicon:

lexicon = ''
File.open("posterity") do |f|
  lexicon = Marshal.load(f)
end
... # using lexicon
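Put together, the dump/load pair round-trips like this. A self-contained sketch; binary mode is used because Marshal output is binary data and text mode would corrupt it on Windows:

```ruby
# Round-trip sketch of the Marshal approach: dump the built hash
# once, then reload it quickly on later runs.
lexicon = { "alpha" => 0, "beta" => 0 }

File.open("posterity", "wb") { |f| Marshal.dump(lexicon, f) }
restored = File.open("posterity", "rb") { |f| Marshal.load(f) }
puts restored == lexicon
```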
···
--
Best regards,
Bulat mailto:bulatz@integ.ru
Hello dblack,
Wednesday, December 18, 2002, 6:14:15 PM, you wrote:
words.each do |word|
  lexicon[word] = 0
end

Disaster! It took about 15 seconds to load the
dictionary. The problem is that the #each method took too
much time.
That’s really puzzling. I ran the same script in 0.23s real time,
using a dict with about 38K words, on a 1.4GHz Pentium.
Hmm, but in any case we need more functions to do fast computations
with hashes. I propose:

- Array#to_hash as the reverse of Hash#to_a
- Hash#update(anArray) as equivalent to Hash#update(anArray.to_hash)
- Hash#update(anArray, value) as equivalent to
  anArray.each do |word|
    aHash[word] = value
  end
- Hash.new(anArray, value) as equivalent to
  aHash = {}
  anArray.each do |word|
    aHash[word] = value
  end
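None of these exist in core Ruby (modern Ruby offers Array#to_h and Hash#merge instead), but the proposals are easy to sketch as monkey patches. The method names below are hypothetical, following the proposal; `update_all` is used instead of overriding the real Hash#update:

```ruby
# Hypothetical sketch of the proposed extensions; not core methods.
class Array
  # Reverse of Hash#to_a: [[k, v], ...] becomes {k => v, ...}.
  def to_hash
    h = {}
    each { |k, v| h[k] = v }
    h
  end
end

class Hash
  # Map every element of an array to the same value, in place.
  def update_all(array, value)
    array.each { |k| self[k] = value }
    self
  end
end

h = [["a", 1], ["b", 2]].to_hash
h.update_all(%w[c d], 0)
puts h.size
```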
···
--
Best regards,
Bulat mailto:bulatz@integ.ru