Poor performance of Marshal.dump on Hashes

Hello all,

I've run into some serious performance problems with Marshal.dump.

We use a "giant" hash containing some hundreds of keys, each pointing
to an array of Fixnums. When the hash is kept small (by discarding
key-value pairs of low importance), Marshal.dump of it takes no more
than a second, which is acceptable. When the size of the hash is left
unbounded, the time to dump it rises sky-high, into the minutes. The
following code replicates the problem, although the process timing
code works only under Linux (and perhaps other Unices):

---- marshal-timings.rb
# CPU time consumed by this process so far, in jiffies
# (Linux-specific: field 14 of /proc/self/stat is utime).
def jiffies
  File.open('/proc/self/stat') { |f| f.readline.split[13].to_i }
end

def time_dump(hash_size, array_size)
  hash = {}
  hash_size.times {
    arr = []
    array_size.times { arr << rand(100000) }
    hash[rand(100000)] = arr
  }
  tick_tock = jiffies
  str = Marshal.dump(hash)
  puts "Marshal.dump for #{hash_size} keys of #{array_size} items each " +
       "took #{jiffies - tick_tock} jiffies for #{str.size} bytes"
end

time_dump(100, 100)
time_dump(100, 1000)
time_dump(1000, 100)
time_dump(1000, 1000)
time_dump(10000, 10)
---- EOF
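
For non-Linux systems, a rough alternative is to measure CPU seconds
with Process.times instead of reading /proc (assuming your Ruby
provides it; on 1.6 I believe the same information lives on
Time.times):

def cpu_seconds
  # user + system CPU time in seconds; should work on most Unices
  t = Process.times
  t.utime + t.stime
end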

The results are astonishing:

ekarak@amnesia:~/tmp> ruby marshal-timings.rb
Marshal.dump for 100 keys of 100 items each took 24 jiffies for 44024 bytes
Marshal.dump for 100 keys of 1000 items each took 102 jiffies for 434718 bytes
Marshal.dump for 1000 keys of 100 items each took 1335 jiffies for 437321 bytes
Marshal.dump for 1000 keys of 1000 items each took 6494 jiffies for 4323494 bytes
Marshal.dump for 10000 keys of 10 items each took 12075 jiffies for 472664 bytes

Notice that while the processing time is roughly linear in the array
size, the number of keys in the hash drives it up much faster than
linearly: going from 100 to 1000 keys (with 100-item arrays)
multiplies the time by roughly 56 (from 24 to 1335 jiffies), and
10000 keys of only 10 items each take 12075 jiffies. It seems as
though the algorithm used to serialize hashes is far worse than O(n).
Any ideas/comments on this?
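
In the meantime I'm thinking of working around it by splitting the
hash into smaller hashes before dumping, on the (untested) assumption
that the slowdown is tied to the number of keys in any single hash.
A rough sketch, with an arbitrary chunk size:

def chunked_dump(hash, chunk_size = 500)
  # Split the big hash into an array of smaller hashes, each with at
  # most chunk_size keys, and dump the array instead.
  chunks = [{}]
  hash.each do |k, v|
    chunks << {} if chunks.last.size >= chunk_size
    chunks.last[k] = v
  end
  Marshal.dump(chunks)
end

def chunked_load(str)
  # Reassemble the original hash from the dumped chunks.
  result = {}
  Marshal.load(str).each { |part| result.update(part) }
  result
end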

Elias
(Ruby/1.6.8 on Linux 2.4.19)

> Notice that while the processing time is roughly linear in the array
> size, the number of keys in the hash drives it up much faster than
> linearly.

Why do you want to have something linear?

Guy Decoux

Hi,

In message "Poor performance of Marshal.dump on Hashes" on 03/01/04,
Elias Karakoulakis <ekarak@softlab.ece.ntua.gr> writes:

> Notice that while the processing time is roughly linear in the array
> size, the number of keys in the hash drives it up much faster than
> linearly. It seems as though the algorithm used to serialize hashes
> is far worse than O(n). Any ideas/comments on this?

1.8 is much faster:

ruby 1.8.0 (2002-12-31) [i686-linux]
Marshal.dump for 100 keys of 100 items each took 1 jiffies for 44051 bytes
Marshal.dump for 100 keys of 1000 items each took 15 jiffies for 434819 bytes
Marshal.dump for 1000 keys of 100 items each took 15 jiffies for 436824 bytes
Marshal.dump for 1000 keys of 1000 items each took 162 jiffies for 4327665 bytes
Marshal.dump for 10000 keys of 10 items each took 18 jiffies for 473733 bytes

						matz.