Hello all,
I've run into some serious performance problems with Marshal.dump.
We use a 'giant' hash containing some hundreds of keys, each pointing to
an array of Fixnums. When this hash is kept small (by dropping key-value
pairs of low importance), Marshal.dump of the hash takes no more than a
second, which is acceptable. When the hash is allowed to grow unbounded,
the time to dump it skyrockets into the minutes. The following code
replicates the problem, although the process-timing code works only
under Linux (and possibly other Unices):
---- marshal-timings.rb
# User-mode CPU time of this process in clock ticks ("jiffies"):
# field 14 of /proc/self/stat (Linux-specific).
def jiffies
  File.open('/proc/self/stat') { |f| f.readline }.split[13].to_i
end
def time_dump(hash_size, array_size)
  # Build a hash of hash_size random Fixnum keys, each mapped to
  # an array of array_size random Fixnums.
  hash = {}
  hash_size.times {
    arr = []
    array_size.times { arr << rand(100000) }
    hash[rand(100000)] = arr
  }
  # Time only the dump itself.
  tick_tock = jiffies
  str = Marshal.dump(hash)
  puts "Marshal.dump for #{hash_size} keys of #{array_size} items each " +
       "took #{jiffies - tick_tock} jiffies for #{str.size} bytes"
end
time_dump(100, 100)
time_dump(100, 1000)
time_dump(1000, 100)
time_dump(1000, 1000)
time_dump(10000, 10)
---- EOF
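(Aside, in case someone wants to reproduce this where /proc isn't
available: here is a minimal sketch of a portable timer using
Process.times, which I believe newer Rubies provide; utime is user-mode
CPU time in seconds as a Float. Untested on 1.6.8.)

# Sketch only, assuming Process.times is available (not verified on 1.6.8):
def cpu_seconds
  Process.times.utime   # user-mode CPU time, in seconds
end

t0 = cpu_seconds
Marshal.dump((1..10000).to_a)
puts "took #{cpu_seconds - t0} CPU seconds"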
The results are astonishing:
ekarak@amnesia:~/tmp> ruby marshal-timings.rb
Marshal.dump for 100 keys of 100 items each took 24 jiffies for 44024 bytes
Marshal.dump for 100 keys of 1000 items each took 102 jiffies for 434718 bytes
Marshal.dump for 1000 keys of 100 items each took 1335 jiffies for 437321 bytes
Marshal.dump for 1000 keys of 1000 items each took 6494 jiffies for 4323494 bytes
Marshal.dump for 10000 keys of 10 items each took 12075 jiffies for 472664 bytes
Notice that while processing time grows roughly linearly with the array
size, growing the hash makes it blow up far faster than linearly: a
tenfold increase in key count (100 to 1000 keys of 100 items each) costs
about 55 times the CPU. It seems as though the algorithm used to
serialize hashes is far worse than O(n), closer to O(n^2) in the number
of keys.
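One experiment that might narrow it down (a sketch, untested on 1.6.8):
dump the same data flattened into an Array of [key, value] pairs via
Hash#to_a. If that stays fast, the blow-up is specific to Hash
serialization rather than to the sheer number of objects being dumped:

# Hypothetical diagnostic, reusing the jiffies helper from above.
def time_dump_as_pairs(hash)
  t0 = jiffies
  str = Marshal.dump(hash.to_a)   # same data, no Hash at the top level
  puts "as pairs: #{jiffies - t0} jiffies for #{str.size} bytes"
end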
Any ideas/comments on this?
Elias
(Ruby/1.6.8 on Linux 2.4.19)