“Joseph McDonald” joe@vpop.net wrote in message news:67659612452.20020827105228@vpop.net…

Another datapoint:

Time to insert 1000000 words into a JudySL array: 6907858.0 usec.

Doing the same with a straight C program took 3 seconds. I think the extra 3.9 seconds is spent in rubyland doing this:

I did a test of different 32-bit key to 32-bit value maps. If someone would add more data structures to this test, such as Judy, I’d be interested.

Below is the log I wrote during the test.

Mikkel

Comparison of skiplist, STL map, BTree, and hash table.
All tests are 32-bit key, 32-bit value maps with no duplicates.

Speed

Machine: Pentium II 300 MHz with 500 MB SDRAM.

The size of the dataset was chosen to avoid excessive thrashing when having several containers in the air simultaneously. The output is shown below.

The test run stores and retrieves

```
500000 unique keys
```

One run is with sequential numbers 0, 1, … 499,999; one run is with random numbers (rand() * rand()) with duplicates stripped, totalling 500,000 different values.
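A sketch of how such a key set can be generated, assuming the rand() * rand() scheme mentioned above, with a std::set used to strip duplicates (the function name and dedup strategy are mine, not necessarily the original harness):

```cpp
#include <cstdlib>
#include <set>
#include <vector>

// Generate `count` unique pseudo-random 32-bit keys, stripping
// duplicates until the requested count is reached.
std::vector<unsigned> make_random_keys(std::size_t count, unsigned seed) {
    std::srand(seed);
    std::set<unsigned> seen;
    std::vector<unsigned> keys;
    while (keys.size() < count) {
        // rand() * rand() widens the range beyond RAND_MAX (often 32767).
        unsigned k = (unsigned)std::rand() * (unsigned)std::rand();
        if (seen.insert(k).second)  // insert() returns true only for a new key
            keys.push_back(k);
    }
    return keys;
}
```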

Surprisingly, all the maps seem to prefer the linear values. But the relative performance between the maps differs significantly. STL is generally a factor of 3 to 4 (insert, find) off the hashtable.

The BTree is twice as fast as the hashtable at insertion with linear keys, and about the same in search speed. The hashtable is probably spending time growing its bucket table to reduce collisions, but the quality of the hash also matters. With random keys, the BTree is 1.7 to 2.7 times slower than the hash.

Since the hash is constant time, it is faster for larger key sets, but it also uses more memory. Both are friendly to the disk cache. It is difficult to measure, but the BTree is probably faster than the hash for small sets. This would require a different test.

Note that the BTree has a preferred node size of around 14 elements, both at leaves and internally. There are suspected reasons: 1) CPU cache lines, 2) the cost of moving data during insertion, and 3) each node is searched in a tight linear loop. The node could be larger if the search were binary, but binary search is only faster after at least 100 entries per node, and at that point factors 1) and 2) start to set in. For dumping the tree to disk, it would be possible to flatten nodes.
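The tight linear loop of point 3) might look like this, a sketch under my own naming; with ~14 keys the scan stays within a couple of cache lines and is branch-predictable, which is why it can beat binary search at this size:

```cpp
#include <cstddef>

// Linear scan inside a small BTree node: returns the index of the
// first key >= target, i.e. the child slot to descend into.
std::size_t node_scan(const unsigned* keys, std::size_t n, unsigned target) {
    std::size_t i = 0;
    while (i < n && keys[i] < target)
        ++i;
    return i;
}
```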

The skiplist does not perform consistently. It can be 8 times slower than the hashtable at finding random keys, and 2 times slower than the STL map (which in turn is 4 times slower than the hash). For random keys its insertion is equal to STL's. But with linear keys it inserts as fast as the hash (where the BTree is twice as fast); it finds at half the speed of the hash and 1.6 times off the BTree. Compared to the STL map, the skiplist is very fast, with 0.3 times the insert time and still twice as fast at finding.

When searching, the skiplist always has to visit a node to decide its key. Compare this to the BTree, which caches the keys of the child nodes in the parent node; this makes for a very fast scan. The BTree could save space by visiting the child nodes instead, but it would be slower, like the skiplist.
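A minimal sketch of the layout being described (names are hypothetical, not from the actual BTree code): the parent keeps a copy of each child's smallest key, so choosing a branch never dereferences a child node.

```cpp
#include <cstddef>
#include <vector>

// Internal node that caches each child's smallest key, so branch
// selection is a scan over local memory only.
struct InternalNode {
    std::vector<unsigned> min_keys;     // cached smallest key of each child
    std::vector<const void*> children;  // same length as min_keys

    // Index of the child whose key range contains `key`.
    std::size_t child_for(unsigned key) const {
        std::size_t i = 1;
        while (i < min_keys.size() && min_keys[i] <= key)
            ++i;
        return i - 1;
    }
};
```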

The skiplist has no locality. It could be transposed to a BTree-like structure such that each level is stored in one or more linked buffers on disk. This would add locality. It may be interesting to see if this also works in memory; it could make the skiplist as fast as or faster than the BTree without the complexity.

Code complexity

All collections are implemented by me, except the STL map, which is from Microsoft Visual Studio 7.0.

The BTree is very complex. It uses 100K of code, although this covers some test code and various optional behaviour such as top-down instead of bottom-up insertion, etc. The STL map is probably a red/black tree, but this has not been investigated. Both the BTree and the STL map are too complex to easily customize for special purposes.

The skiplist and the hashtable both use 300-400 lines of code and are easy to adapt to special purposes.

Memory

The hashtable does not take up many lines of code. It grows dynamically, so it can start small. But it is not significantly faster for small datasets, so other collections may be preferable there. It has some overhead in that the empty hash with no entries still keeps 10 32-bit values for handling its buffers. It can start with a small buffer of, say, 4 to 8 entries. Each entry consumes 4 32-bit values. The hash bucket array adds 50% by having twice the number of entries, each using a single 32-bit value. Hence each entry uses 6 32-bit values. With an initial size of 4 entries, this amounts to 10 + 6 * 4 = 34 32-bit values. This is about the same as the BTree for the initial storage, but it also uses two allocations, which may have allocation overhead.

When the hashtable grows, it consistently adds 6 32-bit values per entry, which is expensive. When the BTree grows, it uses 2 32-bit values per leaf entry plus some node overhead, say 25%, and much less if nodes are chosen large. Due to the large branch factor, the size of the parent nodes can be ignored. (For binary trees there would be as many internal nodes as leaf nodes.) Without exact calculations, let's just estimate the overhead to be no more than 50% in total. This yields 3 32-bit values per entry. However, BTrees may be half empty; estimating 75% usage, this becomes roughly 4 32-bit values per entry.

The skiplist uses 2 32-bit values per entry for key/value, plus 1.3 pointers per node. It has only a single null pointer in its empty state. It needs a header which is at least the size of the smallest node. This translates into a smallest entry of 3 + 3 = 6 32-bit values, plus the pointer to the handle. Since the header is chosen slightly larger to avoid immediate reallocation, this makes a total of 8 32-bit values with a single entry. This is 3 or 4 times less than the initial entry of a BTree or a hashtable.

A guess is that the STL map uses about 5 32-bit values per node: key, value, left, right, color. The color could be stored as a tagged pointer, reducing this to 4 32-bit values. An initial entry, including the actual map variable, will probably be about 8 32-bit values, and it will grow by about 4 32-bit values per entry.
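The estimates above can be collected in one place; the functions below just restate the arithmetic of the preceding paragraphs in 32-bit words (the names are mine, the numbers are from the text):

```cpp
// Per-entry and initial-footprint estimates, in 32-bit words.

// Hashtable: 10 words fixed overhead; 4 words per entry plus a bucket
// array with twice the entries at 1 word each -> 6 words per entry;
// starts with a 4-entry buffer.
constexpr int hash_initial_words() { return 10 + (4 + 2) * 4; }      // = 34

// BTree: 2 words per leaf entry, ~50% node overhead, ~75% fill.
constexpr double btree_words_per_entry() { return 2 * 1.5 / 0.75; }  // = 4.0

// Skiplist: 2 words key/value plus ~1.3 pointers per node.
constexpr double skiplist_words_per_entry() { return 2 + 1.3; }      // = 3.3
```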

The BTree has the most cache-friendly nature because it keeps locality, and frequently used paths down the tree have a good chance of a cache hit. The hashtable shows that storing data in a single dynamically growing buffer also works well.

The performance tests do not tell how the BTree would compare to the hashtable in real usage: here all the values are visited. It is expected that the BTree will perform better on actual data access due to its cache-friendly nature. Likewise, it is expected that the skiplist and the STL map are not too kind to the cache, and some of this probably also shows in the performance numbers.
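The dumps in the data section report every operation relative to the cost of creating the keys, which makes runs roughly comparable across machines. A sketch of that idea, with a clock source of my choosing rather than whatever the original harness used:

```cpp
#include <chrono>

// Run `op` once and return elapsed wall-clock seconds.
template <typename F>
double seconds(F&& op) {
    auto t0 = std::chrono::steady_clock::now();
    op();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Report an operation's cost relative to a baseline, as the dumps do
// with key creation as the baseline ("relative time 1.0").
template <typename F, typename G>
double relative_time(F&& baseline, G&& op) {
    double base = seconds(baseline);
    return base > 0 ? seconds(op) / base : 0.0;
}
```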

Data

Dump with 500,000 linear keys (never mind the dump saying random - it is linear):

···

```
=============================
This first measure creates a CPU independent time measure
and measures operations relative to the cost of creating a key.

Time to create 5000000 keys
0.3 seconds, relative time 1.0
Time to insert 500000 random values into STL map
2.7 seconds, relative time 92.9
Time to find 500000 random values in STL map
1.4 seconds, relative time 47.3
Time to insert 500000 random values into skiplist
0.8 seconds, relative time 27.3
Time to find 500000 random values in skiplist
0.7 seconds, relative time 23.5
Time to insert 500000 random values into BTree
0.4 seconds, relative time 13.5
Time to find 500000 random values in BTree
0.4 seconds, relative time 14.5
Time to insert 500000 random values into hashtable (single-keyed)
0.7 seconds, relative time 25.6
Time to find 500000 random values in hashtable
0.4 seconds, relative time 12.1

STL / Hash insert ratio: 3.6
STL / Hash find ratio: 3.9
BTree / Hash insert ratio: 0.5
BTree / Hash find ratio: 1.2
BTree / STL insert ratio: 0.1
BTree / STL find ratio: 0.3
SkipList / Hash insert ratio: 1.1
SkipList / Hash find ratio: 1.9
SkipList / BTree insert ratio: 2.0
SkipList / BTree find ratio: 1.6
SkipList / STL insert ratio: 0.3
SkipList / STL find ratio: 0.5
```

Dump with 500,000 unique random keys:

```
Time to create 5000000 keys
0.3 seconds, relative time 1.0
Time to insert 500000 random values into STL map
3.0 seconds, relative time 103.6
Time to find 500000 random values in STL map
2.1 seconds, relative time 70.8
Time to insert 500000 random values into skiplist
3.3 seconds, relative time 115.3
Time to find 500000 random values in skiplist
4.0 seconds, relative time 139.5
Time to insert 500000 random values into BTree
1.5 seconds, relative time 53.2
Time to find 500000 random values in BTree
1.3 seconds, relative time 44.6
Time to insert 500000 random values into hashtable (single-keyed)
0.9 seconds, relative time 31.1
Time to find 500000 random values in hashtable
0.5 seconds, relative time 16.6

STL / Hash insert ratio: 3.3
STL / Hash find ratio: 4.3
BTree / Hash insert ratio: 1.7
BTree / Hash find ratio: 2.7
BTree / STL insert ratio: 0.5
BTree / STL find ratio: 0.6
SkipList / Hash insert ratio: 3.7
SkipList / Hash find ratio: 8.4
SkipList / BTree insert ratio: 2.2
SkipList / BTree find ratio: 3.1
SkipList / STL insert ratio: 1.1
SkipList / STL find ratio: 2.0
```