“Joseph McDonald” joe@vpop.net wrote in message news:67659612452.20020827105228@vpop.net…

Another datapoint:

Time to insert 1000000 words into a JudySL array: 6907858.0 usec.

Doing the same with a straight C program took 3 seconds. I think the extra 3.9 seconds is spent in rubyland doing this:

I did a test of different 32-bit key to 32-bit value maps. If someone would add more data structures to this test, such as Judy, I’d be interested.

Below is the log I wrote during the test.

Mikkel

Comparison of skiplist, STL map, BTree, and hash table.
All tests are 32-bit key, 32-bit value maps with no duplicates.

Speed

Machine: Pentium II 300 MHz with 500 MB SDRAM.

The size of the dataset was chosen to avoid excessive thrashing when having several containers in the air simultaneously. The output is shown below.

The test run stores and retrieves

```
500000 unique keys
```

One run is with sequential numbers 0, 1, … 499,999; one run is with random numbers (rand() * rand()) with duplicates stripped, totalling 500,000 different values.
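A sketch of how such a key set can be generated, assuming the rand() * rand() scheme mentioned above, with a std::set used to strip duplicates (the function name and dedup strategy are mine, not necessarily the original harness):

```cpp
#include <cstdlib>
#include <set>
#include <vector>

// Generate `count` unique pseudo-random 32-bit keys, stripping
// duplicates until the requested count is reached.
std::vector<unsigned> make_random_keys(std::size_t count, unsigned seed) {
    std::srand(seed);
    std::set<unsigned> seen;
    std::vector<unsigned> keys;
    while (keys.size() < count) {
        // rand() * rand() widens the range beyond RAND_MAX (often 32767).
        unsigned k = (unsigned)std::rand() * (unsigned)std::rand();
        if (seen.insert(k).second)  // insert() returns true only for a new key
            keys.push_back(k);
    }
    return keys;
}
```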

Surprisingly, all the maps seem to prefer the linear values. But the relative performance between the maps differs significantly. STL is generally a factor of 3 to 4 (insert, find) off the hashtable.

The BTree is twice as fast as the hashtable at insertion with linear keys, and about the same in search speed. The hashtable is probably spending time growing its bucket table to reduce collisions, but the quality of the hash also matters. With random keys, the BTree is 1.7 to 2.7 times slower than the hash.

Since the hash is constant time, it is faster for larger key sets, but it also uses more memory. Both are friendly to the disk cache. It is difficult to measure, but the BTree is probably faster than the hash for small sets. This would require a different test.

Note that the BTree has a preferred node size of around 14 elements, both at leaves and internally. There are suspected reasons: 1) CPU cache lines, 2) the cost of moving data during insertion, and 3) each node is searched in a tight linear loop. The node could be larger if the search were binary, but binary search is only faster after at least 100 entries per node, and at that point factors 1) and 2) start to set in. For dumping the tree to disk, it would be possible to flatten nodes.
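The tight linear loop of point 3) might look like this, a sketch under my own naming; with ~14 keys the scan stays within a couple of cache lines and is branch-predictable, which is why it can beat binary search at this size:

```cpp
#include <cstddef>

// Linear scan inside a small BTree node: returns the index of the
// first key >= target, i.e. the child slot to descend into.
std::size_t node_scan(const unsigned* keys, std::size_t n, unsigned target) {
    std::size_t i = 0;
    while (i < n && keys[i] < target)
        ++i;
    return i;
}
```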

The skiplist does not perform consistently. It can be 8 times slower than the hashtable at finding random keys, and 2 times slower than the STL map (which in turn is 4 times slower than the hash). For random keys its insertion is equal to STL's. But with linear keys it inserts as fast as the hash (where the BTree is twice as fast); it finds at half the speed of the hash and 1.6 times off the BTree. Compared to the STL map, the skiplist is very fast, with 0.3 times the insert time and still twice as fast at finding.

When searching, the skiplist always has to visit a node to decide its key. Compare this to the BTree, which caches the keys of the child nodes in the parent node; this makes for a very fast scan. The BTree could save space by visiting the child nodes instead, but it would be slower, like the skiplist.
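A minimal sketch of the layout being described (names are hypothetical, not from the actual BTree code): the parent keeps a copy of each child's smallest key, so choosing a branch never dereferences a child node.

```cpp
#include <cstddef>
#include <vector>

// Internal node that caches each child's smallest key, so branch
// selection is a scan over local memory only.
struct InternalNode {
    std::vector<unsigned> min_keys;     // cached smallest key of each child
    std::vector<const void*> children;  // same length as min_keys

    // Index of the child whose key range contains `key`.
    std::size_t child_for(unsigned key) const {
        std::size_t i = 1;
        while (i < min_keys.size() && min_keys[i] <= key)
            ++i;
        return i - 1;
    }
};
```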

The skiplist has no locality. It could be transposed to a BTree-like structure such that each level is stored in one or more linked buffers on disk. This would add locality. It may be interesting to see if this also works in memory; it could make the skiplist as fast as or faster than the BTree without the complexity.

Code complexity

All collections are implemented by me, except the STL map, which is from Microsoft Visual Studio 7.0.

The BTree is very complex. It uses 100K of code, although this covers some test code and various optional behaviour such as top-down instead of bottom-up insertion, etc. The STL map is probably a red/black tree, but this has not been investigated. Both the BTree and the STL map are too complex to easily customize for special purposes.

The skiplist and the hashtable both use 300-400 lines of code and are easy to adapt to special purposes.

Memory

The hashtable does not take up many lines of code. It grows dynamically, so it can start small. But it is not significantly faster for small datasets, so other collections may be preferable there. It has some overhead in that the empty hash with no entries still keeps 10 32-bit values for handling its buffers. It can start with a small buffer of, say, 4 to 8 entries. Each entry consumes 4 32-bit values. The hash bucket array adds 50% by having twice the number of entries, each using a single 32-bit value. Hence each entry uses 6 32-bit values. With an initial size of 4 entries, this amounts to 10 + 6 * 4 = 34 32-bit values. This is about the same as the BTree for the initial storage, but it also uses two allocations, which may have allocation overhead.

When the hashtable grows, it consistently adds 6 32-bit values per entry, which is expensive. When the BTree grows, it uses 2 32-bit values per leaf entry plus some node overhead, say 25%, and much less if nodes are chosen large. Due to the large branch factor, the size of the parent nodes can be ignored. (For binary trees there would be as many internal nodes as leaf nodes.) Without exact calculations, let's just estimate the overhead to be no more than 50% in total. This yields 3 32-bit values per entry. However, BTrees may be half empty; estimating 75% usage, this becomes roughly 4 32-bit values per entry.

The skiplist uses 2 32-bit values per entry for key/value, plus 1.3 pointers per node. It has only a single null pointer in its empty state. It needs a header which is at least the size of the smallest node. This translates into a smallest entry of 3 + 3 = 6 32-bit values, plus the pointer to the handle. Since the header is chosen slightly larger to avoid immediate reallocation, this makes a total of 8 32-bit values with a single entry. This is 3 or 4 times less than the initial entry of a BTree or a hashtable.

A guess is that the STL map uses about 5 32-bit values per node: key, value, left, right, color. The color could be stored as a tagged pointer, reducing this to 4 32-bit values. An initial entry, including the actual map variable, will probably be about 8 32-bit values, and it will grow by about 4 32-bit values per entry.
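The estimates above can be collected in one place; the functions below just restate the arithmetic of the preceding paragraphs in 32-bit words (the names are mine, the numbers are from the text):

```cpp
// Per-entry and initial-footprint estimates, in 32-bit words.

// Hashtable: 10 words fixed overhead; 4 words per entry plus a bucket
// array with twice the entries at 1 word each -> 6 words per entry;
// starts with a 4-entry buffer.
constexpr int hash_initial_words() { return 10 + (4 + 2) * 4; }      // = 34

// BTree: 2 words per leaf entry, ~50% node overhead, ~75% fill.
constexpr double btree_words_per_entry() { return 2 * 1.5 / 0.75; }  // = 4.0

// Skiplist: 2 words key/value plus ~1.3 pointers per node.
constexpr double skiplist_words_per_entry() { return 2 + 1.3; }      // = 3.3
```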

The BTree has the most cache-friendly nature because it keeps locality, and frequently used paths down the tree have a good chance of a cache hit. The hashtable shows that storing data in a single dynamically growing buffer also works well.

The performance tests do not tell how the BTree would compare to the hashtable in real usage: here all the values are visited. It is expected that the BTree will perform better on actual data access due to its cache-friendly nature. Likewise, it is expected that the skiplist and the STL map are not too kind to the cache, and some of this probably also shows in the performance numbers.
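The dumps in the data section report every operation relative to the cost of creating the keys, which makes runs roughly comparable across machines. A sketch of that idea, with a clock source of my choosing rather than whatever the original harness used:

```cpp
#include <chrono>

// Run `op` once and return elapsed wall-clock seconds.
template <typename F>
double seconds(F&& op) {
    auto t0 = std::chrono::steady_clock::now();
    op();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Report an operation's cost relative to a baseline, as the dumps do
// with key creation as the baseline ("relative time 1.0").
template <typename F, typename G>
double relative_time(F&& baseline, G&& op) {
    double base = seconds(baseline);
    return base > 0 ? seconds(op) / base : 0.0;
}
```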

Data

Dump with 500,000 linear keys (never mind the dump saying random - it is linear):

···

```
=============================
This first measure creates a CPU independent time measure
and measures operations relative to the cost of creating a key.

Time to create 5000000 keys
0.3 seconds, relative time 1.0
Time to insert 500000 random values into STL map
2.7 seconds, relative time 92.9
Time to find 500000 random values in STL map
1.4 seconds, relative time 47.3
Time to insert 500000 random values into skiplist
0.8 seconds, relative time 27.3
Time to find 500000 random values in skiplist
0.7 seconds, relative time 23.5
Time to insert 500000 random values into BTree
0.4 seconds, relative time 13.5
Time to find 500000 random values in BTree
0.4 seconds, relative time 14.5
Time to insert 500000 random values into hashtable (single-keyed)
0.7 seconds, relative time 25.6
Time to find 500000 random values in hashtable
0.4 seconds, relative time 12.1

STL / Hash insert ratio: 3.6
STL / Hash find ratio: 3.9
BTree / Hash insert ratio: 0.5
BTree / Hash find ratio: 1.2
BTree / STL insert ratio: 0.1
BTree / STL find ratio: 0.3
SkipList / Hash insert ratio: 1.1
SkipList / Hash find ratio: 1.9
SkipList / BTree insert ratio: 2.0
SkipList / BTree find ratio: 1.6
SkipList / STL insert ratio: 0.3
SkipList / STL find ratio: 0.5
```

Dump with 500,000 unique random keys:

```
Time to create 5000000 keys
0.3 seconds, relative time 1.0
Time to insert 500000 random values into STL map
3.0 seconds, relative time 103.6
Time to find 500000 random values in STL map
2.1 seconds, relative time 70.8
Time to insert 500000 random values into skiplist
3.3 seconds, relative time 115.3
Time to find 500000 random values in skiplist
4.0 seconds, relative time 139.5
Time to insert 500000 random values into BTree
1.5 seconds, relative time 53.2
Time to find 500000 random values in BTree
1.3 seconds, relative time 44.6
Time to insert 500000 random values into hashtable (single-keyed)
0.9 seconds, relative time 31.1
Time to find 500000 random values in hashtable
0.5 seconds, relative time 16.6

STL / Hash insert ratio: 3.3
STL / Hash find ratio: 4.3
BTree / Hash insert ratio: 1.7
BTree / Hash find ratio: 2.7
BTree / STL insert ratio: 0.5
BTree / STL find ratio: 0.6
SkipList / Hash insert ratio: 3.7
SkipList / Hash find ratio: 8.4
SkipList / BTree insert ratio: 2.2
SkipList / BTree find ratio: 3.1
SkipList / STL insert ratio: 1.1
SkipList / STL find ratio: 2.0
```