ANN: RJudy-0.1 - Judy Arrays for Ruby

All,

I’ve just uploaded RJudy, an extension module that provides a Ruby
interface to the Judy arrays library. This extension exposes four new
classes to Ruby (Judy1, JudyL, JudySL and JudyHash). The first three are
more-or-less direct wrappers of the current Judy API. The last is a
first cut at a more general replacement for Ruby’s Hash (i.e. with
arbitrary-type keys), based on the approach described in the Judy
application note “Scalable Hashing”.
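
A minimal usage sketch (the require name, constructors and exact method
names here are illustrative guesses; see the RDoc for the actual API --
the []/[]= access style is the one the benchmark snippets in this thread
use):

  # Hypothetical usage sketch; names are guesses, the RDoc is authoritative.
  require 'rjudy'

  js = JudySL.new              # string-keyed array
  js['ruby'] = 'a language'
  puts js['ruby']

  jh = JudyHash.new            # arbitrary-type keys, Hash-like
  jh[:answer] = 42
  puts jh[:answer]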

The RDoc documentation for all this stuff is here:

http://www.knology.net/~lyle/rjudy

and the source tarball (which also includes those docs) is here:

http://www.knology.net/~lyle/rjudy-0.1.tar.gz

Very interesting stuff indeed (Judy, not my personal work). The
"JudyHash" class is still pretty rough by anyone’s measure (i.e.
probably lots of opportunities for optimization) but it’s already
consistently beating Ruby’s Hash (in my admittedly limited benchmarking
tests). If you’re interested in playing with this, I could use your help
in coming up with more challenging and meaningful benchmarks, as well as
examples. Also note that I haven’t implemented all of Hash’s methods
yet, so it’s not yet a “drop-in” replacement for Hash.

Have fun!

Lyle

Lyle,

i don't recall if you ever expressed an opinion on the concept of a
Mixed Hash + Array type class. just to give you the quick idea if you
didn't follow those old threads (i'll use ( ... ) for the literal
bracket notation of this Array+Hash entity):

mix = ( 'w', :b => 'x', 'y', :d => 'z' )

from this you can see we have two elements with a specified key and two
without. conceivably we could just treat this as an array:

puts mix[0] # ==> w
puts mix[1] # ==> x

or we could access it as a hash:

puts mix[:b] # ==> x
puts mix[:d] # ==> z

i did some work on this myself, but i don't have the time right now to
pursue it. in doing so, though, i discovered the crux of the difficulty
that would need to be addressed: the ambiguity between an integer hash
key and an array index. if this could be thought through and a solution
arrived at, i believe it would make for a more elegant and powerful type
of sequence class.
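
for concreteness, one purely hypothetical way out (class and method names
are made up here) is to treat Integer arguments to #[] as positional
indices and anything else as a hash key -- which of course just forbids
integer hash keys rather than truly disambiguating them:

  class MixedCollection
    def initialize
      @order  = []   # every value, in insertion order
      @by_key = {}   # only the keyed entries
    end

    def push(value)
      @order << value
      self
    end

    def store(key, value)
      @by_key[key] = value
      @order << value
      self
    end

    def [](arg)
      arg.is_a?(Integer) ? @order[arg] : @by_key[arg]
    end
  end

  mix = MixedCollection.new
  mix.push('w').store(:b, 'x').push('y').store(:d, 'z')
  mix[1]   # ==> x
  mix[:b]  # ==> x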

anyway, i bring this up simply because you're in the process of
implementing this great new hash class, so i thought you might want to
give it some consideration.

~transami

p.s. if i get some time soon, i'll give RJudy a whirl. thanks for the
great work!

···

On Mon, 2002-08-26 at 23:41, Lyle Johnson wrote:

All,

I've just uploaded RJudy, an extension module that provides a Ruby
interface to the Judy arrays library. […]

Have fun!

Lyle

I've just uploaded RJudy, an extension module that provides a Ruby
interface to the Judy arrays library. This extension exposes four new
classes to Ruby (Judy1, JudyL, JudySL and JudyHash). The first three are

Thanks Lyle!

It works great. I find that insertion time is faster using Judy and
retrieval time is about the same. I also find that memory usage is
lower with Judy. Here are the results of reading 1,000,000 unique words
(12MB of data):

JudySL takes 23MB
JudyHash takes 26MB
RubyHash takes 40MB

(I subtracted the baseline process size of 54MB that the words array
and ruby itself used).

Another datapoint:
Time to insert 1000000 words into a JudySL array: 6907858.0 usec.

Doing the same with a straight C program took 3 seconds. I think the
extra 3.9 seconds is spent in rubyland doing this:

words.each do |word|
    array[word] = word
end

All in all, this is *super-cool*!

thanks,
-joe

I've spent the last several hours adapting most of your code to use
JudyL inside the internal hash (st_*).
This way, all of Ruby could potentially be faster (including method
lookups and of course builtin hashes).

It's as rough as your code and then some, but it already challenges
JudyHash's speed in some situations. It is little more than JudyHash's
code inside st_*, but hey, you can't expect much more from such a quick
hack :-)!

BTW, while testing my hashes against JudyHash I found some flaws in
the way you do the admittedly simple benchmarks in words.rb:

  • GC.start should be called between tests to make things a little
    fairer; otherwise the first test to run has the advantage, as the GC
    will be triggered more often in the later ones (see the sketch after
    this list)
  • your code uses calloc instead of alloc(), so the Ruby GC heuristics
    don't work and no GC is ever triggered inside JudyHash
  • in Ruby hashes, a String key is duplicated if it isn't already
    present in the hash; JudyHash doesn't do that. This is why my Hash
    seemed to do so badly against JudyHash (9us against 3us to insert,
    about the same, 2.5us, to look up) in words.rb: the keys are new each
    time and the associated values are never updated, so a new String
    object is created on every insertion. In my hash, insertion will be
    much faster when the key doesn't need to be duplicated.
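
A rough sketch of the kind of helper I mean for the first point (the
helper name and data file here are only for illustration; words.rb itself
may be organized differently):

  # Force a GC before each measurement so earlier tests don't push
  # garbage-collection work onto later ones.
  def timed(label, count)
    GC.start
    t0 = Time.now
    yield
    usecs = (Time.now - t0) * 1_000_000
    printf("%s: %.4f usec/op\n", label, usecs / count)
  end

  words = File.readlines('/tmp/data').map { |w| w.chomp }

  timed('Hash insert', words.size) do
    h = {}
    words.each { |w| h[w] = w }
  end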

To test this behavior, even /usr/share/dict/words could work…
head -10000 /usr/share/dict/words > /tmp/data
cat /tmp/data /tmp/data /tmp/data /tmp/data > /tmp/data2
cat /tmp/data2 /tmp/data2 /tmp/data2 /tmp/data2 > /tmp/data3
for n in /tmp/data*; do ./ruby -I/usr/local/lib/site_ruby/1.6/i386-linux \
  words.rb $n; done

I'm attaching a diff; please be indulgent: it's about 4:30 AM in my
timezone and I'm really in need of some sleep. I've got no time to find a
place to upload all this, plus it's small…

Please take a look at the small attached patch…
I've had it building against 1.7.2 and have had it running for some time,
seemingly OK, but I'm too tired now to do unit testing :-)

Some results here:

batsman@kodos:~/src/cvs/ruby-judy$ wc /tmp/data*
10000 10000 122880 /tmp/data
40000 40000 491520 /tmp/data2
160000 160000 1966080 /tmp/data3
640000 640000 7864320 /tmp/data4
850000 850000 10444800 total
batsman@kodos:~/src/cvs/ruby-judy$ for n in /tmp/data*; do ./ruby -I/usr/local/lib/site_ruby/1.6/i386-linux jwords2.rb $n; done
Time to insert 10000 words into a JudySL array: 28592 usec.
=> Average insertion time: 2.8592 usec/insertion.
Time to look up 10000 words in a JudySL array: 17007.0 usec.
=> Average lookup time: 1.7007 usec/word.

ruby-judy-0.0.1.patch (13 KB)

···

On Tue, Aug 27, 2002 at 02:41:19PM +0900, Lyle Johnson wrote:

All,

I've just uploaded RJudy, an extension module that provides a Ruby
interface to the Judy arrays library. […]

==================
Time to insert 10000 words into a JudyHash: 53110.99999999999 usec.
=> Average insertion time: 5.3111 usec/insertion.
Time to look up 10000 words in a JudyHash: 16263.0 usec.
=> Average lookup time: 1.6263 usec/word.

==================
Time to insert 10000 words into a Ruby Hash: 30890 usec.
=> Average insertion time: 3.089 usec/insertion.
Time to look up 10000 words in a Ruby Hash: 16794.0 usec.
=> Average lookup time: 1.6794 usec/word.
Time to insert 40000 words into a JudySL array: 91143.0 usec.
=> Average insertion time: 2.278575 usec/insertion.
Time to look up 40000 words in a JudySL array: 63044.0 usec.
=> Average lookup time: 1.5761 usec/word.

==================
Time to insert 40000 words into a JudyHash: 93122.00000000004 usec.
=> Average insertion time: 2.328050000000001 usec/insertion.
Time to look up 40000 words in a JudyHash: 80944.0 usec.
=> Average lookup time: 2.0236 usec/word.

==================
Time to insert 40000 words into a Ruby Hash: 101690.0 usec.
=> Average insertion time: 2.54225 usec/insertion.
Time to look up 40000 words in a Ruby Hash: 70016.0 usec.
=> Average lookup time: 1.7504 usec/word.
Time to insert 160000 words into a JudySL array: 352922.0000000001 usec.
=> Average insertion time: 2.205762500000001 usec/insertion.
Time to look up 160000 words in a JudySL array: 256844 usec.
=> Average lookup time: 1.605275 usec/word.

==================
Time to insert 160000 words into a JudyHash: 399579.0 usec.
=> Average insertion time: 2.49736875 usec/insertion.
Time to look up 160000 words in a JudyHash: 353830.0 usec.
=> Average lookup time: 2.2114375 usec/word.

==================
Time to insert 160000 words into a Ruby Hash: 397741.0 usec.
=> Average insertion time: 2.48588125 usec/insertion.
Time to look up 160000 words in a Ruby Hash: 285617.0 usec.
=> Average lookup time: 1.78510625 usec/word.
Time to insert 640000 words into a JudySL array: 1390958.0 usec.
=> Average insertion time: 2.173371875 usec/insertion.
Time to look up 640000 words in a JudySL array: 1023863.0 usec.
=> Average lookup time: 1.5997859375 usec/word.

==================
Time to insert 640000 words into a JudyHash: 1636917.0 usec.
=> Average insertion time: 2.5576828125 usec/insertion.
Time to look up 640000 words in a JudyHash: 1491516 usec.
=> Average lookup time: 2.33049375 usec/word.

==================
Time to insert 640000 words into a Ruby Hash: 1741098.0 usec.
=> Average insertion time: 2.720465625 usec/insertion.
Time to look up 640000 words in a Ruby Hash: 1309702 usec.
=> Average lookup time: 2.046409375000001 usec/word.
batsman@kodos:~/src/cvs/ruby-judy$


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

     Why use Windows when you can have air conditioning?
     Why use Windows, when you can leave through the door?
-- Konrad Blum

[Joe]
Thanks Lyle!

It works great. I find that insertion time is faster using Judy,
retrieval time is about the same. I also find that memory usage is
lower with Judy. Here are the results of reading 1,000,000 unique words
(12MB of data):

JudySL takes 23MB
JudyHash takes 26MB
RubyHash takes 40MB
[…]

My thanks too, Lyle,

Joe:

I would be interested in knowing a little more about your data, such
as average string length. I am presently looking into tuning JudySL
for shorter strings (~1..15 chars). I am trying to get hold
of some 'typical' data to do comparisons. BTW, try long strings such
as whole sentences and, of course, run yourself out of memory. This is
where JudySL does well compared with everything else.

I am also looking into why so many people are having trouble porting
JudySL to other machines. The basic problem is the need for the
define -DJU_LINUX on the compile line (even after all the stuff
under it is removed from JudySL.c), because it is used in JudyPrivate.h
to specify that it is a little-endian (Intel/DEC) machine.
Big-endian machines do not need the byte-swapping code/instruction.

Please, no flames, I know it's a bad way to program. We had a 2-week
window to get Judy into Open Source, or maybe lose the opportunity
altogether. So it went out with just the 6 platforms and a hacked-up
./configure script. I/we will fix Judy to be as platform independent
as possible soon. Then I/we can get it on all platforms with a little
tweak here and there.

BTW, there is nothing inherent in Judy that is specific to any platform
until you do the final hardware/compiler tuning -- for example, some
platforms have instruction support for some of the algorithms used in
Judy. An example would be a bit-counting instruction. Judy needs to
count bits in a word fairly often -- but not so often that it kills you
if you don't have a bit-counting instruction.
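
(To illustrate, the usual software fallback is the clear-the-lowest-set-bit
loop; shown in Ruby here only for readability -- Judy of course does this
in C.)

  # Count the set bits in a word by repeatedly clearing the lowest one.
  def popcount(word)
    count = 0
    while word != 0
      word &= word - 1
      count += 1
    end
    count
  end

  popcount(0b1011_0010)   # ==> 4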

Doug Baskins <doug@sourcejudy.com>

“Joseph McDonald” joe@vpop.net wrote in message
news:67659612452.20020827105228@vpop.net

Another datapoint:
Time to insert 1000000 words into a JudySL array: 6907858.0 usec.

Doing the same with a straight C program took 3 seconds. I think the
extra 3.9 seconds is spent in rubyland doing this:

I did a test of different data structures mapping 32-bit keys to 32-bit
data. If someone would add more structures to this test, such as Judy,
I'd be interested.

Below is the log I wrote during the test.

Mikkel

Comparison of skiplist, STL map, BTree and hashtable. All tests are 32-bit
key, 32-bit value maps with no duplicates.

Speed

Machine: Pentium II 300MHz with 500MB SDRAM.
The size of the dataset was chosen to avoid excessive thrashing when
having several containers in the air simultaneously.

The output is shown below.
The test run stores and retrieves 500,000 unique keys:

one run is with sequential numbers 0, 1, … 499,999;
one run is with random numbers (rand() * rand()) with duplicates stripped,
totalling 500,000 different values.

Surprisingly, all the maps seem to prefer the linear values.
But the relative performance between the maps differs significantly.

STL is generally a factor of 3-4 (insert, find) off the hashtable.

The BTree is twice as fast as the hashtable at insertion with linear keys,
and about the same in search speed. The hashtable is probably spending
time growing its bucket table to reduce collisions, but the quality of the
hash also matters.

With random keys, the BTree is 1.7x (insert) / 2.7x (find) slower than the
hash. Since the hash is constant time, it is faster for larger key sets,
but it also uses more memory. Both are friendly to the disk cache. It is
difficult to measure, but the BTree is probably faster than the hash for
small sets; this would require a different test.
Note that the BTree has a preferred node size of around 14 elements, both
at the leaves and internally. The suspected reasons are:

  1. CPU cache lines;
  2. the cost of moving data during insertion; and
  3. each node is searched in a tight linear loop (see the sketch after
     this list). The node could be larger if the search were binary, but
     that is only faster after at least 100 entries per node, and at that
     point factors 1) and 2) start to set in.

For dumping the tree to disk, it would be possible to flatten nodes.
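
To make point 3 concrete, the per-node search is nothing more than a
linear walk over a small sorted key array (sketched here in Ruby with a
made-up node layout; the real implementation is C++):

  # Pick the child subtree a key belongs to by scanning the node's keys.
  def descend(node, key)
    i = 0
    i += 1 while i < node[:keys].length && node[:keys][i] <= key
    node[:children][i]
  end

  node = { :keys => [10, 20, 30], :children => [:c0, :c1, :c2, :c3] }
  descend(node, 25)   # ==> :c2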

The skiplist does not perform consistently. It can be 8 times slower than
the hashtable at finding random keys, and 2 times slower than STL
(which in turn is 4 times slower than the hash); for insertion it is about
equal to STL.
But with linear keys it inserts as fast as the hash (where the BTree is
twice as fast), finds at half the speed of the hash, and is 1.6x off the
BTree. Compared to the STL map, the skiplist is then very fast, with only
0.3 times the insert time, and still twice as fast at finding.
When searching, the skiplist always has to visit a node to decide its key.
Compare this to the BTree, which caches the keys of the child nodes in the
parent node; this makes for a very fast scan. The BTree could save space
by visiting the child nodes, but it would be slower - like the skiplist.
The skiplist has no locality. The skiplist could be transposed to a
BTree-like structure such that each level is stored in one or more linked
buffers on disk. This would add locality. It may be interesting to see if
this also works in memory - it could make the skiplist as fast as or
faster than the BTree without the complexity.

Code complexity

All collections are implemented by me,
except the STL map, which is from Microsoft Visual Studio 7.0.

The BTree is very complex. It uses 100K of code - although this covers
some test code and various optional behaviour such as top-down instead of
bottom-up insertion, etc.
The STL map is probably a red/black tree, but this has not been
investigated. Both the BTree and the STL map are too complex to easily
customize for special purposes.

The skiplist and the hashtable both use 300-400 lines of code and are
both easy to adapt to special purposes.

Memory

The hashtable does not take up many lines of code. It grows dynamically,
so it can start small. But it is not significantly faster for small
datasets, so some other collection may be preferable there. It has some
overhead in that the empty hash with no entries still keeps 10 32-bit
values for handling its buffers. It can start with a small buffer of, say,
4 to 8 entries. Each entry consumes 4 32-bit values. The hash bucket table
adds 50% by having twice the number of entries, each using a single 32-bit
value. Hence each entry uses 6 32-bit values. With an initial size of 4
entries, this amounts to 10 + 6 * 4 = 34 32-bit values. This is about the
same as the BTree for the initial storage - but it also uses two
allocations, which may have allocation overhead.
When the hashtable grows, it consistently adds 6 32-bit values per entry,
which is expensive.
When the BTree grows, it uses 2 32-bit values per leaf entry plus some
node overhead, say 25%, and much less if nodes are chosen large. Due to
the large branch factor, the size of the parent nodes can be ignored. For
binary trees there would be as many internal nodes as leaf nodes. Without
exact calculations, let's just estimate the overhead to be no more than
50% in total. This yields 3 32-bit values per entry. However, BTrees may
be half empty; estimating 75% usage, this roughly becomes 4 32-bit values
per entry.

The skiplist uses 2 32-bit values per entry for key/value, plus about 1.3
pointers per node. It has only a single null pointer in its empty state.
It needs a header which is at least the size of the smallest node. This
translates into a smallest entry of 3 + 3 = 6 32-bit values, plus the
pointer to the handle. Since the header is chosen slightly larger to avoid
immediate reallocation, this makes a total of 8 32-bit values with a
single entry. This is 3 or 4 times less than a BTree or hashtable initial
entry.

A guess is that the STL map uses about 5 32-bit values per node: key,
value, left, right, color - although the color could be stored as a tagged
pointer, reducing this to 4 32-bit values. An initial entry including the
actual map variable will probably be about 8 32-bit values, and it will
grow by about 4 32-bit values per entry.
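
Collecting the per-entry growth estimates above in one place (these are
the figures stated in the text, in 32-bit words per entry; they are
estimates, not measurements):

  GROWTH_WORDS_PER_ENTRY = {
    'hashtable' => 6,    # 4 per entry + 2 for the oversized bucket table
    'BTree'     => 4,    # ~3 at full nodes, ~4 assuming 75% usage
    'skiplist'  => 3.3,  # key + value + ~1.3 pointers per node
    'STL map'   => 4,    # key, value, left, right (color as tagged pointer)
  }

  entries = 500_000
  GROWTH_WORDS_PER_ENTRY.each do |name, words|
    printf("%-10s ~%.1f MB for %d entries\n", name, entries * words * 4 / 1e6, entries)
  end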

The BTree has the most cache-friendly nature because it keeps locality,
and frequently used paths down the tree have a good chance of a cache hit.
The hashtable shows that storing data in a single dynamically growing
buffer also works well.
The performance tests do not tell how the BTree would compare to the
hashtable in real usage; here all the values are visited. It is expected
that the BTree will perform better on actual data access due to its
cache-friendly nature. Likewise, it is expected that the skiplists and STL
maps are not too kind to the cache, and some of this probably also shows
in the performance numbers.

Data

dump with 500,000 linear keys
(never mind the dump saying random - it is linear)

···

=============================
This first measurement creates a CPU-independent time unit
and measures operations relative to the cost of creating a key.

Time to create 5000000 keys
0.3 seconds, relative time 1.0
Time to insert 500000 random values into STL map
2.7 seconds, relative time 92.9
Time to find 500000 random values in STL map
1.4 seconds, relative time 47.3
Time to insert 500000 random values into skiplist
0.8 seconds, relative time 27.3
Time to find 500000 random values in skiplist
0.7 seconds, relative time 23.5
Time to insert 500000 random values into BTree
0.4 seconds, relative time 13.5
Time to find 500000 random values in BTree
0.4 seconds, relative time 14.5
Time to insert 500000 random values into hashtable (single-keyed)
0.7 seconds, relative time 25.6
Time to find 500000 random values in hashtable
0.4 seconds, relative time 12.1


STL / Hash insert ratio: 3.6
STL / Hash find ratio: 3.9
BTree / Hash insert ratio: 0.5
BTree / Hash find ratio: 1.2
BTree / STL insert ratio: 0.1
BTree / STL find ratio: 0.3
SkipList / Hash insert ratio: 1.1
SkipList / Hash find ratio: 1.9
SkipList / BTree insert ratio: 2.0
SkipList / BTree find ratio: 1.6
SkipList / STL insert ratio: 0.3
SkipList / STL find ratio: 0.5

dump with 500,000 unique random keys

Time to create 5000000 keys
0.3 seconds, relative time 1.0
Time to insert 500000 random values into STL map
3.0 seconds, relative time 103.6
Time to find 500000 random values in STL map
2.1 seconds, relative time 70.8
Time to insert 500000 random values into skiplist
3.3 seconds, relative time 115.3
Time to find 500000 random values in skiplist
4.0 seconds, relative time 139.5
Time to insert 500000 random values into BTree
1.5 seconds, relative time 53.2
Time to find 500000 random values in BTree
1.3 seconds, relative time 44.6
Time to insert 500000 random values into hashtable (single-keyed)
0.9 seconds, relative time 31.1
Time to find 500000 random values in hashtable
0.5 seconds, relative time 16.6


STL / Hash insert ratio: 3.3
STL / Hash find ratio: 4.3
BTree / Hash insert ratio: 1.7
BTree / Hash find ratio: 2.7
BTree / STL insert ratio: 0.5
BTree / STL find ratio: 0.6
SkipList / Hash insert ratio: 3.7
SkipList / Hash find ratio: 8.4
SkipList / BTree insert ratio: 2.2
SkipList / BTree find ratio: 3.1
SkipList / STL insert ratio: 1.1
SkipList / STL find ratio: 2.0

well, no one probably read the related post on a mixed Array + Hash
class, but just the same i thought of one way to make a distinction
between using the same object as both an Array and a Hash: by using
different reference brackets depending on how you want to treat it. For
an array we say:

a = [ 'x', 'y', 'z' ]

and use the same brackets to get access to the elements:

puts a[1] # ==> y

but funnily, for a hash:

h = { :a=>'x', :b=>'y', :c=>'z' }

we use the same brackets as the array's reference:

puts h[:b] # ==> y

why don't we use h{:b}, i wonder? because if we did, then:

m = ( 'x', :b=>'y', 'z' )

could be treated distinctly based on the brackets used:

puts m[1] # ==> y (treat as array)
puts m{:b} # ==> y (treat as hash)
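
(purely illustrative: ruby already parses "m { :b }" as a call to a method
m with a block, so the two access styles can at least be mocked up today.
the names below are made up and the "mixed" object is faked with two
constants.)

  MIX_ARY  = ['x', 'y', 'z']
  MIX_HASH = { :b => 'y' }

  def m(index = nil)
    return MIX_ARY[index] unless block_given?
    MIX_HASH[yield]
  end

  m(1)       # ==> y (treat as array)
  m { :b }   # ==> y (treat as hash)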

just some musings.

~transami

···

On Tue, 2002-08-27 at 00:19, Tom Sawyer wrote:

Lyle,

i don't recall if you ever expressed an opinion on the concept of a
Mixed Hash + Array type class. […]

~transami

All,

I’ve just uploaded RJudy, an extension module that provides a Ruby
interface to the Judy arrays library. This extension exposes four new
classes to Ruby (Judy1, JudyL, JudySL and JudyHash). The first three are
more-or-less direct wrappers of the current Judy API. The last is a
first cut at a more general replacement for Ruby’s Hash (i.e. with
arbitrary-type keys), based on the approach described in the Judy
application note “Scalable Hashing”.
[…]
I’ve replicated most of your code for the last several hours to use
JudyL inside the internal hash (st_*).
This way, all of Ruby could potentially be faster (including method
lookups and of course builtin hashes).

I’m attaching a diff, please be indulgent: it’s about 4:30AM in my locale
and I’m really in need of some sleep. Got no time to find a place to
upload all this, plus it’s small…

Please take a look at the small attached patch…
I’ve had it build in 1.7.2 and I have had it running for some time,
seemingly OK, but I'm too tired now to do unit testing :-)

I later found some bugs in my plug-in replacement for Hash. I have
another patch which apparently fixes everything. It passes the same
tests JudyHash does and some others, as st_* is used for everything and
Ruby seems to run fine now.

ruby-judy-0.0.1-0.0.1a.patch (1.83 KB)

···

On Wed, Aug 28, 2002 at 11:25:10AM +0900, Mauricio Fernández wrote:

On Tue, Aug 27, 2002 at 02:41:19PM +0900, Lyle Johnson wrote:


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Ok, I’m just uploading the new version of the kernel, v1.3.33, also
known as “the buggiest kernel ever”.
– Linus Torvalds

I’ve just uploaded RJudy, an extension module that provides a Ruby
interface to the Judy arrays library. This extension exposes four new
classes to Ruby (Judy1, JudyL, JudySL and JudyHash). The first three are

Thanks Lyle!

It works great. I find that insertion time is faster using judy,
retrieval time is about the same. I also find that memory usage is
lower with judy. Here is reading 1,000,000 unique words (12MB of
data):

JudySL takes 23MB
JudyHash takes 26MB
RubyHash takes 40MB

Bad news everyone…
This, and insertion being faster with Judy*, is an illusion :-(
Read below.

(I subtracted the baseline process size of 54MB that the words array
and ruby itself used).

Another datapoint:
Time to insert 1000000 words into a JudySL array: 6907858.0 usec.

Doing the same with a straight C program took 3 seconds. I think the
extra 3.9 seconds is spent in rubyland doing this:

words.each do |word|
array[word] = word
end

A regular Hash duplicates the key (word), so the following works:

  key = "a key"
  h = {}
  h[key] = "hi"
  key[0] = "0"          # mutate the original key string
  h["a key"]  # ==> "hi"

If the key were not cloned, you'd have to access the data with
h["0 key"]. The key isn't cloned if it is already present in the hash.
Try with 1,000,000 insertions of only 10,000 different words.
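
A quick, hypothetical way to build that workload (the dictionary path and
counts are only for illustration):

  # 1,000,000 insertions drawn from only 10,000 distinct words, so most
  # insertions hit keys that are already present and need no duplication.
  distinct = File.readlines('/usr/share/dict/words')[0, 10_000].map { |w| w.chomp }
  words    = (1..1_000_000).map { distinct[rand(distinct.size)] }

  h = {}
  words.each { |w| h[w] = w }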

JudyHash seems faster than the Hash based on my new st_* functions
(which I'll call Hash') and than the standard Hash only because of this.
Note it also accounts for the difference in memory usage…

If I modify hash.c so that it doesn't duplicate the key, a number of
interesting things happen:

  • Hash' is slightly faster than JudyHash in lookup and
    insertion, but this depends on the duplication/number of keys
  • the standard Hash crushes both JudyHash and Hash'
  • things go a little better for JudyHash and Hash' if there's a lot of
    data, but Hash remains faster, always :-(

All in all, this is super-cool!

It's a pity it isn't quite there yet, but we still have a lot of room
for optimization :-)

···

On Wed, Aug 28, 2002 at 02:52:26AM +0900, Joseph McDonald wrote:


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

‘Ooohh… “FreeBSD is faster over loopback, when compared to Linux
over the wire”. Film at 11.’
– Linus Torvalds

Mauricio Fernández batsman.geo@yahoo.com wrote in message news:20020828023514.GB19917@kodos

diff -rwdu ruby/configure.in ruby-judy/configure.in
--- ruby/configure.in Sun Aug 11 23:43:24 2002
+++ ruby-judy/configure.in Tue Aug 27 17:46:42 2002
@@ -281,6 +281,7 @@
AC_CHECK_LIB(crypt, crypt)
AC_CHECK_LIB(dl, dlopen) # Dynamic linking for SunOS/Solaris and SYSV
AC_CHECK_LIB(dld, shl_load) # Dynamic linking for HP-UX
+AC_CHECK_LIB(Judy, JudyLIns) # Judy

…deleted

Mauricio:

I am the author of Judy. I just had a chance to look over this file.
Overall I am impressed. I just heard of Ruby for the first time a
week or two ago. It also looks like there has been discussion of
how this performs vs. the normal hashing routines in Ruby. This
is good; I want to know how the performance compares. I am committed
to making Judy the fastest and most memory-efficient method on the
planet.

The first thing I noticed that could affect the performance of the
Judy version is the lack of use of the Judy macros -- JLG(), JLN(),
JLI(), etc. It is obvious you read the JudyHashing application
note, but missed one of the very important parts. The use of
JLG() and JLN() is very important for the best performance at
low populations of the Judy array. As a first cut, change the lines:

    from:
        PValue = JudyLNext(table->PJLArray[i], &Index, PJE0);
    to:
        JLN(PValue, table->PJLArray[i], Index);

    from:
        PValue = JudyLGet(HASHARRAY(table, hash_val), bin_pos, PJE0);
    to:
        JLG(PValue, HASHARRAY(table, hash_val), bin_pos);

    from:
        ptr = (st_table_entry **) JudyLGet(HASHARRAY(table, hash_val),
                                           HASHINDEX(hash_val), PJE0);
    to:
        JLG(ptr, HASHARRAY(table, hash_val), HASHINDEX(hash_val));

The use of JLG and JLN should improve the performance of retrieval
enough to be very competitive with the hash method used here. Please let
me know how it goes. I also have some other concerns, but I will talk
about those later. And while you're at it, change all the JudyL* routines
to their macro counterparts. I have designed the macros to give all
the flexibility in handling errors I think you need.

I am new to the open source community. I am about to change the phrase
“faster than the speed of light” to “faster than the speed of the internet”.

Doug Baskins doug@sourcejudy.com

“Mauricio Fernández” wrote in,

I later found some bugs in my plug-in replacement for Hash. I have
another patch which apparently fixes everything. It passes the same
tests JudyHash does and some others, as st_* is used for everything and
Ruby seems to run fine now.

Hi,

I am very curious (I'm running Windows, so no Judy yet) how this
replacement affects pure method lookup time. Keeping it
simple, what are the Fibonacci and Ackermann benchmarks?

···

def fib(n)
  return n if n < 2
  fib(n-1) + fib(n-2)
end

benchmark …

fib(30)

def ack(m, n)
  return n+1 if m == 0
  return ack(m-1, 1) if n == 0
  ack(m-1, ack(m, n-1))
end

and

ack(3,7) # or ack(3,8)
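
(How the numbers are collected is up to you; a dependency-free sketch
would be something like:)

  t0 = Time.now
  fib(30)
  printf("fib(30): %.2f s\n", Time.now - t0)

  t0 = Time.now
  ack(3, 7)
  printf("ack(3,7): %.2f s\n", Time.now - t0)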


/Christoph

Bad news everyone…
This, and insertion being faster with Judy*, is an illusion :-(
Read below.

It's a pity it isn't quite there yet, but we still have a lot of room
for optimization :-)

sad to hear. it seemed we were looking at a significant speed increase
for ruby in this area, but perhaps optimizations will still do it.

recently someone pointed out to me that Arrays were an order of
magnitude slower than Hashes. that surprised me. has any of this work
sparked notions on how to bring Array more up to speed with Hash? That,
at least, would be a nice boost. just some thoughts.

~transami

Mauricio Fernández batsman.geo@yahoo.com wrote in message news:20020828023514.GB19917@kodos

[…]

Mauricio:

I am the author of Judy. I just had a chance to look over this file.
Overall I am impressed.

Actually this patch is derived from Lyle Johnson's original work on
RJudy; you should direct your kudos to him :-)
I hacked it together quite fast (around 6 hours from downloading RJudy
0.1 to the first release of my patch), and at the beginning I was just
trying to get Judy to work inside the internal hash of Ruby.

I just heard of Ruby for the first time a week or two ago. It also
looks like there has been discussion of how this performs vs. the
normal hashing routines in Ruby. This is good; I want to know how the
performance compares. I am committed to making Judy the fastest and
most memory-efficient method on the planet.

Cool! We're still pretty much amazed at Judy being faster than regular
hashes :-)

The first thing I noticed that could affect the performance of the
Judy version is the lack of use of the Judy Macros --JLG() JLN(),
JLI() etc. It is obvious you read the JudyHashing application
note, but missed one of the very important parts. The use of
JLG() and JLN()are very important for the best performance at
low populations of the Judy Array.
[deleted]

As I said before, the first version of the patch (the one I posted to
this list) was a mere copy of RJudy's, modified to work as Ruby's
internal hash. I inherited his lack of use of the Judy macros, for I
knew I could change that later and I was mostly doing cut & paste
anyway…

However, once I caught some bugs which would make Ruby segfault from
time to time, I started more serious modifications, which made it faster
than RJudy. I also used macros in selected places (the main lookup
function and a couple more). I saw little change in the overall
speed of Ruby, and specifically of the operations on hashes, but it was
slightly better. After reading your post, I replaced all the function
calls with macros and… it's slower now :-( Must be some code cache
issue…

I do lack a serious benchmarking procedure; I'm mostly running
Lyle's example with minor corrections on different data sets, plus
Christoph's Ackermann and Fibonacci functions. I try to compare the
execution times fairly, and consider the page faults involved too, as I
suspect they are important :-)

The use of JLG and JLN should improve the performance of the retrieve
to be very competitive with the hash method used here. Please let me
know how it goes.

At the moment, with my not-so-reliable benchmarks, I’ve observed the
following:

  • standard hash is faster at low populations; this is extremely
    important since that will affect method and attribute lookups
  • as size grows, insertion in the Judy-based hash is a little faster,
    but lookup remains slightly slower
  • everything’s faster w/ HASHSIZE=1 than with HASHSIZE=256.
    Plus st_foreach is a serious speed hit when HASHSIZE grows, as it must
    iterate over all the Judy arrays…

I also have some other concerns, but I will talk
about that later. And while your at it, change all the JudyL* routines
to their macro counterparts. I have designed the macros to give all
the flexibility in handling errors I think you need.

I don't care much about error handling (as long as I can benchmark it,
it's fine by me for the moment). I care more about macros being slower
than function calls, for all my code is useless if it can't run faster
than the hashing currently used…

I am new to the open source community. I am about to change the phrase
“faster than the speed of light” to “faster than the speed of the internet”.

We're a massively parallel system now, a CPU with an enormous pipeline,
each part working in its own timezone :-) It takes some time for the
first result to come, but thereafter…

···

On Sun, Sep 01, 2002 at 01:17:14AM +0900, Doug Baskins wrote:


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

We come to bury DOS, not to praise it.
– Paul Vojta, vojta@math.berkeley.edu

It isn't faster yet wrt method lookup, but keep in mind I've written it
in ~3-4 hours and have made no attempt whatsoever to optimize it…

I can think of some ways to speed up method lookup using Judy:

  • use JudySL for this, as ASCIIZ keys are good enough
    → the hashing function for methods becomes unnecessary
  • use tricks to discriminate between Judy arrays and st_table_entry
    structs, like the Fixnum / pointer business.
···

On Wed, Aug 28, 2002 at 09:04:01PM +0900, Christoph wrote:

[…]

Hi,

I am very curious (and running windows so no judy yet) how this
replacement effect pure method lookup time? Keeping it
simple, what are the Fibonacci and Ackerman benchmarks?


Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

…and scantily clad females, of course. Who cares if it’s below zero
outside.
– Linus Torvalds

[…]

Some more thoughts about this:

  • fib and ack aren't really testing method lookup, as it is cached the
    first time
  • Judy is better as the data structure size grows… method tables are
    small, so it will hardly make any difference against the regular hash
  • in fact these tables are so small the plain hash table wouldn't have
    to be re-allocated and rehashed to grow any further…
  • JudySL-style hashes work only when (key1 == key2) <=> the keys are
    bit-by-bit identical

This particular thing (a hash indexed by ID, which is a 32-bit (64 where
supported?) integer) should be done with JudyL. If there's any place
where using Judy would immediately pay off, it's here. I might work on
this later.

···

On Wed, Aug 28, 2002 at 09:04:01PM +0900, Christoph wrote:

[…]

Hi,

I am very curious (and running windows so no judy yet) how this
replacement effect pure method lookup time? Keeping it
simple, what are the Fibonacci and Ackerman benchmarks?

Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Being overloaded is the sign of a true Debian maintainer.
– JHM on #Debian

In this discussion you should probably also consider the "Ideal Hash
Trees" paper by Phil Bagwell. He has indicated to me in personal
communication that they should have even better properties than
Judy-style arrays/hashes for some applications. I don't have the IHT
paper fresh in memory, but someone interested in Ruby internals or doing
a fast hash extension might wanna take a look:

http://lampwww.epfl.ch/publications/papers/idealhashtrees.pdf

Might be some long-term optimization opportunities in there…

Regards,

Robert Feldt

···

On Thu, 29 Aug 2002, Tom Sawyer wrote:

[…]

Doug & Mauricio (and anyone else reading this thread),

Just wanted to let you know that I’m not ignoring all of the discussion
about the Ruby/Judy interfacing.

Because I wanted to open up the development a bit and make the CVS
repository available, etc., I applied for a new project registration on
Savannah (http://savannah.gnu.org) almost a week ago, but that request has
been ignored. I also filed a follow-up support request, which has likewise
been ignored. Apparently the entire Savannah support staff has been hit by
a bus. I'll put in a request with SourceForge instead (who have always been
very prompt, by the way) and try to get an update out soon.

Thanks,

Lyle