After a posting I did yesterday, I decided to take a closer look
at st_* library code. Please correct me if I am wrong in the
following discussion:
The st_* library is a collection of "string" to "word" mapping
routines. BUT, only the POINTER to the string is stored in the
internal st_* structures. This implies that the comparison of the
pointer is enough to determine if the string is a match. Now we are
in JudyL's territory (pointer to pointer mapping) instead of JudySL
(string to pointer mapping). JudyL has been tuned!!! I wrote the
st_* routines using JudyL and wrote a benchmark to compare with the
ruby-1.6.7 version of "st.c". Since, JudyL does not have
collisions, there is no need for a collision structure.
Then I wrote a benchmark simulating st_* with JudyL inside:
<www.sourcejudy.com/benchmarks/st_compare.tar.gz>.
The results:
This is the output of 2 versions of st_compare when run on a 933Mhz P3,
one using “stock” st.c in ruby, the other using the simulation with
JudyL inside. “domnames.txt” is a text file of most/all of the
internet domain names.
wc domnames.txt
28826508 28826508 488227095 domnames.txt
That’s right 28.8 million names, let’s see if it is current:
grep -i sourcejudy domnames.txt
Yep, not more than 2 months old.
Hash_st domnames.txt
lines avg_linelen RAMused/line insert/line lookup/line lookup_random
28826508 16.9 bytes 24.0 bytes 1.674 uS 0.570 uS 1.056 uS
Judy_st domnames.txt
lines avg_linelen RAMused/line insert/line lookup/line lookup_random
28826508 16.9 bytes 6.8 bytes 0.472 uS 0.154 uS 0.706 uS
Notice the Judy_st version used 1/2Gb less memory.
The performance improvement reminds me of the snake oil salesman.
There are still a lot of performance opportunities if st_* were retired.
My take on the st_* routines is they are based on a machine designed
a dozen years ago. They are not tuned for a modern machine.
Enough fun, back to getting Judy to work with ‘autoconf’ et.al.
Doug Baskins doug@sourcejudy.com