I am using the SquidGuard blacklists for content filtering.
SquidGuard uses Berkeley DB to store domains, URLs and regex.
Right now I am using Redis + MySQL: Redis as an in-memory cache and MySQL to store the data.
SquidGuard lookups are really fast, many times faster than MySQL, so I want to try storing the domains in a Berkeley DB file as persistent storage.
I have a domain blacklist file with one domain per line, and I want to store those domains in the DB file.
I have tried to read about how Berkeley DB works, but I don't really understand yet how SquidGuard uses the DB to store domains.
The original file is 17+ MB and I want to benefit from the DB for fast lookups.
In MySQL the size of the DB + index is about 100 MB.
A Berkeley DB of the same data that was made by SquidGuard is about 50-60 MB.
I want to benchmark Berkeley DB against MySQL or other DBs.
So my questions are:
1. Any basic suggestions on how to organize a TLV domains DB?
2. How do I organize the domains in an "ordered key-value" DB such as Berkeley DB?
3. What are good ways to benchmark key lookups in a DB?
4. Are there other DBs you can recommend for the task?
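
To make question 2 more concrete, here is a rough sketch of one layout I am considering. It is only my guess, not necessarily what SquidGuard does: store each domain with its labels reversed, so that a domain and its subdomains sort next to each other in an ordered store. The helper names domain_key and listed? are made up for illustration, and a sorted Ruby array with bsearch stands in for the ordered DB.

```ruby
# Hypothetical helper: "www.example.com" -> "com.example.www"
def domain_key(domain)
  domain.downcase.chomp(".").split(".").reverse.join(".")
end

# Stand-in for an ordered key-value store: a sorted array of reversed keys.
keys = %w[example.com ads.example.net tracker.org].map { |d| domain_key(d) }.sort

# True if the domain itself or any of its parent domains is in the list.
# Each candidate is an exact-match binary search on the sorted keys.
def listed?(keys, domain)
  labels = domain.downcase.chomp(".").split(".").reverse
  (1..labels.size).any? do |n|
    candidate = labels.first(n).join(".")
    keys.bsearch { |k| candidate <=> k } == candidate
  end
end

puts listed?(keys, "www.example.com")
puts listed?(keys, "example.org")
```

With this layout a lookup for "www.example.com" tries "com", "com.example" and "com.example.www" in turn, so a blacklisted parent domain also covers its subdomains.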
The API I want to use is "add(domain)", "exist(domain)" and "remove(domain)".
I am looking for code snippets and examples of using Berkeley DB in Ruby with ruby-bdb (0.2.6.5).
I have seen the example in the GitHub repo, but I am looking for more examples of real-world usage.
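
Roughly, this is the shape of the wrapper I have in mind, together with the kind of naive timing I would use to compare backends. It is only a sketch: the class name DomainStore is made up, and a plain Ruby Hash stands in for the real DB. I assume a ruby-bdb handle could be passed in instead if it responds to the same Hash-like methods ([]=, key?, delete), but I have not verified that.

```ruby
require "benchmark"

# Hypothetical wrapper; the backend only needs []=, key? and delete.
class DomainStore
  def initialize(backend = {})
    @db = backend # a Hash here; a real DB handle is an unverified assumption
  end

  def add(domain)
    @db[normalize(domain)] = "1"
  end

  def exist?(domain)
    @db.key?(normalize(domain))
  end

  def remove(domain)
    @db.delete(normalize(domain))
  end

  private

  def normalize(domain)
    domain.strip.downcase
  end
end

store = DomainStore.new
domains = Array.new(10_000) { |i| "host#{i}.example.com" }
domains.each { |d| store.add(d) }

# Time only the lookups, so different backends can be compared on the same keys.
elapsed = Benchmark.realtime do
  domains.each { |d| store.exist?(d) }
end
puts format("%d lookups in %.4f s", domains.size, elapsed)
```

The idea is to run the same loop against each backend with the real blacklist file and compare the timings.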
Thanks,
Eliezer
--
Eliezer Croitoru
https://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer <at> ngtech.co.il