Hi Devin,
I'm afraid I've only looked briefly at those other IR systems, but I'll try
to answer your question as best I can. I think Ferret is currently pretty
easy to learn and use through the Index interface described in my
original post, so ease of use shouldn't turn you off. Once I've done
a bit more work on the documentation, I think it'll be a lot easier to find
your way around than some of the other ones. It will, however, be
significantly slower than the search engines backed by C libraries. I'm
certainly not the type of person to say speed isn't important, but I think
Ferret should easily handle the kind of website you are talking about.
Ferret should be a lot faster than SimpleSearch for large document sets.
Having said that, there is a Ruby Quiz coming up for which I intend to write
a quick and simple search engine that will easily outperform SimpleSearch,
so if people are interested, I might make that a project too.
== As for the others, the main advantages of Ferret are:
* a more powerful, extensible query language. You can do boolean, phrase,
range, fuzzy (for misspellings etc.), wildcard, sloppy phrase (out-of-order
phrases) and more. Check out the Query Parser in the API for more info on
the query language.
http://ferret.davebalmain.com/api/classes/Ferret/QueryParser.html
* a more powerful document structure. I could be wrong about this, so someone
please correct me if I am, but I think most of the other IR systems just take
a string as a document. Ferret's documents can have multiple fields. Each
field can have a different analyzer (which parses the field into tokens). You
can store binary fields like images, or compress your data. In fact, you could
do away with a database altogether and just use Ferret. (You can also store
term vectors if you want to compare document similarities, but that's getting
pretty technical.)
* Ferret is pure Ruby (at least it can be if you don't install the C
extension) so it'll run anywhere Ruby does.
* If you are patient, Ferret will one day match or beat the speed of those
other search engines. Hopefully by Christmas, but it all depends on how much
help I can get between now and then.
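
To make the multi-field point a bit more concrete, here is a toy sketch in
plain Ruby. It is not Ferret's API and says nothing about how Ferret is
implemented; the names TinyIndex, add and search, and the two analyzers, are
all made up for illustration. It only shows the idea of one document having
several fields, each tokenized by its own analyzer.

```ruby
# Toy illustration only: NOT Ferret's API. All names here are hypothetical.
class TinyIndex
  # analyzers maps a field name to a lambda that turns text into tokens.
  def initialize(analyzers)
    @analyzers = analyzers
    @docs = []
    # @postings[field][token] => list of document ids
    @postings = Hash.new { |h, f| h[f] = Hash.new { |g, t| g[t] = [] } }
  end

  # Store the document and index every field that has an analyzer.
  def add(doc)
    id = @docs.size
    @docs << doc
    doc.each do |field, text|
      analyzer = @analyzers[field]
      next unless analyzer
      analyzer.call(text).uniq.each { |token| @postings[field][token] << id }
    end
    id
  end

  # Return the stored documents whose field contains the (analyzed) term.
  def search(field, term)
    token = @analyzers.fetch(field).call(term).first
    @postings[field].fetch(token, []).map { |id| @docs[id] }
  end
end

words   = ->(text) { text.downcase.scan(/\w+/) } # lowercased word tokens
keyword = ->(text) { [text] }                     # whole value as one token

index = TinyIndex.new({ title: words, content: words, path: keyword })
index.add({ title: "Ferret released", content: "A search library for Ruby", path: "/news/1" })
index.add({ title: "Ruby Quiz", content: "Write a simple search engine", path: "/quiz/2" })

# The content field is case-insensitive; the path field matches only exactly.
puts index.search(:content, "SEARCH").map { |d| d[:title] }.inspect
puts index.search(:path, "/news/1").map { |d| d[:title] }.inspect
```

Because the path field uses the keyword analyzer, searching it for "news"
finds nothing, while the same word in a field using the words analyzer would
match. That difference is the whole point of per-field analyzers.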
== And the main disadvantages:
* Ferret is still alpha and has not been put into production yet. Hopefully
that will change soon.
* Ferret is currently slower than the C-backed IR systems.
Anyway, sorry for such a long email. It's really hard to describe all the
features available. In fact, there is a whole book on Lucene by Erik Hatcher
and Otis Gospodnetic which I highly recommend if you want to take full
advantage of all the features in Ferret. Most of the examples should
translate pretty easily into Ruby.
Please let me know if you have any more questions.
Regards,
Dave
On 10/21/05, Devin Mullins <twifkak@comcast.net> wrote:
Question for those (soon to be) in the know:
How does this compare to (Estraier/Hyper
Estraier/Ruby-Odeum/SimpleSearch/other 'IR' systems with Ruby
bindings?) on (ease of learning/ease of use/ease of
maintenance/speed/any other noteworthy attributes)? To put it simply,
which one should I choose?* 