[ANN] Ferret 0.1.0 (Port of Java Lucene) released

Hi Folks,

I know there have been at least a few people looking for something like this
on the mailing list, so please check it out. It's a port of a Java project,
so I'd particularly like to hear how I can make it more Ruby-like. Enjoy!

Dave Balmain

== Description

Ferret is a full port of the Java Lucene searching and indexing library.
It's available as a gem, so try it out! To get started quickly, read the quick
start at the project homepage:

http://ferret.davebalmain.com/trac/

== Quick (Very Simple) Example

  require 'ferret'

  include Ferret

  # Sample documents: each one is a hash of field name => text.
  docs = [
    { :title => "The Pragmatic Programmer",
      :author => "Dave Thomas, Andy Hunt",
      :tags => "Programming, Broken Windows, Boiled Frogs",
      :published => "1999-10-13",
      :content => "Yada yada yada ..."
    },
    { :title => "Programming Ruby",
      :author => "Dave Thomas, Chad Fowler, Andy Hunt",
      :tags => "Ruby",
      :published => "2004-10-06",
      :content => "Yada yada yada ..."
    },
    { :title => "Agile Web Development with Rails",
      :author => "Dave Thomas, David Heinemeier Hansson, Leon Breedt, " \
                 "Mike Clark, Thomas Fuchs, Andreas Schwarz",
      :tags => "Ruby, Rails, Web Development",
      :published => "2005-07-13",
      :content => "Yada yada yada ..."
    },
    { :title => "Ruby, Developer's Guide",
      :author => "Robert Feldt, Lyle Johnson, Michael Neumann",
      :tags => "Ruby, Racc, GUI, FOX",
      :published => "2002-10-06",
      :content => "Yada yada yada ..."
    },
    { :title => "Lucene In Action",
      :author => "Otis Gospodnetic, Erik Hatcher",
      :tags => "Lucene, Java, Search, Indexing",
      :published => "2004-12-01",
      :content => "Yada yada yada ..."
    }
  ]

  index = Index::Index.new()

  # Add every document to the index.
  docs.each {|doc| index << doc }

  puts index.size

  # Each search yields a hit's document number and its score.
  puts "\nFind all documents on ruby:-"
  index.search_each("tags:Ruby") do |doc, score|
    puts "Document <#{index[doc]["title"]}> found with a score of %0.2f" % score
  end

  puts "\nFind all documents on ruby published this year:-"
  index.search_each("tags:ruby AND published: >= 2005") do |doc, score|
    puts "Document <#{index[doc]["title"]}> found with a score of %0.2f" % score
  end

  puts "\nFind all documents by the Pragmatic Programmers:-"
  index.search_each('author:("dave Thomas" AND "Andy hunt")') do |doc, score|
    puts "Document <#{index[doc]["title"]}> found with a score of %0.2f" % score
  end

Whoa. Very cool. Thanks, David!

Jacob

···

On 10/20/05, David Balmain <dbalmain.ml@gmail.com> wrote:

Hi Folks,

I know there have been at least a few people looking for something like this
on the mailing list, so please check it out. It's a port of a Java project
so I'd particularly like to hear how I can make it more Ruby like. Enjoy!

Dave Balmain

Uh oh, there goes another Ruby Quiz idea[1] torpedoed by a diligent
library author :wink:

I mean that with the utmost respect as this looks cool. Love the name too.

Ryan

P.S. Eh, we can do the Ruby Quiz either way :smiley:

1. http://ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/161261

···

On 10/20/05, David Balmain <dbalmain.ml@gmail.com> wrote:

Ferret is a full port of the Java Lucene searching and indexing library.

Amazing,

This is a tremendous gift to the Ruby on Rails crowd. Lucene was
probably *the* library most missed in the Ruby world.

Cheers for such an amazing port.

···

On 10/20/05, David Balmain <dbalmain.ml@gmail.com> wrote:

Hi Folks,

I know there have been at least a few people looking for something like this
on the mailing list, so please check it out. It's a port of a Java project
so I'd particularly like to hear how I can make it more Ruby like. Enjoy!

Dave Balmain

--
Tobi
http://jadedpixel.com - modern e-commerce software
http://typo.leetsoft.com - Open source weblog engine
http://blog.leetsoft.com - Technical weblog

Superb! Thanks!

Sean

···

On 10/21/05, David Balmain <dbalmain.ml@gmail.com> wrote:

Ferret is a full port of the Java Lucene searching and indexing library.

This is great news! I'm installing it now, and will have a go at it whenever the gem shows up :slight_smile:

The link to the tutorial embedded on your intro page is incorrect (the one in the TOC at the top right works though)

Cheers,
Bob

···

On Oct 20, 2005, at 10:36 PM, David Balmain wrote:

Hi Folks,

I know there have been at least a few people looking for something like this
on the mailing list, so please check it out. It's a port of a Java project
so I'd particularly like to hear how I can make it more Ruby like. Enjoy!

Dave Balmain

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>

Can't wait to try this!

thanks,
George.

···

On 10/21/05, David Balmain <dbalmain.ml@gmail.com> wrote:

Hi Folks,

I know there have been at least a few people looking for something like this
on the mailing list, so please check it out. It's a port of a Java project
so I'd particularly like to hear how I can make it more Ruby like. Enjoy!

Dave Balmain

--
http://www.gmosx.com
http://www.navel.gr

Atm i'm using jruby to use lucene search, but this definitely sounds
great!!! Do you have any plans to include snowball stemmers? As my
documents are not english, i could use them :wink:

I just recalled Ruby's stemmer4r, which wraps the snowball stemmers. I can't
wait to use this :wink:

David -

Have you looked into what it would take to allow all text to be UTF-8?
Would it take a complete overhaul or a somewhat-minor tweak?
If a tweak, what would you charge to make that lovely update? :slight_smile:
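
For anyone wondering what UTF-8-aware tokenizing might look like, here is a minimal sketch in plain Ruby. It assumes Ruby 1.9 or later, where strings carry an encoding, and it makes no Ferret calls at all: tokenize is a hypothetical helper, not part of Ferret or any other library.

```ruby
# encoding: utf-8
# Sketch only: plain Ruby (1.9 or later), no Ferret calls.
# tokenize is a hypothetical helper, not part of any library.
def tokenize(text)
  # \p{L} matches a letter in any script, \p{N} matches any numeric character.
  # Note: downcase only folds non-ASCII letters on Ruby 2.4 or later.
  text.scan(/[\p{L}\p{N}]+/).map { |token| token.downcase }
end

puts tokenize("Les Misérables, Straße, Ελληνικά 2005").inspect
```

The point is only that the word boundaries come from Unicode character properties rather than from ASCII ranges, so accented and non-Latin words stay in one piece.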

···

On 10/20/05, David Balmain <dbalmain.ml@gmail.com> wrote:

Ferret is a full port of the Java Lucene searching and indexing library.
http://ferret.davebalmain.com/trac/

I have to say that the combination of Ruby, Eclipse, and Ferret has
just blown me away today. I've gone from knowing that Ruby existed (for
several years in fact), to thinking that I should check it out over the
last couple of months, to reading some of the doc yesterday, to having
a complete running installation of Ruby (through the One Click Windows
installer), with Eclipse layered on top, and Ferret installed and
operational - at least for this test example - in the space of 4 hours!

Really looking forward to the UTF-8 support too - some Google searching
on Ruby and UTF-8 led me to Ferret in the first place. Looks like an
absolutely great platform to do experiments in information retrieval
with - which is just what I need.

Well done and many thanks!

Peter

Thanks for pointing that out. I missed it. I do think it's a great idea for
a quiz though. It will be interesting to see what people come up with in say
50 lines as opposed to 10,000. And I'll be able to see who to hit up for
some help. :wink:

Dave
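
To give a feel for what a quiz-sized search index could look like, here is a rough sketch of a tiny inverted index in plain Ruby. Everything in it is an assumption made for illustration: TinyIndex and its methods are hypothetical names, it is not how Ferret or Lucene work internally, and it has no scoring, phrase queries or range queries.

```ruby
# TinyIndex is a hypothetical name; this is not Ferret's implementation.
class TinyIndex
  def initialize
    @docs = []
    # "field:term" => list of ids of the documents containing that term
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # Adds a document (a Hash of field => text) and returns the index itself.
  def <<(doc)
    id = @docs.size
    @docs << doc
    doc.each do |field, text|
      text.to_s.downcase.scan(/\w+/).uniq.each do |word|
        @postings["#{field}:#{word}"] << id
      end
    end
    self
  end

  def size
    @docs.size
  end

  # Returns the documents whose field contains every word of the query.
  def search(field, query)
    words = query.downcase.scan(/\w+/)
    return [] if words.empty?
    # Intersect the posting lists of all query words.
    ids = words.map { |word| @postings.fetch("#{field}:#{word}", []) }.inject(:&)
    ids.map { |id| @docs[id] }
  end
end

index = TinyIndex.new
index << { :title => "Programming Ruby", :tags => "Ruby" }
index << { :title => "Lucene In Action", :tags => "Lucene, Java, Search, Indexing" }
index << { :title => "Agile Web Development with Rails", :tags => "Ruby, Rails, Web Development" }

index.search(:tags, "ruby").each { |doc| puts doc[:title] }
```

Keying the postings by "field:term" keeps the sketch short while still allowing per-field searches like the tags: queries in the announcement.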

···

On 10/21/05, Ryan Leavengood <leavengood@gmail.com> wrote:

P.S. Eh, we can do the Ruby Quiz either way :smiley:

1. http://ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/161261

The link to the tutorial embedded on your intro page is incorrect
(the one in the TOC at the top right works though)

Bob, thanks for that, the link is fixed now. Unfortunately my
packaging system is a bit broken. I think a few files got left out so
please wait for version 0.1.1. It'll be ready in a couple of hours.

Regards,
Dave

I still think the quiz will be fun. Our goals are humble compared to a library like this.

James Edward Gray II

···

On Oct 20, 2005, at 11:46 PM, Ryan Leavengood wrote:

P.S. Eh, we can do the Ruby Quiz either way :smiley:

1. http://ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/161261

Sean O'Halpin wrote:

···

On 10/21/05, David Balmain <dbalmain.ml@gmail.com> wrote:

Ferret is a full port of the Java Lucene searching and indexing library.

Superb! Thanks!

I'm probably wrong, but I thought we already had
a port of this? A thing called Rucene?

Hal
(goes off to Google)

Hi Norjee,
I've looked at the snowball parser and I don't think it would be too hard to
do a pure Ruby version of this if enough people are interested. But that is
pretty low on my to-do list, so I hope stemmer4r will do for now. I also hope
that you won't be needing Unicode support, as that is one of the things that
is missing in Ferret. Speaking of which, does anyone know of any good Ruby
Unicode tutorials?

Dave
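
As a rough illustration of what a stemmer does before terms are indexed, here is a deliberately naive sketch in pure Ruby. It is plain English suffix stripping, not the Snowball or Porter algorithm, and naive_stem and SUFFIXES are hypothetical names; for real work, and certainly for non-English text, a Snowball wrapper such as stemmer4r is the better bet.

```ruby
# Hypothetical helper: naive English suffix stripping, NOT Snowball or Porter.
SUFFIXES = %w[ing ed es s].freeze

def naive_stem(word)
  word = word.downcase
  SUFFIXES.each do |suffix|
    # Only strip when at least three characters of stem remain.
    if word.end_with?(suffix) && word.length - suffix.length >= 3
      return word[0, word.length - suffix.length]
    end
  end
  word
end

%w[indexing indexed indexes index].each do |word|
  puts "#{word} -> #{naive_stem(word)}"
end
```

All four sample words reduce to the same term, which is the property a search index relies on when it stems both the documents and the query.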

···

On 10/21/05, Norjee <Norjee@gmail.com> wrote:

Atm i'm using jruby to use lucene search, but this definitely sounds
great!!! Do you have any plans to include snowball stemmers? As my
documents are not english, i could use them :wink:

I'm looking into it now. I'll send you the bill. :wink:

···

On 10/22/05, Miles Keaton <mileskeaton@gmail.com> wrote:

Have you looked into what it would take to allow all text to be UTF-8?
Would it take a complete overhaul or a somewhat-minor tweak?
If a tweak, what would you charge to make that lovely update? :slight_smile:

Hi Peter,

Welcome aboard. With all the people who are coming to Ruby because of Rails,
it's great to hear my project has helped to convert someone. I hope you
enjoy it here.

Cheers,
Dave

···

On 10/28/05, peter.r.bailey@gmail.com <peter.r.bailey@gmail.com> wrote:

I have to say that the combination of Ruby, Eclipse, and Ferret has
just blown me away today. I've gone from knowing that Ruby existed (for
several years in fact), to thinking that I should check it out over the
last couple of months, to reading some of the doc yesterday, to having
a complete running installation of Ruby (through the One Click Windows
installer), with Eclipse layered on top, and Ferret installed and
operational - at least for this test example - in the space of 4 hours!

Really looking forward to the UTF-8 support too - some Google searching
on Ruby and UTF-8 led me to Ferret in the first place. Looks like an
absolutely great platform to do experiments in information retrieval
with - which is just what I need.

Well done and many thanks!

Peter

Question for those (soon to be) in the know:

How does this compare to (Estraier/Hyper Estraier/Ruby-Odeum/SimpleSearch/other 'IR' systems with Ruby bindings?) on (ease of learning/ease of use/ease of maintenance/speed/any other noteworthy attributes)? To put it simply, which one should I choose?* :slight_smile:

(Well, speed's pretty well covered on the home page, though I'm not sure how much faster Hyper Estraier is than Lucene, and not sure how much slower SimpleSearch is than Ferret. I only ask because it's the thing people in the position of questioner are /supposed/ to do.)

Feel free to answer whatever part of that you (want/know), or just tell me to fork off... a thread.

*For those actually interested in answering that question, it'll be an intranet app that won't likely get a major amount of hits, but will likely have a major amount of data. Right now, I'm just looking to make a rough prototype in a week, but wouldn't mind picking a contender, if quickly-pickuppable.

(Devin/twifkak)
//

I now find ruby's stemmer4r, which wraps the snowball stemmers. I can't
wait to use this :wink: