Hi Botp,
Matt Mower [mailto:matt.mower@gmail.com]
#I've recently released a Ruby port "Bishop" of the "Reverend"
#bayesian classifier written in Python. Bishop-0.3.0 is
#available as a Gem and from RubyForge
#
# http://rubyforge.org/projects/bishop/
hmmm, another cool filter. very small, took me less than 5 seconds to
install remotely the gem.
btw, matt, how difficult or easy is it to port the bishop database to a db
like postgres? I am asking since i may be querying/archiving more than
10_000 entries...
This is an excellent question. I want to use the classifier within a
Rails based information aggregator I am writing to allow
classification of interesting/uninteresting information and perhaps
for automatic labelling.
The problem I have is that the classifier will need to be available to
process each request classifying an item and each request that sorts
items, i.e. quite often. This probably means initializing it once and
storing it in a session variable. Since there is no concept of a
session expiry callback (Rails is not an app server), the question is
"How do I checkpoint the classifier as it is trained?"
At the moment I can serialize it to YAML but that takes a little time
and will get slower as the training set increases. Doing the YAML
conversion and a SQL update on each request is prohibitive.
I've been considering whether to have the classifier exist in a
separate thread|process and allow it to checkpoint itself
automatically at intervals independent of the user's session behaviour.
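The thread variant could look roughly like the sketch below. PeriodicCheckpointer is a hypothetical class, not part of Bishop or Rails: training code calls mark_dirty, and a background thread runs the supplied save block at a fixed interval, but only when something has changed. It assumes the save block does its own locking around the classifier if training can happen concurrently.

```ruby
# Hypothetical helper: runs a save block periodically, only when marked dirty.
class PeriodicCheckpointer
  def initialize(interval, &save)
    @interval = interval      # seconds between checkpoint attempts
    @save     = save          # block that persists the classifier
    @dirty    = false
    @lock     = Mutex.new
    @running  = true
    @thread   = Thread.new { run }
  end

  # Call after each training step; cheap, no I/O.
  def mark_dirty
    @lock.synchronize { @dirty = true }
  end

  # Save now if anything changed since the last save.
  def flush
    needs_save = @lock.synchronize do
      was_dirty = @dirty
      @dirty = false
      was_dirty
    end
    @save.call if needs_save
  end

  # Stop the background thread and write any outstanding changes.
  def stop
    @running = false
    @thread.join
    flush
  end

  private

  def run
    while @running
      sleep @interval
      flush
    end
  end
end
```

Usage would be along the lines of `cp = PeriodicCheckpointer.new(60) { save_classifier }`, where save_classifier is whatever persistence routine is in use, with `cp.mark_dirty` called after each training step and `cp.stop` at shutdown.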
Another option was to convert the code so that everything (or nearly
everything) operated directly out of the database.
Representing the pools and training data via SQL would be simple
enough since it's just (word,count) tuples. Basing the code on a SQL
variant might be quite attractive. The issue would be making it
portable.
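To make the (word,count) idea concrete, here is a rough sketch of flattening the pools into rows that a single table could hold, and rebuilding them again. The table layout in the comment and the helper names are assumptions for illustration; nothing here touches a real database or Bishop's internals.

```ruby
# Hypothetical table layout (an assumption, not taken from Bishop):
#   CREATE TABLE pool_words (
#     pool  VARCHAR(255) NOT NULL,
#     word  VARCHAR(255) NOT NULL,
#     count INTEGER      NOT NULL,
#     PRIMARY KEY (pool, word)
#   );

# Flatten { pool => { word => count } } into [pool, word, count] rows.
def pools_to_rows(pools)
  pools.flat_map do |pool, words|
    words.map { |word, count| [pool, word, count] }
  end
end

# Rebuild the nested hash from [pool, word, count] rows.
def rows_to_pools(rows)
  rows.each_with_object({}) do |(pool, word, count), pools|
    (pools[pool] ||= {})[word] = count
  end
end

# Hypothetical training data used only for this example.
EXAMPLE_POOLS = {
  'interesting'   => { 'ruby' => 3, 'rails' => 2 },
  'uninteresting' => { 'spam' => 5 }
}

pools_to_rows(EXAMPLE_POOLS).each { |row| p row }
```

With rows like these, training on one word becomes a single-row update rather than a full re-dump of the classifier.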
Since I'm using Rails anyway I could certainly attempt an ActiveRecord
based variant which should satisfy the Postgres requirement also.
Could be an interesting experiment. What do you think?
M
On 4/15/05, "Peña, Botp" <botp@delmonte-phil.com> wrote:
--
Matt Mower :: http://matt.blogs.it/