I'm looking to find out whether anyone is doing latent semantic indexing
(LSI) in Ruby at any kind of web scale and, if so, what tools and techniques
you're using.
Just for context, I've been working on this problem for a few days now.
I've tried the Classifier gem, both installed via "gem install" and compiled
from source, as well as at least two forks. I've tried compiling various
versions of the GSL library, most of which would not allow the gsl gem to
compile. In the combinations where I can actually get the full set of
libraries to install, I receive an error like the following:
/home/ck1/.rvm/gems/ruby-1.9.2-p180@classifier_test/gems/kitop-classifier-1.4.4/lib/classifier/lsi.rb:316:in `SV_decomp': Ruby/GSL error code 24, svd of MxN matrix, M<N, is not implemented (file svd.c, line 61), the requested feature is not (yet) implemented (GSL::ERROR::EUNIMPL)
    from /home/ck1/.rvm/gems/ruby-1.9.2-p180@classifier_test/gems/kitop-classifier-1.4.4/lib/classifier/lsi.rb:316:in `build_reduced_matrix'
    from /home/ck1/.rvm/gems/ruby-1.9.2-p180@classifier_test/gems/kitop-classifier-1.4.4/lib/classifier/lsi.rb:128:in `build_index'
    from /home/ck1/.rvm/gems/ruby-1.9.2-p180@classifier_test/gems/kitop-classifier-1.4.4/lib/classifier/lsi.rb:66:in `add_item'
    from lsi_test.rb:18:in `block in <main>'
    from lsi_test.rb:18:in `each'
    from lsi_test.rb:18:in `<main>'
This particular stack trace came from running a fork of Classifier, but
the result is essentially the same with the original gem apart from the
line numbers. It looks as though the error comes not from Classifier but
from the gsl gem or the underlying GSL library.
Any help or shared experiences will be appreciated. Thanks in advance.
You'd be better off contacting the author. There is no guarantee that they read this list.
···
On May 26, 2011, at 11:32 , Chris Kottom wrote:
Just for context, I've been working on this problem for a few days now.
I've tried the Classifier gem via "gem install" and compiled from source
and at least two other forks. I've tried compiling various versions of the
GSL library, most of which would not allow the gsl gem to compile, and it
seems that in the combinations where I can actually get the full set of
libraries to install, I receive an error like the following:
The author of Picky <http://florianhanke.com/picky/> presented it last night
at the Melbourne Ruby group. Not sure if it's interesting to you, but it looks
like a different kind of search engine to Sphinx, etc.
Clifford Heath.
···
On 05/27/11 04:32, Chris Kottom wrote:
I'm looking to find out whether anyone is doing latent semantic indexing
(LSI) in Ruby at any kind of web scale, and if so, what tools and techniques
you're using?
Starting about a week ago, ruby is crashing fairly often during rails development: rails server, console, and during spec runs. But it's not consistent.
I have tried the following, but none of it helped:
- remove all gems and re-bundle
- uninstall ruby 1.9.2-p180 and re-install
- use ruby 1.9.2-p136 with new gem re-bundle
At first I was startled because I have never seen ruby crash before on this machine. Now that the novelty has worn thin, it's becoming quite a distraction.
Since everything was working just a few days ago, I'm stumped as to what may be causing this. I need help tracking down the cause.
Thanks, Ryan. I will do this too, was just looking to see what the current
de facto standard method for this is. Digging a little deeper on both
GitHub and RubyForge, it seems that the gem has been pretty much dormant for
several years, so I'm looking to see whether people have moved on to another
fork or another lib. Will post any findings.
Thanks, Clifford, for the tip. It's not exactly what I need for this
particular part of the application, as I'm using the Classifier LSI feature
to index documents and detect similar records, but it might be worth
investigating as a replacement for Sphinx in other places in this app and
others I'm working on.
···
On Fri, May 27, 2011 at 3:22 AM, Clifford Heath <no.spam@please.net> wrote:
On 05/27/11 04:32, Chris Kottom wrote:
I'm looking to find out whether anyone is doing latent semantic indexing
(LSI) in Ruby at any kind of web scale, and if so, what tools and techniques
you're using?
The author of Picky <http://florianhanke.com/picky/> presented it last night
at the Melbourne Ruby group. Not sure if it's interesting to you, but it looks
like a different kind of search engine to Sphinx, etc.
Please don't thread hijack. Start a new thread properly.
···
On May 26, 2011, at 20:16 , Karl Smith wrote:
Starting about a week ago, ruby is crashing fairly often during rails development: rails server, console, and during spec runs. But it's not consistent.
Well, what did change in your system environment in the last few days?
Looking at the crash log, and considering the crash happens after an
SQL statement: What database are you using, and what is its version?
And what's your Rails/ActiveRecord version?
If possible, build an application with the *minimum* set of external
libraries that still produces a crash.
···
On Fri, May 27, 2011 at 5:16 AM, Karl Smith <threadhead@gmail.com> wrote:
At first I was startled because I have never seen ruby crash before on this machine.
Now that the novelty has worn thin, it's becoming quite a distraction.
Since everything was working just a few days ago, I'm stumped as to what may be causing this. I need help tracking down the cause.
--
Phillip Gawlowski
A method of solution is perfect if we can foresee from the start,
and even prove, that following that method we shall attain our aim.
-- Leibnitz
So for what it's worth, the particular issue I was running into was not
caused by Classifier at all but rather by the test data I was using. The
application this is for is still under development, so the texts that I'm
indexing are multiple-paragraph blocks generated using Faker::Lorem.
The problem is that this library has a limited vocabulary of fewer than
200 words, and Classifier::LSI requires the number of unique words indexed
across all texts to be greater than or equal to the number of texts. (It
also seems to filter out some words, probably one- and two-character words
that might be considered stop words.) So as soon as the number of records
indexed exceeded the number of unique words, the matrix handed to the SVD
apparently had fewer rows than columns (M<N), and the underlying library
(GSL) raised the "not implemented" error shown in the stack trace.
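A quick pre-flight check along these lines would have caught the problem before it reached GSL. This is only a sketch: index_terms and enough_vocabulary? are hypothetical helpers, and the tokenizer (lowercase, letters only, words of one or two characters dropped) is an assumption that approximates, but does not reproduce, whatever filtering Classifier::LSI does internally.

```ruby
require 'set'

# Hypothetical tokenizer: lowercase, keep runs of letters, and drop words of
# one or two characters. An approximation, not Classifier's own filtering.
def index_terms(text)
  text.downcase.scan(/[a-z]+/).reject { |word| word.length <= 2 }
end

# True when the corpus has at least as many distinct terms as documents,
# which is the condition described above for the SVD to go ahead.
def enough_vocabulary?(texts)
  vocabulary = texts.each_with_object(Set.new) do |text, terms|
    terms.merge(index_terms(text))
  end
  vocabulary.size >= texts.size
end

narrow = ["lorem ipsum dolor", "dolor ipsum lorem",
          "ipsum lorem dolor", "lorem dolor ipsum"]
varied = ["ruby matrix library", "latent semantic indexing",
          "singular value decomposition"]

puts enough_vocabulary?(narrow)  # 3 distinct terms for 4 documents
puts enough_vocabulary?(varied)  # 9 distinct terms for 3 documents
```

Running a check like this on the generated fixtures before calling add_item makes the failure show up as a readable message about the test data rather than as a GSL error.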
I've now tested this against a set of strings with a richer vocabulary,
and although indexing slows down sharply as the number of records grows
(the stack trace shows add_item calling build_index, so the index appears
to be rebuilt on every insertion), it completes successfully. Hope this
description helps someone else out.
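For anyone who needs placeholder text in the meantime, one way to get a richer vocabulary than Faker::Lorem is to generate the words yourself. The sketch below is an assumption about how that could look, not what I actually ran: SYLLABLES, build_vocabulary and sample_documents are all made-up names, and the generated words are nonsense, so the resulting similarity scores mean nothing beyond exercising the indexing code.

```ruby
# Hypothetical stand-in for Faker::Lorem. No two-letter syllable here is a
# prefix of a three-letter one, so every syllable triple spells a distinct word.
SYLLABLES = %w[ka lo mi ter sun vo ra pel du shi no gre].freeze

# All ordered triples of syllables: 12 * 12 * 12 = 1728 words, each at least
# six letters long, so none would be dropped as a short stop word.
def build_vocabulary
  SYLLABLES.product(SYLLABLES, SYLLABLES).map(&:join).uniq
end

# Builds `count` documents of `words_per_doc` random words each. The seed
# keeps the fixtures reproducible between test runs.
def sample_documents(count, words_per_doc, seed = 42)
  rng = Random.new(seed)
  vocabulary = build_vocabulary
  Array.new(count) do
    Array.new(words_per_doc) { vocabulary[rng.rand(vocabulary.size)] }.join(' ')
  end
end

docs = sample_documents(300, 40)
unique_words = docs.flat_map(&:split).uniq.size
puts "#{docs.size} documents, #{unique_words} unique words"
```

With a vocabulary this size, the unique word count stays well ahead of the document count for a few hundred records; for larger fixture sets, add more syllables.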
···
On Fri, May 27, 2011 at 10:42 AM, Chris Kottom <chris@chriskottom.com> wrote:
Thanks, Ryan. I will do this too, was just looking to see what the current
de facto standard method for this is. Digging a little deeper on both
GitHub and RubyForge, it seems that the gem has been pretty much dormant for
several years, so I'm looking to see whether people have moved on to another
fork or another lib. Will post any findings.
Thanks, Clifford, for the tip. It's not exactly what I need for this
particular part of the application, as I'm using the Classifier LSI feature
to index documents and detect similar records, but it might be worth
investigating as a replacement for Sphinx in other places in this app and
others I'm working on.
Which ships with ruby... Something is brokey with racc itself? I dunno...
···
On May 27, 2011, at 00:57 , Phillip Gawlowski wrote:
On Fri, May 27, 2011 at 5:16 AM, Karl Smith <threadhead@gmail.com> wrote:
At first I was startled because I have never seen ruby crash before on this machine.
Now that the novelty has worn thin, it's becoming quite a distraction.
Since everything was working just a few days ago, I'm stumped as to what may be causing this. I need help tracking down the cause.
Well, what did change in your system environment in the last few days?
Looking at the crash log, and considering the crash happens after an
SQL statement: What database are you using, and what is its version?
And what's your Rails/ActiveRecord version?
If possible, build an application with the *minimum* set of external
libraries that still produces a crash.
Since I am not doing anything unusual (using common gems and typical methods for ruby/gem installation), I would expect to see others report the same issue. I have deleted and re-installed 1.9.2-p180 several times, tried reverting to 1.9.2-p136, and erased and re-installed all gems. Still keeps on crashing.
Not 100% sure what has changed. I did update Postgres to 9.0.4 via brew, so the pg gem would have been compiled against the new version. But again, I would expect others who have done the same to report issues.
The crashing is common, but not consistent. For example, it took 4 times running 'rake -T' before it would finally work. But eventually it did work.
Because of its inconsistency, could this be a threading or timing issue with the pg gem?