Hello,

"Lyle Johnson" <lyle.johnson@gmail.com> writes:

>> I am pleased to announce the availability of the Ruby library 'clusterer'
>> which implements the basic K-Means and Hierarchical Clustering algorithms for
>> text data.
>
> I've installed the gem but am not getting very good results with my
> limited use. In particular, I tried the example you posted on your
> blog:
>
>   Clusterer::Clustering.kmeans_clustering(["hello world","mea culpa","goodbye world"])
>
> but it appears to have placed all three strings in the same cluster;
> the result was [[0, 1, 2]]. I get a similar result ([[1, 0, 2]]) if I
> try the hierarchical clustering instead.

The examples were just to show how to use the algorithms.

Clustering can also be thought of as the problem of finding
representative points for a given set of points. If you want to preserve all
the information, you can make every point its own cluster; if you want maximum
compression, you use just one cluster. So there is a trade-off.

By default I choose the number of clusters as Math.sqrt(no. of docs) reduced
to an integer. With the three documents in the example, sqrt(3) is about 1.73,
which reduces to 1, hence the single cluster.
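
To make the arithmetic concrete, here is a minimal sketch of that default. The helper name default_cluster_count is hypothetical and is not taken from the gem's source; it only assumes, as described above, that the square root of the document count is truncated to an integer.

```ruby
# Hypothetical helper for illustration only; this is not the gem's code.
# Assumption: the default number of clusters is the square root of the
# number of documents, truncated to an integer.
def default_cluster_count(documents)
  Math.sqrt(documents.size).to_i
end

docs = ["hello world", "mea culpa", "goodbye world"]
puts default_cluster_count(docs)                   # sqrt(3) is about 1.73, so this prints 1
puts default_cluster_count(Array.new(100, "doc"))  # sqrt(100) is 10.0, so this prints 10
```

Under this assumption, any input with fewer than four documents ends up with a single cluster unless the count is given explicitly.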

If you want a custom number of clusters, pass it as the second argument:

Clusterer::Clustering.kmeans_clustering(["hello world","mea culpa","goodbye world"],2)

Also try it on a larger corpus to really evaluate the merit of the algorithms.

The algorithms may also need some additional customisation depending on the
problem domain.

Cheers,

On 8/22/06, Surendra Singhi <efuzzyone@netscape.net> wrote:

--

Surendra Singhi

http://ssinghi.kreeti.com, http://www.kreeti.com

Read my blog at: http://cuttingtheredtape.blogspot.com/

,----

By all means marry; if you get a good wife, you'll be happy. If you

get a bad one, you'll become a philosopher.

-- Socrates

`----