Ruby Spam Corpus

Hi all,

90% of my spam comes from ruby-talk.
I have begun using SpamAssassin and I’m trying to teach it enough spam so
the Bayesian filter can kick in.

Does anyone have a corpus of spam I can use? Ideally it’d be a corpus of
spam routed through ruby-talk which is the one I’m trying to get rid of.

If anyone does, could you please send it to me so I can train my spam
filter?

Thanks.


Daniel Carrera | OpenPGP fingerprint:
Mathematics Dept. | 6643 8C8B 3522 66CB D16C D779 2FDD 7DAC 9AF7 7A88
UMD, College Park | http://www.math.umd.edu/~dcarrera/pgp.html

90% of my spam comes from ruby-talk.

Wow. I get almost 0 through here…

On Wed, 23 Jul 2003 07:42:27 +0900, Daniel Carrera
dcarrera@math.umd.edu wrote (more or less):

Hi all,

90% of my spam comes from ruby-talk.
I have begun using SpamAssassin and I’m trying to teach it enough spam so
the Bayesian filter can kick in.

Does anyone have a corpus of spam I can use? Ideally it’d be a corpus of
spam routed through ruby-talk which is the one I’m trying to get rid of.

I have a corpus of about 750 spams (HGH, Viagra, get-rich-quick, etc.)
which I've been collecting for the day I bother to get an email client
that does Bayesian filtering. Very few, if any, have arrived from the
ruby-talk list.

You’re welcome to a copy, but the commonest term the Bayesian filter
will identify from it all is my email address!
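To make that point concrete: a Bayesian filter scores each token by how much more often it shows up in spam than in legitimate mail. The Ruby sketch below uses a simplified, Graham-style per-token estimate (an assumption for illustration, not the exact formula SpamAssassin or bogofilter use), and the addresses and messages are made up. Because every message in a borrowed corpus carries the donor's address and none of your own mail does, that address gets the maximum spam score. It will rarely if ever appear in mail sent to you, though, so it teaches your filter nothing useful.

```ruby
require 'set'

# Split a message into a set of lowercase tokens; email addresses stay whole.
# Using a set means we count how many messages contain a token, not how often.
def tokens(text)
  text.downcase.scan(/[a-z0-9_.@-]+/).to_set
end

# Simplified per-token estimate of P(spam | token) from document frequencies.
# spam_msgs and ham_msgs are arrays of message strings (hypothetical data).
def token_spam_probs(spam_msgs, ham_msgs)
  spam_counts = Hash.new(0)
  ham_counts  = Hash.new(0)
  spam_msgs.each { |m| tokens(m).each { |t| spam_counts[t] += 1 } }
  ham_msgs.each  { |m| tokens(m).each { |t| ham_counts[t] += 1 } }

  (spam_counts.keys | ham_counts.keys).to_h do |t|
    # Fraction of spam / ham messages containing the token (guard empty corpora).
    s = spam_counts[t].to_f / [spam_msgs.size, 1].max
    h = ham_counts[t].to_f / [ham_msgs.size, 1].max
    [t, s / (s + h)]
  end
end

# Hypothetical borrowed spam corpus (all addressed to the donor)
# and your own legitimate mail (all addressed to you).
spam = [
  "To: euan@example.org buy cheap viagra now",
  "To: euan@example.org get rich quick now"
]
ham = [
  "To: me@example.edu ruby question about blocks",
  "To: me@example.edu ruby gem release now"
]

probs = token_spam_probs(spam, ham)
puts probs["euan@example.org"] # => 1.0 (as "spammy" as "viagra")
puts probs["ruby"]             # => 0.0
```

The donor's address ends up tied with the most obvious spam words, which is why a corpus collected at someone else's address is of limited use for training your own filter.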

If you’re interested, contact me off-list - there are various
‘contact’ links at my web-site.

Cheers,
Euan
Gawnsoft: http://www.gawnsoft.co.sr
Symbian/Epoc wiki: http://html.dnsalias.net:1122
Smalltalk links (harvested from comp.lang.smalltalk) http://html.dnsalias.net/gawnsoft/smalltalk

Tried it and settled on bogofilter instead. SpamAssassin needed much
more training than I could give it before its Bayesian filter kicked in.
It would also require some tweaking, because otherwise it classifies all
email forwarded from Usenet as having forged headers, resulting in useless
scores. Not to mention that SpamAssassin eats more resources than
Mozilla and runs about as fast.

So if you just want a simple Bayesian filter that is usable
with very little training and setup, try
http://bogofilter.sourceforge.net

Richard

On Wed, Jul 23, 2003 at 07:42:27AM +0900, Daniel Carrera wrote:

Hi all,

90% of my spam comes from ruby-talk.
I have begun using SpamAssassin and I’m trying to teach it enough spam so
the Bayesian filter can kick in.

Well, I get very little spam from other sources. So actually yeah, 90% of
my spam comes from ruby-talk.


Daniel Carrera | OpenPGP fingerprint:
Mathematics Dept. | 6643 8C8B 3522 66CB D16C D779 2FDD 7DAC 9AF7 7A88
UMD, College Park | http://www.math.umd.edu/~dcarrera/pgp.html

On Wed, Jul 23, 2003 at 10:27:13PM +0900, Michael Campbell wrote:

90% of my spam comes from ruby-talk.

Wow. I get almost 0 through here…

Tried it and settled on bogofilter instead. SpamAssassin needed much
more training than I could give it before its Bayesian filter kicked in.
It would also require some tweaking, because otherwise it classifies all
email forwarded from Usenet as having forged headers, resulting in useless
scores. Not to mention that SpamAssassin eats more resources than
Mozilla and runs about as fast.

So if you just want a simple Bayesian filter that is usable
with very little training and setup, try
http://bogofilter.sourceforge.net

Also try POPFile if you use POP to fetch your mail. It works with an
arbitrary number of buckets, not just “spam”/“not spam”, though you can
use just those two if you want.
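Sorting mail into an arbitrary number of buckets is a natural generalization of the two-class spam filter. As a rough illustration only, here is a plain multinomial naive Bayes classifier with add-one (Laplace) smoothing in Ruby. It is a hypothetical sketch, not POPFile's actual code, and the class name, bucket names and training messages are all made up.

```ruby
# Hypothetical multi-bucket naive Bayes classifier (not POPFile's implementation).
class BucketClassifier
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # bucket => {word => count}
    @total_words = Hash.new(0)                            # bucket => total words seen
    @doc_counts  = Hash.new(0)                            # bucket => messages trained
    @vocab = {}                                           # every word seen in any bucket
  end

  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end

  # Add one training message to the given bucket.
  def train(bucket, text)
    @doc_counts[bucket] += 1
    tokenize(text).each do |w|
      @word_counts[bucket][w] += 1
      @total_words[bucket] += 1
      @vocab[w] = true
    end
  end

  # Return the bucket with the highest log-posterior score.
  # Add-one smoothing keeps an unseen word from zeroing out a bucket.
  def classify(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @doc_counts.keys.max_by do |bucket|
      score = Math.log(@doc_counts[bucket] / total_docs)
      denom = (@total_words[bucket] + @vocab.size).to_f
      words.each { |w| score += Math.log((@word_counts[bucket][w] + 1) / denom) }
      score
    end
  end
end

c = BucketClassifier.new
c.train("spam", "cheap viagra buy now")
c.train("spam", "get rich quick buy now")
c.train("ruby-talk", "ruby blocks and iterators question")
c.train("ruby-talk", "new ruby gem release")
c.train("work", "meeting agenda for the mathematics department")

puts c.classify("buy cheap pills now") # => spam
puts c.classify("ruby gem question")   # => ruby-talk
puts c.classify("department meeting")  # => work
```

Scores are summed as logarithms rather than multiplied as probabilities so that long messages don't underflow to zero.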