[OT] spam filter Was: Re: Urgent Assistance

> I believe the ruby-talk group is already
> filtered by SpamAssassin.

Unlikely, considering all the spams that get through
and all the messages saying that the list is
unfiltered.

> I would suggest a Bayesian filter.

SpamAssassin’s current version uses a Bayesian
filter.

> Shannon

-Billy

···

From: “Shannon Fang” xrfang@hotmail.com

No, it certainly is filtered. Look at the headers of the spams which get
through: for example the most recent one had

X-Spam-Status: No, hits=3.2 required=5.0 tests=INVALID_DATE_NO_TZ,DEAR_SOMEBODY,SUPERLONG_LINE,RCVD_IN_ORBZ version=2.20
X-Spam-Level: ***
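For anyone who wants to script against these headers, here is a minimal Ruby sketch that pulls the verdict, score, threshold and test names out of an X-Spam-Status line. It assumes the SpamAssassin 2.x layout shown above (`hits=`, `required=`, `tests=`); other versions may format the header differently, and `parse_spam_status` is a hypothetical helper, not part of SpamAssassin.

```ruby
# Split a SpamAssassin 2.x style X-Spam-Status header into its parts.
# Assumes the "hits=... required=... tests=..." layout; other versions differ.
def parse_spam_status(header)
  verdict  = header[/\A\s*(Yes|No)\b/, 1]
  hits     = header[/hits=(-?[\d.]+)/, 1]
  required = header[/required=(-?[\d.]+)/, 1]
  tests    = header[/tests=(\S*)/, 1]
  {
    spam:     verdict == "Yes",
    hits:     hits && hits.to_f,
    required: required && required.to_f,
    tests:    tests ? tests.split(",") : []
  }
end

status = parse_spam_status(
  "No, hits=3.2 required=5.0 " \
  "tests=INVALID_DATE_NO_TZ,DEAR_SOMEBODY,SUPERLONG_LINE,RCVD_IN_ORBZ version=2.20"
)
p status[:hits] < status[:required]  # prints true: it scored below the cut-off
```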

This explains why so little spam hits this list, given it is gatewayed to
Usenet.

If you think the threshold of 5.0 is too lenient, you can easily filter
yourself on the headers provided (e.g. on “X-Spam-Level: **” for a
threshold of 2 or more).
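The same rule can be expressed in a few lines of Ruby. This sketch relies on SpamAssassin writing one `*` into X-Spam-Level per whole point of score (a score of 3.2 gives `***`, as in the header above); `spam_level` and `over_threshold?` are hypothetical helper names, and the headers are assumed to arrive as one string.

```ruby
# SpamAssassin writes one "*" into X-Spam-Level per whole point of score,
# so a header of "**" or more means a score of at least 2.
def spam_level(headers)
  line = headers.each_line.find { |l| l =~ /\AX-Spam-Level:/i }
  return 0 unless line
  line[/\AX-Spam-Level:\s*(\**)/i, 1].length
end

# True when the message's star count reaches the chosen threshold.
def over_threshold?(headers, threshold)
  spam_level(headers) >= threshold
end

headers = "Subject: Urgent Assistance\nX-Spam-Level: ***\n"
p over_threshold?(headers, 2)  # prints true
p over_threshold?(headers, 5)  # prints false
```

Counting stars rather than re-parsing `hits=` keeps the check a simple prefix match, which is also how a procmail-style header filter would see it.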

As I’ve said before, the signal-to-spam ratio on this list is pretty good,
and I would not like to see messages thrown out because of a trigger-happy
spam filter.

Regards,

Brian.

···

On Wed, Apr 16, 2003 at 03:43:11AM +0900, wtanksleyjr@cox.net wrote:

> From: “Shannon Fang” xrfang@hotmail.com
>
>> I believe the ruby-talk group is already
>> filtered by SpamAssassin.
>
> Unlikely, considering all the spams that get through
> and all the messages saying that the list is
> unfiltered.

The mail-news gateway is filtered: I run SpamAssassin on messages going
in both directions. However, the mailing list itself is open, so anyone
can post to ruby-talk directly.

In addition, spammers seem to be getting wise to SpamAssassin: I’m
seeing more and more e-mail get through (both personally and in the
mail-news gateway). I was on the point of reducing the threshold a bit
to see if it made things better. Does anyone have any experience to
share here?

Cheers

Dave

···

On Tuesday, April 15, 2003, at 03:41 PM, Brian Candler wrote:

> No, it certainly is filtered. Look at the headers of the spams which get
> through: for example the most recent one had
>
> X-Spam-Status: No, hits=3.2 required=5.0
> tests=INVALID_DATE_NO_TZ,DEAR_SOMEBODY,SUPERLONG_LINE,RCVD_IN_ORBZ
> version=2.20
> X-Spam-Level: ***

> The mail-news gateway is filtered: I run SpamAssassin on messages going
> in both directions. However, the mailing list itself is open, so anyone
> can post to ruby-talk directly.

This explains a lot. I have never seen a mailing list with as much spam
as ruby-talk, and this might be the reason.

It is typical for a mailing list to be members-only. This doesn’t exclude
anyone because everyone is free to become a member by going to the
listinfo page. I would expect this to stop the bulk of the spam.
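The idea behind a members-only list can be sketched as a simple gate: posts from subscribed addresses are accepted, and everything else is held for moderation. List managers normally provide this as a configuration setting; the `MembersOnlyGate` class below is purely hypothetical, and comparing addresses case-insensitively is a simplifying assumption.

```ruby
require "set"

# Hypothetical members-only posting check.
class MembersOnlyGate
  def initialize(members)
    @members = Set.new(members.map { |addr| normalize(addr) })
  end

  # :accept for subscribers, :hold (for moderation) for everyone else.
  def decide(sender)
    @members.include?(normalize(sender)) ? :accept : :hold
  end

  private

  # Simplification: ignore case and surrounding whitespace.
  def normalize(addr)
    addr.strip.downcase
  end
end

gate = MembersOnlyGate.new(["dave@pragprog.com"])
p gate.decide("Dave@PragProg.com")     # prints :accept
p gate.decide("scammer@example.com")   # prints :hold
```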

Any thoughts?

···


Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137

>> X-Spam-Status: No, hits=3.2 required=5.0
>> tests=INVALID_DATE_NO_TZ,DEAR_SOMEBODY,SUPERLONG_LINE,RCVD_IN_ORBZ
>> version=2.20
>> X-Spam-Level: ***
>
> In addition, spammers seem to be getting wise to SpamAssassin: I’m
> seeing more and more e-mail get through (both personally and in the
> mail-news gateway).

I’m surprised that SpamAssassin flagged “Dear Somebody” as the only suspicious
content in that message [ruby-talk:69424] when it is a boilerplate
money-transfer scam. Perhaps a newer version would have a better ruleset?

I notice it has a Bayesian filter now, which could be worth training up with
a bunch of normal ruby-talk postings since many words used on this list are
very unlikely to appear in spams.

As for getting wise - personally I think the absolute volume of spam is just
increasing exponentially…

Cheers,

Brian.

···

On Wed, Apr 16, 2003 at 05:57:34AM +0900, Dave Thomas wrote:

I’ve found the same thing recently: SpamAssassin’s effectiveness has
dropped. I tried lowering the threshold, but that increased the number of
false positives to an unacceptable level. I’m still deciding what to do next;
the Ruby-based Bayesian filter bsproc seems to be quite effective in my
initial tests, but I’m still looking at the problem. Unfortunately my spam
corpus is fairly small (since I tend not to keep my spam), so I’ve had the
somewhat odd experience of searching Google for a collection of spam I can
download.
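The trade-off between threshold and false positives can be measured directly if you have messages whose true class you know. The sketch below counts false positives (ham at or above the threshold) and false negatives (spam below it) for several candidate thresholds. The sample scores are made up purely to exercise the function; `Scored` and `threshold_report` are hypothetical names.

```ruby
# score: SpamAssassin hits; spam: the message's true class.
Scored = Struct.new(:score, :spam)

# For each threshold, return [threshold, false_positives, false_negatives].
def threshold_report(messages, thresholds)
  thresholds.map do |t|
    fp = messages.count { |m| !m.spam && m.score >= t }
    fn = messages.count { |m| m.spam && m.score < t }
    [t, fp, fn]
  end
end

# Made-up scores, for illustration only.
sample = [Scored.new(1.0, false), Scored.new(3.8, false),
          Scored.new(3.2, true), Scored.new(7.5, true)]

threshold_report(sample, [5.0, 3.5, 3.0]).each do |t, fp, fn|
  puts format("threshold %.1f: %d false positives, %d false negatives", t, fp, fn)
end
```

Running it over a real, hand-sorted corpus would show where lowering the threshold starts costing more in false positives than it gains in caught spam.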

Tim Bates

···

On Wed, 16 Apr 2003 6:27 am, Dave Thomas wrote:

> In addition, spammers seem to be getting wise to SpamAssassin: I’m
> seeing more and more e-mail get through (both personally and in the
> mail-news gateway). I was on the point of reducing the threshold a bit
> to see if it made things better. Does anyone have any experience to
> share here?


tim@bates.id.au

In article E0ECBA30-6F84-11D7-A131-000A95676A62@pragprog.com,

···

Dave Thomas dave@pragprog.com wrote:

> In addition, spammers seem to be getting wise to SpamAssassin: I’m
> seeing more and more e-mail get through (both personally and in the
> mail-news gateway). I was on the point of reducing the threshold a bit
> to see if it made things better. Does anyone have any experience to
> share here?

Are you using the Bayesian filter from SA? Properly trained, a Bayesian
filter (like bmf[1] or bogofilter[2]) is more effective than regexp-based
modules like SA: fewer false positives, even though some spam does get through.
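A trained Bayesian filter has to turn many per-word probabilities into one verdict. A minimal sketch of one well-known way to do this, the naive-Bayes combination popularised by Paul Graham's "A Plan for Spam", computes P = Πp / (Πp + Π(1 − p)). Filters such as bogofilter and bmf use their own, more robust variants; `combined_spam_probability` is a hypothetical name, and each p is assumed to lie strictly between 0 and 1.

```ruby
# Combine per-token spam probabilities into one message score:
#   P = prod(p) / (prod(p) + prod(1 - p))
# computed in log space so long messages do not underflow to 0.0.
# Assumes every p is strictly between 0 and 1 (e.g. clamped during training).
def combined_spam_probability(probs)
  log_spam = probs.sum { |p| Math.log(p) }
  log_ham = probs.sum { |p| Math.log(1.0 - p) }
  1.0 / (1.0 + Math.exp(log_ham - log_spam))
end

p combined_spam_probability([0.5])             # prints 0.5
p combined_spam_probability([0.99, 0.95, 0.2]) # well above 0.5
```

The log-space form is algebraically the same ratio, since S / (S + H) = 1 / (1 + H / S).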

I’ve yet to see a false positive with bogofilter in 8 months.

Ollivier ROBERT -=- Eurocontrol EEC/ITM -=- roberto@eurocontrol.fr
Usenet Canal Historique FreeBSD: The Power to Serve!

In article E0ECBA30-6F84-11D7-A131-000A95676A62@pragprog.com,

>> No, it certainly is filtered. Look at the headers of the spams which get
>> through: for example the most recent one had
>>
>> X-Spam-Status: No, hits=3.2 required=5.0
>> tests=INVALID_DATE_NO_TZ,DEAR_SOMEBODY,SUPERLONG_LINE,RCVD_IN_ORBZ
>> version=2.20
>> X-Spam-Level: ***
>
> The mail-news gateway is filtered: I run SpamAssassin on messages going
> in both directions. However, the mailing list itself is open, so anyone
> can post to ruby-talk directly.
>
> In addition, spammers seem to be getting wise to SpamAssassin: I’m
> seeing more and more e-mail get through (both personally and in the
> mail-news gateway). I was on the point of reducing the threshold a bit
> to see if it made things better. Does anyone have any experience to
> share here?

You might want to look at something statistical like
ifile. It’s a general-purpose text-file sorter that does
a very good job of choosing between spam and non-spam for
mailing lists. Here’s the URL:

http://www.nongnu.org/ifile/

And just to keep some Ruby content in the message, here’s
my Ruby module for dealing with ifile. I’ve heard rumors
that SpamAssassin can also now do this kind of statistical
matching.

Booker C. Bense

···

Dave Thomas dave@pragprog.com wrote:

On Tuesday, April 15, 2003, at 03:41 PM, Brian Candler wrote:

    # $Id: ifile.rb,v 1.2 2003/01/22 19:07:50 bbense Exp $
    #
    # Ifile, a class for interacting with the ifile program.
    #
    # Get ifile at http://www.nongnu.org/ifile/
    #
    # Booker C. Bense bbense@slac.stanford.edu

    require 'open3'

    module Ifile

      # This is the wrong name, but I can't think of anything better.
      class Process

        # Tell me where ifile lives.
        def initialize(path = "/var/local/bin/ifile", args = "--verbosity=0")
          if FileTest.executable?(path)
            @ifile = path
            @args = args
          else
            raise ArgumentError, "#{path} is not an executable ifile binary"
          end
        end

        # Given a message (an array of lines), query folders.
        # Returns one hash per "folder score" line of ifile output.
        def query(msg)
          results = []
          output = run_ifile(msg, "--query")
          i = 0
          output.each do |line|
            # Format of output is: folder score
            folder, score = line.split
            if folder && score
              results << { 'folder' => folder, 'score' => score.to_f, 'position' => i }
              i += 1
            end
          end
          results
        end

        # Add a message to a folder
        def add(msg, folder)
          run_ifile(msg, "--insert=#{folder}")
        end

        # Delete a message from a folder
        def delete(msg, folder)
          run_ifile(msg, "--delete=#{folder}")
        end

        # Refile: move a message from one folder to another
        def refile(msg, oldfolder, newfolder)
          delete(msg, oldfolder)
          add(msg, newfolder)
        end

        # internal methods
        private

        # Feed msg to ifile on stdin and return its stdout as an array of
        # lines. The block form of popen3 closes the pipes and reaps the child.
        def run_ifile(msg, args)
          Open3.popen3("#{@ifile} #{@args} #{args}") do |stdin, stdout, _stderr|
            # write msg to ifile
            msg.each { |line| stdin.puts line }
            stdin.close
            # read output
            stdout.readlines
          end
        end

      end

    end # module Ifile

    # Testing
    if __FILE__ == $0

      ifile = Ifile::Process.new("/var/local/bin/ifile", "-v 0 -b ./idata.test")

      msg = File.readlines("./test.msg")

      results = ifile.query(msg)
      p results

      ifile.add(msg, "test")

      results = ifile.query(msg)
      p results

      ifile.delete(msg, "test")

      results = ifile.query(msg)
      p results

      ifile.add(msg, "wrong")

      results = ifile.query(msg)
      p results

      ifile.refile(msg, "wrong", "test")

      results = ifile.query(msg)
      p results

      ifile.delete(msg, "test")

    end
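To use the module as a spam filter, the query results need to become a yes/no decision. A hedged sketch: treat a message as spam when the best-ranked folder is the spam folder. The folder name `"spam"` is an assumption, as is the premise that ifile lists its best match first (the order the module records in `'position'`); `spam_by_ifile?` is a hypothetical helper.

```ruby
# Hypothetical helper: results is an array of hashes as returned by
# Ifile::Process#query ('folder', 'score', 'position').
# Assumes position 0 is ifile's best match and that spam is filed in "spam".
def spam_by_ifile?(results, spam_folder = "spam")
  best = results.min_by { |r| r['position'] }
  !best.nil? && best['folder'] == spam_folder
end

results = [{ 'folder' => 'spam', 'score' => -1200.5, 'position' => 0 },
           { 'folder' => 'ruby-talk', 'score' => -1350.0, 'position' => 1 }]
p spam_by_ifile?(results)  # prints true
```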

>>> X-Spam-Status: No, hits=3.2 required=5.0
>>> tests=INVALID_DATE_NO_TZ,DEAR_SOMEBODY,SUPERLONG_LINE,RCVD_IN_ORBZ
>>> version=2.20
>>> X-Spam-Level: ***
>>
>> In addition, spammers seem to be getting wise to SpamAssassin: I’m
>> seeing more and more e-mail get through (both personally and in the
>> mail-news gateway).
>
> I’m surprised that SpamAssassin flagged “Dear Somebody” as the only suspicious
> content in that message [ruby-talk:69424] when it is a boilerplate
> money-transfer scam. Perhaps a newer version would have a better ruleset?

The gateway uses SA 2.5. I’m not sure who posted a 2.2 header.

> As for getting wise - personally I think the absolute volume of spam is just
> increasing exponentially…

Perhaps, but SA used to catch a pretty high percentage of SPAM I
received: that percentage seems to have dropped off.

Cheers

Dave

···

On Tuesday, April 15, 2003, at 04:17 PM, Brian Candler wrote:

On Wed, Apr 16, 2003 at 05:57:34AM +0900, Dave Thomas wrote:

>>> The mail-news gateway is filtered: I run SpamAssassin on messages going
>>> in both directions. However, the mailing list itself is open, so anyone
>>> can post to ruby-talk directly.
>>>
>>> In addition, spammers seem to be getting wise to SpamAssassin: I’m
>>> seeing more and more e-mail get through (both personally and in the
>>> mail-news gateway). I was on the point of reducing the threshold a bit
>>> to see if it made things better. Does anyone have any experience to
>>> share here?

I personally run with 3.5 as the threshold, and have increased the point score
of a couple of rules.

I use procmail to ensure my mailing lists get through, and whitelist my domain
so stuff from the corporation isn’t deleted.
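The policy described here (let mailing lists and the home domain through, then apply a lowered threshold) can be sketched in Ruby rather than procmail. The whitelisted domain and list identifier below are hypothetical placeholders, the list identifier is assumed to have been extracted from the headers already, and `deliver?` is an illustrative name; the 3.5 threshold is the one mentioned above.

```ruby
WHITELIST_DOMAINS = ["example.com"] # hypothetical corporate domain
TRUSTED_LISTS = ["ruby-talk"]       # hypothetical list identifiers
THRESHOLD = 3.5                     # lowered SpamAssassin threshold

# Decide whether to deliver a message, given its From: address,
# list identifier (nil if none) and SpamAssassin score.
def deliver?(from:, list_id:, hits:)
  domain = from[/@([^>\s]+)/, 1].to_s.downcase
  return true if WHITELIST_DOMAINS.include?(domain)           # own domain
  return true if list_id && TRUSTED_LISTS.include?(list_id.downcase)
  hits < THRESHOLD
end

p deliver?(from: "Boss <boss@example.com>", list_id: nil, hits: 9.0)  # prints true
p deliver?(from: "x@spam.test", list_id: nil, hits: 3.6)              # prints false
```

Checking the whitelists before the score mirrors the ordering of procmail recipes, where earlier recipes deliver a message before later ones ever see it.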

>>>> X-Spam-Status: No, hits=3.2 required=5.0
>>>> tests=INVALID_DATE_NO_TZ,DEAR_SOMEBODY,SUPERLONG_LINE,RCVD_IN_ORBZ
>>>> version=2.20
>>>> X-Spam-Level: ***
>>>
>>> In addition, spammers seem to be getting wise to SpamAssassin: I’m seeing
>>> more and more e-mail get through (both personally and in the mail-news
>>> gateway).
>>
>> I’m surprised that SpamAssassin flagged “Dear Somebody” as the only suspicious
>> content in that message [ruby-talk:69424] when it is a boilerplate
>> money-transfer scam. Perhaps a newer version would have a better ruleset?
>
> The gateway uses SA 2.5. I’m not sure who posted a 2.2 header.
>
>> As for getting wise - personally I think the absolute volume of spam is just
>> increasing exponentially…
>
> Perhaps, but SA used to catch a pretty high percentage of SPAM I received:
> that percentage seems to have dropped off.

I have seen this also. I think it actually started when I upgraded SpamAssassin
from 2.2 to 2.31. That is probably a coincidence, and I have no hard numbers
to back it up.

I have not, however, turned on the Vipul’s Razor functionality (I tried briefly
but couldn’t get it to work), Pyzor, or any of the blacklists.
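Blacklist rules such as RCVD_IN_ORBZ in the header above work by DNS lookup: reverse the IPv4 octets of the relaying host, append the blacklist's zone, and query the name; an answer means the host is listed. A minimal sketch using Ruby's standard `resolv` library follows. The zone passed in is a placeholder, not a real blacklist, and `dnsbl_query_name` and `listed?` are hypothetical helper names.

```ruby
require "resolv"

# Build the DNSBL query name: 192.0.2.1 + zone -> 1.2.0.192.<zone>
def dnsbl_query_name(ip, zone)
  octets = ip.split(".")
  raise ArgumentError, "not an IPv4 address: #{ip}" unless octets.size == 4 &&
    octets.all? { |o| o =~ /\A\d{1,3}\z/ && o.to_i <= 255 }
  (octets.reverse + [zone]).join(".")
end

# Any A record for the query name means "listed"; NXDOMAIN means "not listed".
def listed?(ip, zone)
  Resolv.getaddress(dnsbl_query_name(ip, zone))
  true
rescue Resolv::ResolvError
  false
end

p dnsbl_query_name("192.0.2.1", "dnsbl.example.org")  # prints "1.2.0.192.dnsbl.example.org"
```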

Have you tried this for the list?

I think members-only posting will probably help a lot, though.

···

On Apr 16, Dave Thomas wrote:

On Tuesday, April 15, 2003, at 04:17 PM, Brian Candler wrote:

On Wed, Apr 16, 2003 at 05:57:34AM +0900, Dave Thomas wrote:


Brett Williams