I'm glad to see you're on the lookout for ideas, James. I haven't seen your quiz topic submission yet.
I have seen your posts with links to the site though, so I'll forgive you for not being the first.
James Edward Gray II
···
On Sep 28, 2004, at 7:39 AM, James Britt wrote:
If the spam is entered by a script, then the wiki code should be able to use some simple heuristics to block the most annoying crap.
For example, if the diff from the old page to the new page is greater than some percentage, or if the new page contains X number of links to the same site.
You should create a way to generate images with text
verification. This would eliminate spam.
I think it would slow them down but it wouldn't eliminate them completely.
If the spam is entered by a script, then the wiki code should be able to
use some simple heuristics to block the most annoying crap.
For example, if the diff from the old page to the new page is greater
than some percentage, or if the new page contains X number of links to
the same site.
X number of links to _any_ site should be good enough. Automatic spam
could then go fine-grained to get under the radar, at which time a
secondary heuristic is needed, or X becomes zero. Your statement
below provides ample justification for that.
Might this cause a problem for legit users once in a while? Sure. But
we have that now, with spam clean-up.
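Just to make the idea concrete, here is a rough sketch of that kind of link-count check in Ruby; the threshold constant and the method name are made up for illustration, not anything Ruwiki or usemod actually has:

require 'uri'

# Flag an edit if any single host gains more than X new links.
MAX_NEW_LINKS_PER_HOST = 3   # the "X" from the discussion; tune to taste

def suspicious_edit?(old_text, new_text)
  links_per_host = lambda do |text|
    URI.extract(text, %w[http https])
       .group_by { |u| (URI(u).host rescue u) }
       .transform_values(&:size)
  end

  before = links_per_host.call(old_text)
  after  = links_per_host.call(new_text)

  after.any? { |host, count| count - before.fetch(host, 0) > MAX_NEW_LINKS_PER_HOST }
end

The same shape works for the diff-percentage idea: compute the size of the change relative to the old page and refuse the save above some ratio.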
Gavin
···
On Tuesday, September 28, 2004, 10:39:59 PM, James wrote:
That's more or less the idea behind an "entropy" value that gets saved
on a page right now -- I haven't figured out exactly what I'm going to
do with it, but it offers interesting possibilities.
-austin
···
On Tue, 28 Sep 2004 21:39:59 +0900, James Britt <jamesunderbarb@neurogami.com> wrote:
If the spam is entered by a script, then the wiki code should be able to
use some simple heuristics to block the most annoying crap.
For example, if the diff from the old page to the new page is greater
than some percentage, or if the new page contains X number of links to
the same site.
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
: as of this email, I have [ 6 ] Gmail invitations
That works for Ruby developers' wikis, but not for the general case.
Although my current "clients" for Ruwiki are all developers, I intend
to aim it a bit wider.
-austin
···
On Tue, 28 Sep 2004 21:37:04 +0900, Dave Thomas <dave@pragprog.com> wrote:
How about displaying a trivial line of Ruby code and asking the user to
enter the value. Something like
To stop spammers, please enter the value of the following:
1.+(2) = | |
Change the + to a - or * randomly, and pick random numbers between 1
and 9.
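A few lines of Ruby are enough to sketch that; nothing here is from an actual wiki, and the answer would still have to be stashed in the session or a signed hidden field:

# Build a trivial arithmetic challenge: random a, b in 1..9 and a random operator.
def arithmetic_challenge
  a, b = rand(1..9), rand(1..9)
  op   = %w[+ - *].sample
  ["#{a} #{op} #{b}", a.send(op, b)]
end

question, answer = arithmetic_challenge
# Show: "To stop spammers, please enter the value of: #{question}"
# On submit, accept the edit only if the posted value equals answer.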
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
: as of this email, I have [ 6 ] Gmail invitations
The only way to stop wiki spam is to have a dedicated admin. Creativity helps reduce the time burden, but it is a constant endeavor.
A tarpit would be easier to implement than a captcha. In the usemod settings, you use NetAddr::IP to check if the env's Remote Addr is within a known spammer domain. If it is a spammer, set the pages database to a copy. Nightly / weekly / whatever, dump the latest pages directory on top of the tarpit.
I said domain. I meant subnet. You can just put a whole ISP on probation and not allow changes from it to be propagated to the main database.
Cheers,
Patrick
Adding a whole ISP to a probation list can lead to full-scale lockout. I have dozens of proxies from many ISPs. I like that idea though. Hmmm, spam trap....
Are you saying to set up a trigger for people who post to a certain page?
A tarpit would be easier to implement than a captcha. In the usemod
settings, you use NetAddr::IP to check if the env's Remote Addr is
within a known spammer domain. If it is a spammer, set the pages
database to a copy. Nightly / weekly / whatever, dump the latest pages
directory on top of the tarpit.
There goes one of my points for my presentation
The main resource in fighting spammers is time. You want to waste
their time, let them think that things are working.
I'm approaching it, again, from a slightly different perspective. My
goal is to make the page seem as if it were entirely a read-only
website to robots, and 403 if they are known bad crawlers. I don't yet
have IP banning, but I have robot exclusion.
Read-only to robots makes sense as a way of preventing accidental problems. I used to have a delete link on the wiki. All my pages kept getting deleted. I guessed that it was a robot gone amok [1]. I also like the bit about recognizing bad crawlers. No harvesting for old-fashioned spam is a good thing.
The thing about banning is that it is easy for the vandal to tell that they have been detected. I tried using Apache Deny directives to manage abuse, but sometimes that just encourages the vandal to switch computers. Plus the cost of a false positive is denial of service. After one particularly annoying episode, I realized that the vandal was trying to waste my time. So I set up the tarpit system to waste his, and haven't lost sleep since.
I still do a lot of cleanup on my wikis, and I still use Deny directives. Nothing replaces an active administrator. The tarpit just gave me another lever to help me manage the problem.
Cheers,
Patrick
1. I didn't labor too much over it; I just deleted the Delete link.
···
On Tuesday, September 28, 2004, at 08:15 PM, Austin Ziegler wrote:
On Wed, 29 Sep 2004 08:14:42 +0900, Patrick May <patrick@hexane.org> wrote:
Great, thanks! Now I've just got to find the time to insert it and
test it. Hopefully some time this afternoon I can steal a few
minutes..
Much appreciated, Patrick!
Chad
···
On Wed, 6 Oct 2004 11:53:55 +0900, Patrick May <patrick@hexane.org> wrote:
Hello,
On Tuesday, September 28, 2004, at 07:14 PM, Patrick May wrote:
> Hello,
>
> On Tuesday, September 28, 2004, at 12:39 AM, David Ross wrote:
>
>> You should create a way to generate images with text
>> verification. This would eliminate spam.
>
> The only way to stop wiki spam is to have a dedicated admin.
> Creativity helps reduce the time burden, but it is a constant endeavor.
>
> A tarpit would be easier to implement than a captcha. In the usemod
> settings, you use NetAddr::IP to check if the env's Remote Addr is
> within a known spammer domain. If it is a spammer, set the pages
> database to a copy. Nightly / weekly / whatever, dump the latest
> pages directory on top of the tarpit.
I threw together tarpit logic for usemod:
# == Configuration ====================================
use NetAddr::IP;
use vars qw( $TarpitDir $VandalFile );
$DataDir    = "/tmp/mywikidb";   # Main wiki directory
$TarpitDir  = "/tmp/tarpitdb";   # tarpit dir (the copy that vandals read and write)
$VandalFile = "/Users/patsplat/Desktop/usemod10/vandals.txt";   # one IP or subnet per line

open(SOURCE, "< $VandalFile")
    or die "Couldn't open $VandalFile for reading: $!\n";
my $remote_addr = new NetAddr::IP $ENV{"REMOTE_ADDR"};
while (<SOURCE>) {
    chomp;                  # strip the newline so NetAddr::IP parses the entry cleanly
    next unless length;     # skip blank lines
    my $vandal_host = new NetAddr::IP $_;
    if ( $remote_addr->within( $vandal_host ) ) {
        $DataDir = $TarpitDir;   # this request gets the tarpit copy instead
        last;
    }
}
close(SOURCE);
I was considering the potential of moderation. And I also recalled someone
else pointing out that spammers are interested in one thing: external links
--and they had suggested we just get rid of external links altogether. Both
are too much. But then it hit me: combine the two!
    If a page edit adds an external link, then the page has to be
    approved by a moderator.
No. It goes further. You set up a trigger to recognize vandals by IP address. You push their changes to an alternate database. They can see the site, they can make changes, they can see their changes on the site.
No one except the other vandals sees their changes. And every night, everything the vandals do is washed away.
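For a Ruby wiki the same trick might look roughly like this, using IPAddr from the standard library; the file paths and the vandal-list format are placeholders, not Ruwiki's real configuration:

require 'ipaddr'

MAIN_DIR   = '/var/wiki/pages'     # the real page database
TARPIT_DIR = '/var/wiki/tarpit'    # the copy that only flagged vandals see
VANDALS    = File.readlines('/var/wiki/vandals.txt', chomp: true)
                 .reject(&:empty?)
                 .map { |line| IPAddr.new(line) }   # entries like "220.163.37.0/24"

# Pick the page directory for this request based on the client address.
def data_dir_for(remote_addr)
  ip = IPAddr.new(remote_addr)
  VANDALS.any? { |subnet| subnet.include?(ip) } ? TARPIT_DIR : MAIN_DIR
end

# A nightly cron job then copies MAIN_DIR over TARPIT_DIR,
# so everything the vandals did is washed away.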
Cheers,
Patrick
···
On Tuesday, September 28, 2004, at 07:23 PM, David Ross wrote:
Adding a whole ISP to a probation list can lead to full-scale lockout. I have dozens of proxies from many ISPs. I like that idea though. Hmmm, spam trap....
Are you saying to set up a trigger for people who post to a certain page?
> I'm approaching it, again, from a slightly different perspective. My
> goal is to make the page seem as if it were entirely a read-only
> website to robots, and 403 if they are known bad crawlers. I don't yet
> have IP banning, but I have robot exclusion.
Patrick May:
Read-only to robots makes sense as a way of preventing accidental
problems. I used to have a delete link on the wiki. All my pages kept
getting deleted. I guessed that it was a robot gone amok [1]. I
also like the bit about recognizing bad crawlers. No harvesting for
old-fashioned spam is a good thing.
The thing about banning is that it is easy for the vandal to tell that
they have been detected. I tried using Apache Deny directives to
manage abuse, but sometimes that just encourages the vandal to switch
computers. Plus the cost of a false positive is denial of service.
After one particularly annoying episode, I realized that the vandal was
trying to waste my time. So I set up the tarpit system to waste his,
and haven't lost sleep since.
I still do a lot of cleanup on my wikis, and I still use Deny
directives. Nothing replaces an active administrator. The tarpit just
gave me another lever to help me manage the problem.
As of right now, a tarpit would actually be a little too difficult to
implement in Ruwiki. It's much easier to present the wiki as if it
were a CMS or a read-only website.
-austin
···
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
: as of this email, I have [ 6 ] Gmail invitations
Here is one step of many that could be applied.
Mr. Britt, you said in a message a while ago that the IP address 220.163.37.233 attacked Rubygarden. Here is the ultimate solution to stop a good percentage of the spammers. I really didn't think about it at first until I was setting up my RBL lists on servers.
Use this site to check for addresses. Make sure you tell the admins of the RBL servers that you are using their servers, or you could get blacklisted from access.
This is *the* solution.
RBLs are not only for mail.
I use a very big list of RBLs all the time. Remember never to use dul.dnsbl.sorbs.net
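For the curious, a DNSBL check is just a DNS query with the octets reversed; here's a minimal Ruby sketch (the zone name is only an example, and each list's usage policy still applies):

require 'resolv'

# An IPv4 address is listed if <reversed-octets>.<rbl-zone> resolves;
# NXDOMAIN means it is not on that list.
def listed?(ip, zone)
  Resolv.getaddress("#{ip.split('.').reverse.join('.')}.#{zone}")
  true
rescue Resolv::ResolvError
  false
end

listed?('220.163.37.233', 'sbl-xbl.spamhaus.org')   # => true if the RBL knows it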
I was considering the potential of moderation. And I also recalled someone else pointing out that spammers are interested in one thing: external links --and they had suggested we just get rid of external links altogether. Both are too much. But then it hit me: combine the two!
If a page edit adds an external link, then the page has to be
approved by a moderator.
That's a very good idea!
The spammers typically add a hundred or so external links to a page. So, requiring approval for more than, say, two external links on a page would ease the burden on legitimate users, while limiting spammers.
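That threshold fits in a couple of lines; a sketch, with the limit and the moderation hook both hypothetical:

require 'uri'

APPROVAL_THRESHOLD = 2   # edits adding more than this many external links get held

def needs_moderation?(old_text, new_text)
  added = URI.extract(new_text, %w[http https]).size -
          URI.extract(old_text, %w[http https]).size
  added > APPROVAL_THRESHOLD
end

# On save: publish the edit immediately when needs_moderation? is false,
# otherwise park it in a queue for a moderator to approve.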
I was considering the potential of moderation. And I also recalled someone else pointing out that spammers are interested in one thing: external links --and they had suggested we just get rid of external links altogether. Both are too much. But then it hit me: combine the two!
If a page edit adds an external link, then the page has to be
approved by a moderator.
T.
This would certainly throttle the spammers who post links, but what about the spammers *if any* who post abusive remarks against Ruby?
Adding a whole ISP to a probation list can lead to full-scale lockout. I have dozens of proxies from many ISPs. I like that idea though. Hmmm, spam trap....
Are you saying to set up a trigger for people who post to a certain page?
No. It goes further. You set up a trigger to recognize vandals by IP address. You push their changes to an alternate database. They can see the site, they can make changes, they can see their changes on the site.
No one except the other vandals sees their changes. And every night, everything the vandals do is washed away.
Cheers,
Patrick
Superb idea, Patrick. Very interesting. I think that is the best idea out of all. Hmm, I guess there could be multiple ways of detecting spammers.
- regex
- trigger page
- morons who try to post 4 times in under 10 seconds (a rate-limit sketch of this is below)
- spam detection like the filters mail systems implement
What other good ways would there be to detect spammers?
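The "4 posts in under 10 seconds" idea from the list is easy to sketch; the window and limit here are arbitrary, and a real wiki would persist the timestamps somewhere instead of holding them in memory:

# Refuse a client that posts too many times inside a short window.
class RateLimiter
  WINDOW    = 10   # seconds
  MAX_POSTS = 3    # the 4th post inside the window is rejected

  def initialize
    @recent = Hash.new { |h, ip| h[ip] = [] }
  end

  def allow?(ip, now = Time.now)
    posts = @recent[ip]
    posts.reject! { |t| now - t > WINDOW }   # drop timestamps outside the window
    return false if posts.size >= MAX_POSTS
    posts << now
    true
  end
end

limiter = RateLimiter.new
# limiter.allow?(ENV['REMOTE_ADDR'])   # => false once a client bursts past the limit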
--dross
···
On Tuesday, September 28, 2004, at 07:23 PM, David Ross wrote:
On Tuesday, September 28, 2004, at 10:23 PM, Austin Ziegler wrote:
I still do a lot of cleanup on my wikis, and I still use Deny
directives. Nothing replaces an active administrator. The tarpit just
gave me another lever to help me manage the problem.
As of right now, a tarpit would actually be a little too difficult to
implement in Ruwiki. It's much easier to present the wiki as if it
were a CMS or a read-only website.
This is the best reason to choose one tactic over another. It's your time the spammers are wasting. No need to help them out by trying to do something difficult.
You have been? Hard to believe. Set up scanners as well for the common and uncommon ports.
Rubyforge obviously hasn't been using an RBL. That IP was a first-try hit for me on what I think was Spamhaus.
First Rubygarden Spam email
-----------------------------------------
The rubygarden wiki has been over-run with spam links.
220.163.37.233 is one of the offending source IP addresses.
I fixed the home page, and then saw the extent of the crap. Looks like many personal pages have been altered.
Those with user pages may want to go check their own page to assist with the clean up.
James
-----------------------------------------
-------------------------------------
I've got a list, but it has become obvious that maintaining a list
manually isn't going to work. I'm tempted to require registration and
authentication at this point as much as I hate the thought.
Are you sure? I only checked 2 of the spammer IPs, and they are both blacklisted on rbls.org (61.50.242.197 and 220.163.37.233).
Anyway, I think the spam problem would be quite easy to handle if there was a better interface for rollback and IP blocking. I have never seen a Mediawiki wiki flooded with spam, because it needs far more effort to spam it than to repair it.
Umm... why not try to educate rather than accuse? I for one would certainly
like to know what in the hell you're talking about, but you're not explaining
yourself very well.
T.
···
On Monday 25 October 2004 08:50 pm, David Ross wrote:
Chad Fowler wrote:
>On Tue, 26 Oct 2004 04:56:53 +0900, David Ross <dross@code-exec.net> wrote:
>>You *can* integrate this into wikis. It's very easy. Okay thanks, 80%
>>spamming solved.
>>Most, if not ALL the ips listed in
>>http://www.istori.com/cgi-bin/wiki?WikiBlackList *ARE* in the RBLs
>
>We have been. For months.
>
>>Thanks, have a nice day. Problem solved
>
>Unfortunately not.
First Rubygarden Spam email
-----------------------------------------
The rubygarden wiki has been over-run with spam links.
220.163.37.233 is one of the offending source IP addresses.
I fixed the home page, and then saw the extent of the crap. Looks like
many personal pages have been altered.
Those with user pages may want to go check their own page to assist with
the clean up.
James
-----------------------------------------
-------------------------------------
I've got a list, but it has become obvious that maintaining a list
manually isn't going to work. I'm tempted to require registration and
authentication at this point as much as I hate the thought.