I'm glad to see you're on the lookout for ideas, James. I haven't seen your quiz topic submission yet.
I have seen your posts with links to the site though, so I'll forgive you for not being the first.
James Edward Gray II
···
On Sep 28, 2004, at 7:39 AM, James Britt wrote:
If the spam is entered by a script, then the wiki code should be able to use some simple heuristics to block the most annoying crap.
For example, if the diff from the old page to the new page is greater than some percentage, or if the new page contains X number of links to the same site.
You should create a way to generate images with text
verification. This would eliminate spam.
I think it would slow them down but it wouldn't eliminate them completely.
If the spam is entered by a script, then the wiki code should be able to
use some simple heuristics to block the most annoying crap.
For example, if the diff from the old page to the new page is greater
than some percentage, or if the new page contains X number of links to
the same site.
X number of links to _any_ site should be good enough. Automatic spam
could then go fine-grained to get under the radar, at which time a
secondary heuristic is needed, or X becomes zero. Your statement
below provides ample justification for that.
Might this cause a problem for legit users once in a while? Sure. But
we have that now, with spam clean-up.
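Just to make the idea concrete, here is a rough sketch of that kind of link-count check in Ruby; the threshold constant and the method name are made up for illustration, not anything Ruwiki or usemod actually has:

require 'uri'

# Flag an edit if any single host gains more than X new links.
MAX_NEW_LINKS_PER_HOST = 3   # the "X" from the discussion; tune to taste

def suspicious_edit?(old_text, new_text)
  links_per_host = lambda do |text|
    URI.extract(text, %w[http https])
       .group_by { |u| (URI(u).host rescue u) }
       .transform_values(&:size)
  end

  before = links_per_host.call(old_text)
  after  = links_per_host.call(new_text)

  after.any? { |host, count| count - before.fetch(host, 0) > MAX_NEW_LINKS_PER_HOST }
end

The same shape works for the diff-percentage idea: compute the size of the change relative to the old page and refuse the save above some ratio.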
Gavin
···
On Tuesday, September 28, 2004, 10:39:59 PM, James wrote:
That's more or less the idea behind an "entropy" value that gets saved
on a page right now -- I haven't figured out exactly what I'm going to
do with it, but it offers interesting possibilities.
-austin
···
On Tue, 28 Sep 2004 21:39:59 +0900, James Britt <jamesunderbarb@neurogami.com> wrote:
If the spam is entered by a script, then the wiki code should be able to
use some simple heuristics to block the most annoying crap.
For example, if the diff from the old page to the new page is greater
than some percentage, or if the new page contains X number of links to
the same site.
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
: as of this email, I have [ 6 ] Gmail invitations
That works for Ruby developers' wikis, but not for the general case.
Although my current "clients" for Ruwiki are all developers, I intend
to aim it a bit wider.
-austin
···
On Tue, 28 Sep 2004 21:37:04 +0900, Dave Thomas <dave@pragprog.com> wrote:
How about displaying a trivial line of Ruby code and asking the user to
enter the value. Something like
To stop spammers, please enter the value of the following:
1.+(2) = | |
Change the + to a - or * randomly, and pick random numbers between 1
and 9.
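A few lines of Ruby are enough to sketch that; nothing here is from an actual wiki, and the answer would still have to be stashed in the session or a signed hidden field:

# Build a trivial arithmetic challenge: random a, b in 1..9 and a random operator.
def arithmetic_challenge
  a, b = rand(1..9), rand(1..9)
  op   = %w[+ - *].sample
  ["#{a} #{op} #{b}", a.send(op, b)]
end

question, answer = arithmetic_challenge
# Show: "To stop spammers, please enter the value of: #{question}"
# On submit, accept the edit only if the posted value equals answer.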
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
: as of this email, I have [ 6 ] Gmail invitations
The only way to stop wiki spam is to have a dedicated admin. Creativity helps reduce the time burden, but it is a constant endeavor.
A tarpit would be easier to implement than a captcha. In the usemod settings, you use NetAddr::IP to check if the env's Remote Addr is within a known spammer domain. If it is a spammer, set the pages database to a copy. Nightly / weekly / whatever, dump the latest pages directory on top of the tarpit.
I said domain. I meant subnet. You can just put a whole ISP on probation and not allow changes from it to be propagated to the main database.
Cheers,
Patrick
Adding a whole ISP to a probation list can lead to full-scale lockout. I have dozens of proxies from many ISPs. I like that idea though. Hmmm, spam trap....
Are you saying to set up a trigger for people who post to a certain page?
A tarpit would be easier to implement than a captcha. In the usemod
settings, you use NetAddr::IP to check if the env's Remote Addr is
within a known spammer domain. If it is a spammer, set the pages
database to a copy. Nightly / weekly / whatever, dump the latest pages
directory on top of the tarpit.
There goes one of my points for my presentation
The main resource in fighting spammers is time. You want to waste
their time, let them think that things are working.
I'm approaching it, again, from a slightly different perspective. My
goal is to make the page seem as if it were entirely a read-only
website to robots, and 403 if they are known bad crawlers. I don't yet
have IP banning, but I have robot exclusion.
Read-only to robots makes sense as a way of preventing accidental problems. I used to have a delete link on the wiki. All my pages kept getting deleted. I guessed that it was a robot gone amok [1]. I also like the bit about recognizing bad crawlers. No harvesting for old-fashioned spam is a good thing.
The thing about banning is that it is easy for the vandal to tell that they have been detected. I tried using Apache Deny directives to manage abuse, but sometimes that just encourages the vandal to switch computers. Plus the cost of a false positive is denial of service. After one particularly annoying episode, I realized that the vandal was trying to waste my time. So I set up the tarpit system to waste his, and haven't lost sleep since.
I still do a lot of cleanup on my wikis, and I still use Deny directives. Nothing replaces an active administrator. The tarpit just gave me another lever to help me manage the problem.
Cheers,
Patrick
1. I didn't labor too much over it; I just deleted the Delete link.
···
On Tuesday, September 28, 2004, at 08:15 PM, Austin Ziegler wrote:
On Wed, 29 Sep 2004 08:14:42 +0900, Patrick May <patrick@hexane.org> wrote:
Great, thanks! Now I've just got to find the time to insert it and
test it. Hopefully some time this afternoon I can steal a few
minutes..
Much appreciated, Patrick!
Chad
···
On Wed, 6 Oct 2004 11:53:55 +0900, Patrick May <patrick@hexane.org> wrote:
Hello,
On Tuesday, September 28, 2004, at 07:14 PM, Patrick May wrote:
> Hello,
>
> On Tuesday, September 28, 2004, at 12:39 AM, David Ross wrote:
>
>> You should create a way to generate images with text
>> verification. This would eliminate spam.
>
> The only way to stop wiki spam is to have a dedicated admin.
> Creativity helps reduce the time burden, but it is a constant endeavor.
>
> A tarpit would be easier to implement than a captcha. In the usemod
> settings, you use NetAddr::IP to check if the env's Remote Addr is
> within a known spammer domain. If it is a spammer, set the pages
> database to a copy. Nightly / weekly / whatever, dump the latest
> pages directory on top of the tarpit.
I threw together tarpit logic for usemod:
# == Configuration ====================================
use NetAddr::IP;
use vars qw( $TarpitDir $VandalFile );
$DataDir    = "/tmp/mywikidb";   # Main wiki directory
$TarpitDir  = "/tmp/tarpitdb";   # tarpit dir (the copy that vandals read and write)
$VandalFile = "/Users/patsplat/Desktop/usemod10/vandals.txt";   # one IP or subnet per line

open(SOURCE, "< $VandalFile")
    or die "Couldn't open $VandalFile for reading: $!\n";
my $remote_addr = new NetAddr::IP $ENV{"REMOTE_ADDR"};
while (<SOURCE>) {
    chomp;                  # strip the newline so NetAddr::IP parses the entry cleanly
    next unless length;     # skip blank lines
    my $vandal_host = new NetAddr::IP $_;
    if ( $remote_addr->within( $vandal_host ) ) {
        $DataDir = $TarpitDir;   # this request gets the tarpit copy instead
        last;
    }
}
close(SOURCE);
I was considering the potential of moderation. And I also recalled someone
else pointing out that spammers are interested in one thing: external links
--and they had suggested we just get rid of external links altogether. Both
are too much. But then it hit me: combine the two!
    If a page edit adds an external link, then the page has to be
    approved by a moderator.
No. It goes further. You set up a trigger to recognize vandals by IP address. You push their changes to an alternate database. They can see the site, they can make changes, they can see their changes on the site.
No one except the other vandals sees their changes. And every night, everything the vandals do is washed away.
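For a Ruby wiki the same trick might look roughly like this, using IPAddr from the standard library; the file paths and the vandal-list format are placeholders, not Ruwiki's real configuration:

require 'ipaddr'

MAIN_DIR   = '/var/wiki/pages'     # the real page database
TARPIT_DIR = '/var/wiki/tarpit'    # the copy that only flagged vandals see
VANDALS    = File.readlines('/var/wiki/vandals.txt', chomp: true)
                 .reject(&:empty?)
                 .map { |line| IPAddr.new(line) }   # entries like "220.163.37.0/24"

# Pick the page directory for this request based on the client address.
def data_dir_for(remote_addr)
  ip = IPAddr.new(remote_addr)
  VANDALS.any? { |subnet| subnet.include?(ip) } ? TARPIT_DIR : MAIN_DIR
end

# A nightly cron job then copies MAIN_DIR over TARPIT_DIR,
# so everything the vandals did is washed away.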
Cheers,
Patrick
···
On Tuesday, September 28, 2004, at 07:23 PM, David Ross wrote:
Adding a whole ISP to a probation list can lead to full-scale lockout. I have dozens of proxies from many ISPs. I like that idea though. Hmmm, spam trap....
Are you saying to set up a trigger for people who post to a certain page?
> I'm approaching it, again, from a slightly different perspective. My
> goal is to make the page seem as if it were entirely a read-only
> website to robots, and 403 if they are known bad crawlers. I don't yet
> have IP banning, but I have robot exclusion.
Patrick May:
Read-only to robots makes sense as a way of preventing accidental
problems. I used to have a delete link on the wiki. All my pages kept
getting deleted. I guessed that it was a robot gone amok [1]. I
also like the bit about recognizing bad crawlers. No harvesting for
old-fashioned spam is a good thing.
The thing about banning is that it is easy for the vandal to tell that
they have been detected. I tried using Apache Deny directives to
manage abuse, but sometimes that just encourages the vandal to switch
computers. Plus the cost of a false positive is denial of service.
After one particularly annoying episode, I realized that the vandal was
trying to waste my time. So I set up the tarpit system to waste his,
and haven't lost sleep since.
I still do a lot of cleanup on my wikis, and I still use Deny
directives. Nothing replaces an active administrator. The tarpit just
gave me another lever to help me manage the problem.
As of right now, a tarpit would actually be a little too difficult to
implement in Ruwiki. It's much easier to present the wiki as if it
were a CMS or a read-only website.
-austin
···
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
: as of this email, I have [ 6 ] Gmail invitations
Here is one step of many that could be applied.
Mr. Britt, you said in a message a while ago that the IP address 220.163.37.233 attacked Rubygarden. Here is the ultimate solution to stop a good percentage of the spammers. I really didn't think about it at first until I was setting up my RBL lists on servers.
Use this site to check for addresses. Make sure you tell the admins of the RBL servers that you are using their servers, or you could get blacklisted from access.
This is *the* solution.
RBLs are not only for mail.
I use a very big list of RBLs all the time. Remember never to use dul.dnsbl.sorbs.net
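For the curious, a DNSBL check is just a DNS query with the octets reversed; here's a minimal Ruby sketch (the zone name is only an example, and each list's usage policy still applies):

require 'resolv'

# An IPv4 address is listed if <reversed-octets>.<rbl-zone> resolves;
# NXDOMAIN means it is not on that list.
def listed?(ip, zone)
  Resolv.getaddress("#{ip.split('.').reverse.join('.')}.#{zone}")
  true
rescue Resolv::ResolvError
  false
end

listed?('220.163.37.233', 'sbl-xbl.spamhaus.org')   # => true if the RBL knows it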
I was considering the potential of moderation. And I also recalled someone else pointing out that spammers are interested in one thing: external links --and they had suggested we just get rid of external links altogether. Both are too much. But then it hit me: combine the two!
If a page edit adds an external link, then the page has to be
approved by a moderator.
That's a very good idea!
The spammers typically add a hundred or so external links to a page. So, requiring approval for more than, say, two external links on a page would ease the burden on legitimate users, while limiting spammers.
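That threshold fits in a couple of lines; a sketch, with the limit and the moderation hook both hypothetical:

require 'uri'

APPROVAL_THRESHOLD = 2   # edits adding more than this many external links get held

def needs_moderation?(old_text, new_text)
  added = URI.extract(new_text, %w[http https]).size -
          URI.extract(old_text, %w[http https]).size
  added > APPROVAL_THRESHOLD
end

# On save: publish the edit immediately when needs_moderation? is false,
# otherwise park it in a queue for a moderator to approve.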
I was considering the potential of moderation. And I also recalled someone else pointing out that spammers are interested in one thing: external links --and they had suggested we just get rid of external links altogether. Both are too much. But then it hit me: combine the two!
If a page edit adds an external link, then the page has to be
approved by a moderator.
T.
This would certainly throttle the spammers who post links, but what about the spammers *if any* who post abusive remarks against Ruby?
Adding a whole ISP to a probation list can lead to full-scale lockout. I have dozens of proxies from many ISPs. I like that idea though. Hmmm, spam trap....
Are you saying to set up a trigger for people who post to a certain page?
No. It goes further. You set up a trigger to recognize vandals by IP address. You push their changes to an alternate database. They can see the site, they can make changes, they can see their changes on the site.
No one except the other vandals sees their changes. And every night, everything the vandals do is washed away.
Cheers,
Patrick
Superb idea, Patrick. Very interesting. I think that is the best idea out of all. Hmm, I guess there could be multiple ways of detecting spammers.
- regex
- trigger page
- morons who try to post 4 times in under 10 seconds (a rate-limit sketch of this is below)
- spam detection like the filters mail systems implement
What other good ways would there be to detect spammers?
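The "4 posts in under 10 seconds" idea from the list is easy to sketch; the window and limit here are arbitrary, and a real wiki would persist the timestamps somewhere instead of holding them in memory:

# Refuse a client that posts too many times inside a short window.
class RateLimiter
  WINDOW    = 10   # seconds
  MAX_POSTS = 3    # the 4th post inside the window is rejected

  def initialize
    @recent = Hash.new { |h, ip| h[ip] = [] }
  end

  def allow?(ip, now = Time.now)
    posts = @recent[ip]
    posts.reject! { |t| now - t > WINDOW }   # drop timestamps outside the window
    return false if posts.size >= MAX_POSTS
    posts << now
    true
  end
end

limiter = RateLimiter.new
# limiter.allow?(ENV['REMOTE_ADDR'])   # => false once a client bursts past the limit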
--dross
···
On Tuesday, September 28, 2004, at 07:23 PM, David Ross wrote:
On Tuesday, September 28, 2004, at 10:23 PM, Austin Ziegler wrote:
I still do a lot of cleanup on my wikis, and I still use Deny
directives. Nothing replaces an active administrator. The tarpit just
gave me another lever to help me manage the problem.
As of right now, a tarpit would actually be a little too difficult to
implement in Ruwiki. It's much easier to present the wiki as if it
were a CMS or a read-only website.
This is the best reason to choose one tactic over another. It's your time the spammers are wasting. No need to help them out by trying to do something difficult.
You have been? Hard to believe. Set up scanners as well for the common and uncommon ports.
Rubyforge obviously hasn't been using an RBL. That IP was a first-try hit for me on what I think was Spamhaus.
First Rubygarden Spam email
-----------------------------------------
The rubygarden wiki has been over-run with spam links.
220.163.37.233 is one of the offending source IP addresses.
I fixed the home page, and then saw the extent of the crap. Looks like many personal pages have been altered.
Those with user pages may want to go check their own page to assist with the clean up.
James
-----------------------------------------
-------------------------------------
I've got a list, but it has become obvious that maintaining a list
manually isn't going to work. I'm tempted to require registration and
authentication at this point as much as I hate the thought.
Are you sure? I only checked 2 of the spammer IPs, and they are both blacklisted on rbls.org (61.50.242.197 and 220.163.37.233).
Anyway, I think the spam problem would be quite easy to handle if there was a better interface for rollback and IP blocking. I have never seen a Mediawiki wiki flooded with spam, because it needs far more effort to spam it than to repair it.
Umm... why not try to educate rather than accuse? I for one would certainly
like to know what in the hell you're talking about, but you're not explaining
yourself very well.
T.
···
On Monday 25 October 2004 08:50 pm, David Ross wrote:
Chad Fowler wrote:
>On Tue, 26 Oct 2004 04:56:53 +0900, David Ross <dross@code-exec.net> wrote:
>>You *can* integrate this into wikis. It's very easy. Okay thanks, 80%
>>spamming solved.
>>Most, if not ALL the ips listed in
>>http://www.istori.com/cgi-bin/wiki?WikiBlackList *ARE* in the RBLs
>
>We have been. For months.
>
>>Thanks, have a nice day. Problem solved
>
>Unfortunately not.
First Rubygarden Spam email
-----------------------------------------
The rubygarden wiki has been over-run with spam links.
220.163.37.233 is one of the offending source IP addresses.
I fixed the home page, and then saw the extent of the crap. Looks like
many personal pages have been altered.
Those with user pages may want to go check their own page to assist with
the clean up.
James
-----------------------------------------
-------------------------------------
I've got a list, but it has become obvious that maintaining a list
manually isn't going to work. I'm tempted to require registration and
authentication at this point as much as I hate the thought.