CAPTCHA in Ruby

Hello,
I'm creating a public registration process in Rails and I would like to add some kind of spam filtering...
Does anyone know of a library/implementation/gem for CAPTCHA anti-spam in Ruby, or better, one already integrated into Rails (validates_capcha)?

thanks

-Federico

This http://www.ruby-doc.org/core/classes/Method.src/M000116.html might be
of help.

Remember, though, that CAPTCHAs are NOT accessible, and you may need
to provide an alternate means for verification that is as secure.
CAPTCHAs are also imperfect and have been shown to be able to be
broken by computer programs.

-austin

···

On 9/19/05, Federico <pix@yahoo.it> wrote:

Hello,
I'm creating a public registration process in Rails and I would like to
add some kind of spam filtering...
Does anyone know of a library/implementation/gem for CAPTCHA anti-spam
in Ruby, or better, one already integrated into Rails (validates_capcha)?

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

try

http://www.google.co.uk/search?q=ruby+captcha

or better yet

http://captcha.rubyforge.org/

I don't understand the relevance of the unbind method linked to above. I'm
assuming it was a mistake. If I'm being dense, will someone explain what the
relevance is to me :)

···

On 9/19/05, Lyndon Samson <lyndon.samson@gmail.com> wrote:

This http://www.ruby-doc.org/core/classes/Method.src/M000116.html might be
of help.

But they stop the average spambot, which is what they're for I think.
The simplest accessible alternative would be email verification, but this
obviously slows the whole thing down.
Has anyone thought of an accessible alternative that can be embedded on the
page?

···

On 9/19/05, Austin Ziegler <halostatue@gmail.com> wrote:

On 9/19/05, Federico <pix@yahoo.it> wrote:
> Hello,
> I'm creating a public registration process in Rails and I would like to
> add some kind of spam filtering...
> Does anyone know of a library/implementation/gem for CAPTCHA anti-spam
> in Ruby, or better, one already integrated into Rails (validates_capcha)?

Remember, though, that CAPTCHAs are NOT accessible, and you may need
to provide an alternate means for verification that is as secure.
CAPTCHAs are also imperfect and have been shown to be able to be
broken by computer programs.

-austin
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca

Two things. First, they don't work against the average spambot of 2005
or later. The average spambot of 2005 has gocr or something like that
built in. (I'm using spambot because that's what you're using; what
you're talking about is actually crawlers and registration bots.) The
problem with CAPTCHA systems is that something complex enough to
defeat a computer OCR system will be enough to lock out a significant
portion of your potential users. Second, a lot of people *have*
thought about it. I'm unimpressed with most solutions.

http://www.standards-schmandards.com/index.php?2005/01/01/11-captcha

http://www.w3.org/WAI/intro/captcha.php
http://www.bestkungfu.com/archive/date/2005/01/captcha-state-of-the-union-2005/
http://www.bestkungfu.com/?p=445

Basically, my advice is to forget CAPTCHA and go with double
verification. You can even provide multiple levels of user
accessibility, allowing immediate access but nothing that could be
construed as spam until they have verified their identity in some way
that is accessible.

-austin

···

On 9/19/05, Robbie Carlton <robbie.carlton@gmail.com> wrote:

But they stop the average spambot, which is what they're for I think.
The simplest accessible alternative would be email verification, but this
obviously slows the whole thing down.
Has anyone thought of an accessible alternative that can be embedded on the
page?


I have seen the case where a subscriber is asked to solve a simple
math problem. E.g., "what is twelve plus twenty-three?" This would
certainly be accessible. You could think of different types of
questions like "Enter the number that follows fifty-five." or "What
number comes before thirty-two?"
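A minimal sketch of such a question generator, assuming plain Ruby with no framework. The names `ONES`, `TENS`, `to_words`, `make_challenge` and `correct?` are invented here for illustration and do not come from any library. It only covers addition with numbers up to forty, so it illustrates the idea rather than a hardened defence.

```ruby
# Hypothetical sketch: every name below is illustrative, not a library API.

ONES = %w[zero one two three four five six seven eight nine ten eleven twelve
          thirteen fourteen fifteen sixteen seventeen eighteen nineteen].freeze
TENS = %w[_ _ twenty thirty forty fifty sixty seventy eighty ninety].freeze

# Spell out an integer from 0 to 99 in English words.
def to_words(n)
  raise ArgumentError, "only 0..99 supported" unless (0..99).cover?(n)
  return ONES[n] if n < 20

  tens, ones = n.divmod(10)
  ones.zero? ? TENS[tens] : "#{TENS[tens]}-#{ONES[ones]}"
end

# Build a question and its expected answer. The answer should stay on the
# server (for example in the session), never in the page itself.
def make_challenge(rng = Random.new)
  a = rng.rand(1..40)
  b = rng.rand(1..40)
  { question: "What is #{to_words(a)} plus #{to_words(b)}?", answer: a + b }
end

# Compare the submitted text with the stored answer, ignoring whitespace.
def correct?(challenge, submitted)
  submitted.to_s.strip == challenge[:answer].to_s
end
```

A form handler would call `make_challenge` when rendering the page, keep the `:answer` server-side, and call `correct?` on submission.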

···

On 9/19/05, Robbie Carlton <robbie.carlton@gmail.com> wrote:

But they stop the average spambot, which is what they're for I think.
The simplest accessible alternative would be email verification, but this
obviously slows the whole thing down.
Has anyone thought of an accessible alternative that can be embedded on the
page?

On 9/19/05, Austin Ziegler <halostatue@gmail.com> wrote:
>
> On 9/19/05, Federico <pix@yahoo.it> wrote:
> > Hello,
> > I'm creating a public registration process in rails and I would like to
> > add some kind of spam filtering...
> > Does anyone know of a library/implementation/gem for CAPTCHA anti-spam
> > in Ruby, or better, one already integrated into Rails (validates_capcha)?
>
> Remember, though, that CAPTCHAs are NOT accessible, and you may need
> to provide an alternate means for verification that is as secure.
> CAPTCHAs are also imperfect and have been shown to be able to be
> broken by computer programs.
>
> -austin
> --
> Austin Ziegler * halostatue@gmail.com
> * Alternate: austin@halostatue.ca
>
>

Sorry, my bad, response to wrong thread!

···

On 9/19/05, Robbie Carlton <robbie.carlton@gmail.com> wrote:

try

http://www.google.co.uk/search?q=ruby+captcha

or better yet

http://captcha.rubyforge.org/

I don't understand the relevance of the unbind method linked to above. I'm
assuming it was a mistake. If I'm being dense, will someone explain what the
relevance is to me :)

Robbie Carlton wrote:

But they stop the average spambot, which is what they're for I think.
The simplest accessible alternative would be email verification, but this
obviously slows the whole thing down.
Has anyone thought of an accessible alternative that can be embedded on the
page?

How about asking the user to respond by decoding a "spam speak" encoded
word or question. (On the plus side, if the scheme fails and you can
get access to the algorithm to decode it, you can use that as a spam
email filtering test... ;-)

Interesting. That would probably keep out existing general-purpose rakes. But the moment your site becomes popular or targeted, it seems to me that it would not be difficult to write a program to answer your questions. Even if you include 33 flavors of how to phrase the question ("Enter an integer that is not less than (not equal to) eighty (reduced by the value represented by the roman numeral V) and more 'n seventy with the number of non-thumbs on a standard hand added to it.") the engineered bot could be written to handle 20% of your phrases, and that would be enough.

[OT]
I smell a couple of fun Ruby Quizzes here. One is simply to write an english-to-numeric processor.
value = Numeric.from_english( "eight-hundred thousand, twenty-three hundred fifteen")

Another quiz might be to write such a challenge/response captcha system. Make the questions as clear and varied as possible.

Another might be, given a series of questions like the above, to write a 'bot that could answer them.
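As a rough illustration of the first quiz idea, here is one way an English-to-integer parser could look. It is a sketch only: `from_english` is written as a standalone method rather than the `Numeric.from_english` imagined above, the constant names are made up, and it handles just the words needed for numbers below a billion.

```ruby
# Hypothetical sketch of an English-to-integer parser; not a library API.

SMALL_WORDS = %w[zero one two three four five six seven eight nine ten eleven
                 twelve thirteen fourteen fifteen sixteen seventeen eighteen
                 nineteen].each_with_index.to_h.freeze
TENS_WORDS = %w[twenty thirty forty fifty sixty seventy eighty ninety]
             .each_with_index.map { |word, i| [word, (i + 2) * 10] }.to_h.freeze
SCALE_WORDS = { "thousand" => 1_000, "million" => 1_000_000 }.freeze

def from_english(text)
  total = 0    # sum of completed thousand/million groups
  current = 0  # value of the group currently being read

  # Split on anything that is not a letter, so hyphens and commas vanish.
  text.downcase.scan(/[a-z]+/).each do |word|
    if SMALL_WORDS.key?(word)
      current += SMALL_WORDS[word]
    elsif TENS_WORDS.key?(word)
      current += TENS_WORDS[word]
    elsif word == "hundred"
      current *= 100
    elsif SCALE_WORDS.key?(word)
      total += current * SCALE_WORDS[word]
      current = 0
    elsif word != "and"
      raise ArgumentError, "unknown number word: #{word}"
    end
  end

  total + current
end
```

With this reading, the example phrase above parses as 800,000 plus 2,315. A bot built on something this small would already cover the simple "twelve plus twenty-three" style of question once it also recognised the operator words.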

···

On Sep 19, 2005, at 6:36 AM, Stephen Veit wrote:

I have seen the case where a subscriber is asked to solve a simple
math problem. E.g., "what is twelve plus twenty-three?" This would
certainly be accessible. You could think of different types of
questions like "Enter the number that follows fifty-five." or "What
number comes before thirty-two?"

Austin Ziegler <halostatue@gmail.com> writes:

But they stop the average spambot, which is what they're for I think.
The simplest accessible alternative would be email verification, but this
obviously slows the whole thing down.
Has anyone thought of an accessible alternative that can be embedded on the
page?

Two things. First, they don't work against the average spambot of 2005
or later. The average spambot of 2005 has gocr or something like that
built in. (I'm using spambot because that's what you're using; what
you're talking about is actually crawlers and registration bots.) The
problem with CAPTCHA systems is that something complex enough to
defeat a computer OCR system will be enough to lock out a significant
portion of your potential users. Second, a lot of people *have*
thought about it. I'm unimpressed with most solutions.

OCR isn't *that* easy. Humans--even young children--far exceed
machines in discerning even relatively clean machine-print characters.

http://www.standards-schmandards.com/index.php?2005/01/01/11-captcha
CAPTCHA Codes are not Accessible
http://www.w3.org/WAI/intro/captcha.php
http://www.bestkungfu.com/archive/date/2005/01/captcha-state-of-the-union-2005/
http://www.bestkungfu.com/?p=445

The research at Lehigh is interesting.

http://www.cse.lehigh.edu/~baird/research_hips.html

Basically, my advice is to forget CAPTCHA and go with double
verification. You can even provide multiple levels of user
accessibility, allowing immediate access but nothing that could be
construed as spam until they have verified their identity in some way
that is accessible.

-austin

I guess you're talking about email, but that is considerably less
difficult for a machine to pass than CAPTCHA. Verifying that some
thing that gave you an email address has the ability to view messages
sent to that address doesn't prove much.

Steve

···

On 9/19/05, Robbie Carlton <robbie.carlton@gmail.com> wrote:

Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

I like the maths question idea. Is there any set of enumerable problems that
would cause a human no difficulties, for which there isn't a general
solution algorithm?
I realise this is a maths question, but just wondered if anyone had any
thoughts.

···

On 9/19/05, Gavin Kistner <gavin@refinery.com> wrote:

On Sep 19, 2005, at 6:36 AM, Stephen Veit wrote:
> I have seen the case where a subscriber is asked to solve a simple
> math problem. E.g., "what is twelve plus twenty-three?" This would
> certainly be accessible. You could think of different types of
> questions like "Enter the number that follows fifty-five." or "What
> number comes before thirty-two?"

Interesting. That would probably keep out existing general-purpose
rakes. But the moment your site becomes popular or targeted, it seems
to me that it would not be difficult to write a program to answer
your questions. Even if you include 33 flavors of how to phrase the
question ("Enter an integer that is not less than (not equal to)
eighty (reduced by the value represented by the roman numeral V) and
more 'n seventy with the number of non-thumbs on a standard hand
added to it.") the engineered bot could be written to handle 20% of
your phrases, and that would be enough.

[OT]
I smell a couple of fun Ruby Quizzes here. One is simply to write an
english-to-numeric processor.
value = Numeric.from_english( "eight-hundred thousand, twenty-three
hundred fifteen")

Another quiz might be to write such a challenge/response captcha
system. Make the questions as clear and varied as possible.

Another might be, given a series of questions like the above, to
write a 'bot that could answer them.

[...]

OCR isn't *that* easy. Humans--even young children--far exceed
machines in discerning even relatively clean machine-print characters.

Yes, I understand that. However, CAPTCHA is also proving to be
relatively ineffective and against accessibility standards. If you have
to follow US Federal 508 guidelines, you shouldn't use CAPTCHA. As noted
on the various discussions that I linked to, the large sites that
spawned CAPTCHA have now abandoned it.

[...]

The research at Lehigh is interesting.
http://www.cse.lehigh.edu/~baird/research_hips.html

Interesting, but I believe it will be ultimately fruitless. If I am
visually impaired but do not, for example, have audio attached to my
computer, then an audio CAPTCHA is just as limiting as a visual CAPTCHA.
Even the logic puzzle CAPTCHAs -- the most promising of CAPTCHAs -- are
often culturally or linguistically exclusive.

Basically, my advice is to forget CAPTCHA and go with double
verification. You can even provide multiple levels of user
accessibility, allowing immediate access but nothing that could be
construed as spam until they have verified their identity in some way
that is accessible.

I guess you're talking about email, but that is considerably less
difficult for a machine to pass than CAPTCHA. Verifying that some
thing that gave you an email address has the ability to view messages
sent to that address doesn't prove much.

Not necessarily email. Google has solved this for GMail and Google Talk
with SMS, as the number of people who own computers and the number of
people who own cellphones has a high correspondence.

Other systems can solve it with multiple levels of privilege. If you
have a bulletin board, then someone who has signed up but not yet
verified might have command set X (maybe posting new messages to the
support forum once every four hours and replies to any forum once every
fifteen minutes). After they've verified, they might have the base
restrictions lifted and get command set X + Y (posting new messages
to any forum every thirty minutes, replies every five minutes). After
they've participated on the site for ten days continuously or thirty
days sporadically, they get full posting and reply privileges. Or maybe
they don't get PM capabilities until thirty days.
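The tiered scheme above could be sketched roughly as follows. This is an assumption-laden illustration: the tier names, the `LIMITS` table and the `can_post?` helper are invented here, the intervals are the ones from the paragraph above, and the "support forum only" restriction and the promotion rules (ten or thirty days) are left out for brevity.

```ruby
# Hypothetical sketch; tier names and can_post? are illustrative only.

MINUTE = 60
HOUR = 60 * MINUTE

# Minimum number of seconds between two actions of the same kind, per tier.
LIMITS = {
  unverified: { new_topic: 4 * HOUR,    reply: 15 * MINUTE },
  verified:   { new_topic: 30 * MINUTE, reply: 5 * MINUTE },
  trusted:    { new_topic: 0,           reply: 0 }
}.freeze

# True when a user in the given tier may perform the action now, given the
# time of their previous action of that kind (nil if there was none).
def can_post?(tier, action, last_action_at, now = Time.now)
  return true if last_action_at.nil?

  now - last_action_at >= LIMITS.fetch(tier).fetch(action)
end
```

Keeping the limits in one table makes it easy to tune the intervals without touching the check itself.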

CAPTCHAs don't work nearly as well as people think and they're
inaccessible. There is a reason that Ruwiki will never support them.

-austin

···

On 9/20/05, Steven Lumos <steven@lumos.us> wrote:
--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

If your eyesight is normal. There are LOTS of people who do not have normal
eyesight.

<anecdote>
My wife holds a driver's license (and drives), is an EMT, and can pretty much
do anything anyone else does, but captchas are difficult for her because of
her eyesight. If a normal set of eyes is likened to a monitor resolution of
1280x1024, her eyes are like an 800x600 or 640x480 monitor.
It's still crisp and clear, but the resolution is lower. That alone makes
many styles of captcha difficult for her.
</anecdote>

For a large chunk of the population, captchas are quick and easy and fine, but
for a small but not insignificant chunk, they are difficult to impossible.
And even if a machine can only handle 10% of captchas, that is a high enough
percentage to render them ineffective, IMHO.

Kirk Haines

···

On Tuesday 20 September 2005 1:28 pm, Steven Lumos wrote:

OCR isn't *that* easy. Humans--even young children--far exceed
machines in discerning even relatively clean machine-print characters.

Well, taking that thought further, you could even keep simpletons (or at least those bad at math) out of your site, by setting your question difficulty to an appropriate level.

"Type the number that comes right before seventy-five."

"What is fifteen plus twelve minus six?"

"What is six x minus three q, if q is two and x three?"

"What is two squared, cubed?"

"Is the cosine of zero one or zero?"

"What trigonometric function of an angle of a right triangle yields the ratio of the adjacent side's length divided by the hypotenuse?"

"What is the derivative of 2x^2, when x is 3?"

"What is the dot product of the vectors [4 7] and [3 4]?"

"What is the cross product of the vectors [4 7] and [3 4]?"

...and I'll stop there, before I embarrass myself trying to come up with tougher questions.

···

On Sep 19, 2005, at 7:11 AM, Robbie Carlton wrote:

I like the maths question idea. Is there any set of enumerable problems that
would cause a human no difficulties, for which there isn't a general
solution algorithm?
I realise this is a maths question, but just wondered if anyone had any
thoughts.

I have good eyesight and they still annoy me. I can't believe adding a user-hostile feature is a good idea. :)

Stay tuned for this week's Ruby Quiz though, which covers this very topic.

James Edward Gray II

···

On Sep 20, 2005, at 4:07 PM, Austin Ziegler wrote:

Yes, I understand that. However, CAPTCHA is also proving to be
relatively ineffective and against accessibility standards. If you have
to follow US Federal 508 guidelines, you shouldn't use CAPTCHA. As noted
on the various discussions that I linked to, the large sites that
spawned CAPTCHA have now abandoned it.

Austin Ziegler wrote:

[...]

OCR isn't *that* easy. Humans--even young children--far exceed
machines in discerning even relatively clean machine-print characters.
   

Yes, I understand that. However, CAPTCHA is also proving to be
relatively ineffective and against accessibility standards. If you have
to follow US Federal 508 guidelines, you shouldn't use CAPTCHA. As noted
on the various discussions that I linked to, the large sites that
spawned CAPTCHA have now abandoned it.

That's interesting to me. I don't follow either subject much, but I'm very interested in website accessibility. Is the Ticketmaster way of providing either a visual or an aural CAPTCHA not sufficient?

Interesting, but I believe it will be ultimately fruitless. If I am
visually impaired but do not, for example, have audio attached to my
computer, then an audio CAPTCHA is just as limiting as a visual CAPTCHA.

Would a large-print CAPTCHA suffice in this case? People who are too visually impaired to read large print would have to have audio, I'd assume.

CAPTCHAs don't work nearly as well as people think and they're
inaccessible. There is a reason that Ruwiki will never support them.

I do agree that they are a big pain in the butt to deal with, as a user. Your multiple-level thing seems like the right approach. Yahoo Groups offers the option for admins to moderate new users only -- after which the admin can manually give you full posting rights.

Devin

···

On 9/20/05, Steven Lumos <steven@lumos.us> wrote:

Kirk Haines wrote:

And even if a machine can only handle 10% of captchas, that is a high enough percentage to render them ineffective, IMHO.

Not if
1. the processing time to attempt a captcha response is prohibitively large
2. the system forces a prohibitively large delay after an unsuccessful captcha attempt.
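Point 2 could look something like the sketch below: a growing delay per client after each failed attempt. `AttemptThrottle` and its parameters are hypothetical names chosen for this example, and state is kept in memory, so a real deployment would need shared storage and a sensible choice of client key (IP address, session, account).

```ruby
# Hypothetical sketch: AttemptThrottle is illustrative, not a library class.
class AttemptThrottle
  def initialize(base_delay: 5, max_delay: 3600)
    @base_delay = base_delay      # seconds to wait after the first failure
    @max_delay = max_delay        # upper bound on the wait
    @failures = Hash.new(0)       # client key => consecutive failures
    @blocked_until = {}           # client key => Time of next allowed attempt
  end

  # May this client try again at the given time?
  def allowed?(key, now = Time.now)
    blocked = @blocked_until[key]
    blocked.nil? || now >= blocked
  end

  # Record a failed attempt; the delay doubles with each consecutive failure
  # up to max_delay. Returns the delay in seconds.
  def record_failure(key, now = Time.now)
    @failures[key] += 1
    delay = [@base_delay * (2**(@failures[key] - 1)), @max_delay].min
    @blocked_until[key] = now + delay
    delay
  end

  # A successful attempt clears the client's history.
  def record_success(key)
    @failures.delete(key)
    @blocked_until.delete(key)
  end
end
```

With doubling delays, a bot that solves only one challenge in ten spends most of its time waiting, which is the economic argument made in the two points above.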

But again, I don't like 'em either. Just setting facts straight. :)

Devin

I strongly suspect that a lot of wiki-related spam (and probably comment spam
as well) has a lot of manual oversight in the process. When I was closely
monitoring the RubyGarden wiki spam, I noticed times when the spammer would
go back and correct typos in the spam text. They may use automation, but
there is often a human directing the process. And if there is a human in the
process, then CAPTCHA seems a wasted effort.

Does anyone have first hand reports of success with captcha?

···

On Tuesday 20 September 2005 05:07 pm, Austin Ziegler wrote:

Yes, I understand that. However, CAPTCHA is also proving to be
relatively ineffective [...]

--
-- Jim Weirich jim@weirichhouse.org http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)

Austin Ziegler wrote:

[...]
> OCR isn't *that* easy. Humans--even young children--far exceed
> machines in discerning even relatively clean machine-print characters.

Yes, I understand that. However, CAPTCHA is also proving to be
relatively ineffective and against accessibility standards. If you have
to follow US Federal 508 guidelines, you shouldn't use CAPTCHA. As noted
on the various discussions that I linked to, the large sites that
spawned CAPTCHA have now abandoned it.

I don't disagree with this in theory. I missed the part about the
"sites that spawned CAPTCHA", but I did just verify that both Hotmail
and Yahoo are still using them.

> The research at Lehigh is interesting.
> http://www.cse.lehigh.edu/~baird/research_hips.html

Interesting, but I believe it will be ultimately fruitless. If I am
visually impaired but do not, for example, have audio attached to my
computer, then an audio CAPTCHA is just as limiting as a visual CAPTCHA.
Even the logic puzzle CAPTCHAs -- the most promising of CAPTCHAs -- are
often culturally or linguistically exclusive.

>> Basically, my advice is to forget CAPTCHA and go with double
>> verification. You can even provide multiple levels of user
>> accessibility, allowing immediate access but nothing that could be
>> construed as spam until they have verified their identity in some way
>> that is accessible.
> I guess you're talking about email, but that is considerably less
> difficult for a machine to pass than CAPTCHA. Verifying that some
> thing that gave you an email address has the ability to view messages
> sent to that address doesn't prove much.

Not necessarily email. Google has solved this for GMail and Google Talk
with SMS, as the number of people who own computers and the number of
people who own cellphones has a high correspondence.

I disagree with the implications that (a) people with visual
impairments have easy access to SMS, and (b) software doesn't have easy
access to SMS. I'm not exactly sure what Google thinks they are doing
with SMS, aside from tying your phone number to your search history,
but I do know that it is fundamentally different from curbing wiki and
blog spam. I don't claim to be completely up on the economics of wiki
spam, but I can certainly imagine the existence of cheapish pre-pay
cell phones that have USB/IR/Bluetooth connectivity, and who cares if
that one number is blocked after the fact.

Other systems can solve it with multiple levels of privilege. If you
have a bulletin board, then someone who has signed up but not yet
verified might have command set X (maybe posting new messages to the
support forum once every four hours and replies to any forum once every
fifteen minutes). After they've verified, they might have the base
restrictions lifted and get command set X + Y (posting new messages
to any forum every thirty minutes, replies every five minutes). After
they've participated on the site for ten days continuously or thirty
days sporadically, they get full posting and reply privileges. Or maybe
they don't get PM capabilities until thirty days.

But it's the verification step that you've devoted only 3 words to
that's hard. Your scheme, taken as a whole, might sound reasonable for
a forum, but doesn't seem really practical for blog comments or wikis.
I'm certain that Google has not solved the problem. Sufficient albeit
fewer numbers of people will walk through the Google process in
exchange for pornography just like they do with CAPTCHAs.

CAPTCHAs don't work nearly as well as people think and they're
inaccessible. There is a reason that Ruwiki will never support them.

I don't want to sound like a big proponent of CAPTCHA. I've never even
implemented one. I was just drawn in by the claim that free OCR
programs were cracking them with any success. I do think they may be a
part of a solution in certain situations, and that the alternatives so
far have equal problems with accessibility.

Steve

···

On 9/20/05, Steven Lumos <steven@lumos.us> wrote: