[QUIZ] Math Captcha (#48)

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Gavin Kistner

Overview
--------

  "What is fifty times 'a', if 'a' is three?" #=> 150

Write a Captcha system that uses english-based math questions to distinguish
humans from bots.

Background and Details
----------------------

A 'captcha' is an automated way for a computer (usually a web server) to try to
weed out bots, which don't have the 'intelligence' to answer a question that's
fairly straightforward for a human. Captcha is an acronym for "Completely
Automated Public Turing-test to tell Computers and Humans Apart".

The most common form of captcha is an image that has been munged in such a way
that it's supposed to be very hard for a computer to tell you what it says, but
easy for a human. Recent studies (or at least articles) claim that it's proven
quite possible to write OCR software that determines the right answer a large
percentage of the time. Although image-based captchas can be improved (for
example, by placing multiple colored words and asking the user to identify the
'cinnamon' word), they still have the fatal flaw of being inaccessible to the
visually impaired.

This quiz is to write a different kind of captcha system - one that asks the
user questions in plain text. The trick is to use mathematics questions with a
variety of forms and varying numbers, so that it should be difficult (though of
course not impossible) to write a bot to parse them.

For example, questions of this form might be easy for a bot to parse:

  "What is five plus two?"

while this question should be substantially harder:

  "How much is fifteen-hundred twenty-three, less the amount of non-thumbs on
  one human hand?"

A good balance between human comprehension, variety of question form, and ease
of computation is an interesting challenge.

The Rules
---------

Write a module that has two module methods: 'create_question' and
'check_answer'.

The create_question method should return a hash with two specific keys:

  p MathCaptcha.create_question
  #=> { :question => "Here is the string of the question", :answer_id => 17 }

The check_answer method is passed a hash with two specific keys, and should
return true if the supplied answer is correct, false if not.

  p MathCaptcha.check_answer :answer => "Answer String", :answer_id => 17
  #=> false
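
A minimal sketch of that interface may help fix ideas. Only the two method
names and the hash keys come from the rules above; the module name
SketchCaptcha, the in-memory ANSWERS table, and the single digit-based
question form are placeholders chosen for illustration.

```ruby
# Hypothetical minimal module; not a full solution to the quiz.
module SketchCaptcha
  ANSWERS = {}   # answer_id => expected answer string, kept in memory only

  def self.create_question
    a, b = rand(1..9), rand(1..9)
    id = ANSWERS.size + 1
    ANSWERS[id] = (a + b).to_s
    { :question => "What is #{a} plus #{b}?", :answer_id => id }
  end

  def self.check_answer(info)
    expected = ANSWERS[info[:answer_id]]
    # Unknown ids are treated as wrong answers rather than errors.
    !expected.nil? && expected == info[:answer].to_s.strip
  end
end

q = SketchCaptcha.create_question
puts q[:question]
puts SketchCaptcha.check_answer(:answer => "not a number",
                                :answer_id => q[:answer_id])
```

A real solution would persist the answers between requests and vary the
wording of the question, which is where most of the quiz's difficulty lies.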

Extra Credit
------------

1) Ensure that your library is easily extensible, making it easy for someone
using your library to add new forms of question creation, and the answer that
goes with each form.

2) For automated testing and non-ruby usage, it would be nice to provide your
module with a command-line wrapper, with the following interface:

  > ruby math_captcha.rb
  1424039 : What is the sum of the number of thumbs on a human and the number
  of hooves on a horse?
  
  > ruby math_captcha.rb --id 1424039 --answer 7
  false
  
  > ruby math_captcha.rb --id 1424039 --answer 6
  true

3) Allow your 'create_question' method to take an integer difficulty argument.
Low difficulties (0 being the lowest) represent trivial questions that an
elementary school student might be able to answer, while higher difficulties
range into algebra, trigonometry, calculus, linear algebra, and beyond. (It's up
to you as to what the scale is.)

  "Type the number that comes right before seventy-five."
  "What is fifteen plus twelve minus six?"
  "What is six x minus three i, if I said that i is two and x three?"
  "What is two squared, cubed?"
  "Is the cosine of zero one or zero?"
  "What trigonometric function of an angle of a right triangle yields the
  ratio of the adjacent side's length divided by the hypotenuse?"
  "What is the derivative of 2x^2, when x is 3?"
  "What is the dot product of the vectors [4 7] and [3 4]?"
  "What is the cross product of the vectors [4 7] and [3 4]?"

4) Let your 'check_answer' method take a unique identifier (such as an IP
address) along with the answer, and always return false for that identifier
after a certain number of consecutive (or accumulated) failures have occurred.
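
One way to approach the fourth item is sketched below, assuming consecutive
(not accumulated) failures and an in-memory counter. The class name
FailureGuard and its methods are hypothetical; the block passed to check
stands in for whatever answer comparison the library actually performs.

```ruby
# Hypothetical helper: counts consecutive failures per identifier.
class FailureGuard
  def initialize(max_failures = 3)
    @max_failures = max_failures
    @failures = Hash.new(0)
  end

  def locked?(identifier)
    @failures[identifier] >= @max_failures
  end

  # Wraps a correctness check; the block does the real answer comparison.
  def check(identifier)
    return false if locked?(identifier)
    if yield
      @failures[identifier] = 0   # a success resets the streak
      true
    else
      @failures[identifier] += 1
      false
    end
  end
end

guard = FailureGuard.new(2)
guard.check("10.0.0.1") { false }
guard.check("10.0.0.1") { false }
# The identifier is now locked, so the block is never consulted.
puts guard.check("10.0.0.1") { true }
```

Counting accumulated failures instead would simply mean dropping the reset
on success.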

A Tip - Generating English Numerals
-----------------------------------

Presumably parsing large numbers from english to computerese adds another
stumbling block for any bot writer. ("5 + 2" is slightly easier than "five plus
two" and a fair amount easier than "three-thousand-'n'-five twenty three plus
five-oh-five".)

Ruby Quiz #25 has some nice code that you can appropriate for turning integers
into english: http://www.rubyquiz.com/quiz25.html

"What is fifty times 'a', if 'a' is three?" #=> 150

Write a Captcha system that uses english-based math questions to
distinguish
humans from bots.

Solutions to the past quiz entitled English Numerals may be handy for this
one:

http://rubyquiz.com/quiz25.html

Cheers,
Dave

My solution follows. I didn't do the extra credit for checking to see if the same UserID/IP was spamming the system. I also didn't do the extra credit for passing an argument for the difficulty of the question. Instead, I created a framework where you categorize types of captchas in a hierarchy, and you can ask for a specific type of captcha by using the desired subclass.

For example, in my code below, I have:
class Captcha::Zoology < Captcha ... end
class Captcha::Math < Captcha
   class Basic < Math ... end
   class Algebra < Math ... end
end

This allows you to do:
Captcha.create_question # a question from any framework, while
Captcha::Zoology.create_question # only questions in this class
Captcha::Math.create_question # any question in Math or its subclasses
Captcha::Math::Basic.create_question # only Basic math questions

I'm not wild about the fact that I re-create the Marshal file after every question creation or remove-retrieval, but it seemed the safest way. I have no idea how this will work (or fail) in a multi-threaded environment. I do like that I have the marshal file yank out questions after a certain time limit, and (optionally) after the answer has been checked. This keeps the marshal file quite tiny. The persistence for AnswerStore could easily be abstracted out to use a DB instead, if available.
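
The persistence scheme described above (reload the Marshal file, drop stale
entries, rewrite the file on every change) can be sketched in isolation as
follows. This is an illustration under assumptions, not the code from the
listing: the class name ExpiringStore, its method names, and the explicit
`now` parameters are invented here to keep the example small and testable.

```ruby
require "tmpdir"

# Hypothetical store: each entry is an [answers, created_at] pair keyed by id.
# Marshal.load should only ever be pointed at files the application wrote.
class ExpiringStore
  def initialize(path, ttl_seconds)
    @path, @ttl = path, ttl_seconds
    @data = File.exist?(path) ? File.open(path, "rb") { |f| Marshal.load(f) } : {}
    purge
  end

  def store(id, answers, now = Time.now)
    @data[id] = [answers, now]
    save
  end

  def fetch(id, now = Time.now)
    purge(now)
    entry = @data[id]
    entry ? entry.first : []
  end

  private

  # Drop every entry older than the time limit.
  def purge(now = Time.now)
    @data.delete_if { |_id, (_answers, created)| now - created > @ttl }
  end

  def save
    File.open(@path, "wb") { |f| Marshal.dump(@data, f) }
  end
end

path = File.join(Dir.mktmpdir, "answers.marshal")
ExpiringStore.new(path, 600).store(1, ["150"])
p ExpiringStore.new(path, 600).fetch(1)   # a fresh instance reloads from disk
```

Nothing here guards against two processes writing the file at once, which is
the same multi-threading caveat raised above.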

I'm not wild about some of the specific captcha questions I created; some of them seem to be annoyingly hard at times or (rarely) confusing. But this framework makes it pretty easy to modify the question generation, and add your own.

I'm most proud of the String#variation method (except the name). Using regexp-like notation, it performs a sort of reverse-regexp, building a random string based on some criteria. (I'm not a golfer, but I also like how terse it turned out.)

Without further explanation, the code:

class Captcha
   # Invalidate an answer as soon as it has been checked for?
   REMOVE_ON_CHECK = true

   # Returns a hash with two values:
   # _question_:: A string with the question that the user should answer
   # _answer_id_:: A unique ID for this question that should be passed to
   # #check_answer or #get_answers
   def self.create_question
     question, *answers = factories.random.call
     answer_id = AnswerStore.instance.store( answers )
     return { :question => question, :answer_id => answer_id }
   end

   # _answer_id_:: The unique ID returned by #create_question
   # _answer_:: The user's string or numeric answer to the question
   def self.check_answer( info )
     #TODO - implement userid persistence and checks
     answer_id = info[ :answer_id ]
     answer = info[ :answer ].to_s.downcase

     store = AnswerStore.instance
     valid_answers = if REMOVE_ON_CHECK
       store.remove( answer_id )
     else
       store.retrieve( answer_id )
     end
     valid_answers = valid_answers.map{ |a| a.to_s.downcase }

     valid_answers.include?( answer )
   end

   def self.get_answers( id )
     warn "Hey, that's cheating!"
     AnswerStore.instance.retrieve( id )
   end

   # Add the block to my store of question factories
   def self.add_factory( &block )
     ( @factories ||= [] ) << block
   end

   # Keep track of the classes that inherit from me
   def self.inherited( subklass )
     ( @subclasses ||= [] ) << subklass
   end

   # All the question factories in myself and subclasses
   def self.factories
     @factories ||= []
     @subclasses ||= []
     @factories + @subclasses.map{ |sub| sub.factories }.flatten
   end

   class AnswerStore
     require 'singleton'
     include Singleton

     FILENAME = 'captcha_answers.marshal'
     MINUTES_TO_STORE = 10

     def initialize
       if File.exist?( FILENAME )
         @all_answers = File.open( FILENAME ){ |f| Marshal.load( f ) }
       else
         @all_answers = { :lastid=>0 }
       end

       # Purge any answers that are too old, both for security and
       # to keep a small log size
       @all_answers.delete_if { |id,answer|
         next if id == :lastid
         ( Time.now - answer.time ) > MINUTES_TO_STORE * 60
       }

       warn "#{@all_answers.length} answers previously stored" if $DEBUG
     end

     # Serialize the answer(s), and return a unique ID for it
     def store( *answers )
       idx = @all_answers[ :lastid ] += 1
       @all_answers[ idx ] = Answer.new( *answers )
       serialize
       idx
     end

     # Retrieve the correct answer(s)
     def retrieve( answer_id )
       answers = @all_answers[ answer_id ]
       ( answers && answers.possibilities ) || []
     end

     # Manually clear out a stored answer
     #
     # Returns the answer if it exists in the store, an empty array otherwise
     def remove( answer_id )
       answers = retrieve( answer_id )
       @all_answers.delete( answer_id )
       serialize
       answers
     end

     private
       # Shove the current store state to disk
       def serialize
         File.open( FILENAME, 'wb' ){ |f| f << Marshal.dump( @all_answers ) }
       end

     class Answer
       attr_reader :possibilities, :time
       def initialize( *possibilities )
         @possibilities = possibilities.flatten
         @time = Time.now
       end
     end
   end
end

class String
   def variation( values={} )
     out = self.dup
     while out.gsub!( /\(([^())?]+)\)(\?)?/ ){
       ( $2 && ( rand > 0.5 ) ) ? '' : $1.split( '|' ).random
     }; end
     out.gsub!( /:(#{values.keys.join('|')})\b/ ){ values[$1.intern] }
     out.gsub!( /\s{2,}/, ' ' )
     out
   end
end

class Array
   def random
     self[ rand( self.length ) ]
   end
end

class Integer
   ONES = %w[ zero one two three four five six seven eight nine ]
   TEENS = %w[ ten eleven twelve thirteen fourteen fifteen
              sixteen seventeen eighteen nineteen ]
   TENS = %w[ zero ten twenty thirty forty fifty
              sixty seventy eighty ninety ]
   MEGAS = %w[ none thousand million billion ]

   # code by Glenn Parker;
   # see http://www.ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/135449
   def to_english
     places = to_s.split(//).collect {|s| s.to_i}.reverse
     name = []
     ((places.length + 2) / 3).times do |p|
       strings = Integer.trio(places[p * 3, 3])
       name.push(MEGAS[p]) if strings.length > 0 and p > 0
       name += strings
     end
     name.push(ONES[0]) unless name.length > 0
     name.reverse.join(" ")
   end

   def to_digits
     self.to_s.split('').collect{ |digit| digit.to_i.to_english }.join('-')
   end

   def to_rand_english
     rand < 0.5 ? to_english : to_digits
   end

   private

   # code by Glenn Parker;
   # see http://www.ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/135449
   def Integer.trio(places)
     strings = []
     if places[1] == 1
       strings.push(TEENS[places[0]])
     elsif places[1] and places[1] > 0
       strings.push(places[0] == 0 ? TENS[places[1]] :
                    "#{TENS[places[1]]}-#{ONES[places[0]]}")
     elsif places[0] > 0
       strings.push(ONES[places[0]])
     end
     if places[2] and places[2] > 0
       strings.push("hundred", ONES[places[2]])
     end
     strings
   end

end

# Specific captchas follow, showing off categorization
class Captcha::Zoology < Captcha
   add_factory {
     q = "How many (wings|exhaust pipes|titanium teeth|TVs|wooden knobs) "
     q << "does a (standard|normal|regular) "
     q << "(giraffe|cat|bear|dog|frog|cow|elephant) have?"
     [ q.variation, '0', 'zero', 'none' ]
   }
   add_factory {
     q = "How many (wings|legs|eyes) does a (standard|normal|regular) "
     q << "(goose|bird|chicken|rooster|duck|swan) have?"
     [ q.variation, 2, 'two' ]
   }
end

class Captcha::Math < Captcha
   class Basic < Math
     add_factory {
       q = "(How (much|many)|What) is (the (value|result) of)? "
       q << ":num1 :op :num2?"
       num1 = rand( 90 ) + 9
       num2 = rand( 30 ) + 2

       plus = 'plus:added to:more than'.split(':')
       minus = 'minus:less:taking away'.split(':')
       times = 'times:multiplied by:x'.split(':')
       op = [plus,minus,times].flatten.random
       case true
         when plus.include?( op )
           answer = num1 + num2
         when minus.include?( op )
           answer = num1 - num2
         when times.include?( op )
           answer = num1 * num2
       end
       num1 = num1.to_rand_english
       num2 = num2.to_rand_english
       [ q.variation( :num1 => num1, :op => op, :num2 => num2 ), answer ]
     }
     add_factory {
       num1 = rand( 990000 ) + 1000
       num2 = rand( 990000 ) + 1000
       answer = num1 + num2
       num1 = num1.to_rand_english
       num2 = num2.to_rand_english
       [ "Add #{num1} (and|to) #{num2}.".variation, answer ]
     }
   end
   class Algebra < Math
     add_factory {
       q = "Calculate :n1:x :op :n2:y, (for|if (I say )?) "
       q << ":x( is (set to )?|=):xV(,| and) :y( is (set to )?|=):yV."
       n1 = rand( 20 ) + 9
       n2 = rand( 10 ) + 2
       x = %w|a x z r q t|.random
       y = %w|c i y s m|.random
       xV = rand( 5 )
       yV = rand( 6 )

       plus = 'plus:added to:more than'.split(':')
       minus = 'minus:less:taking away'.split(':')
       times = 'times:multiplied by:x'.split(':')
       op = [plus,minus,times].flatten.random
       case true
         when plus.include?( op )
           answer = n1*xV + n2*yV
         when minus.include?( op )
           answer = n1*xV - n2*yV
         when times.include?( op )
           answer = n1*xV * n2*yV
       end
       xV = xV.to_rand_english
       yV = yV.to_rand_english
       vars = { :n1=>n1,:op=>op,:n2=>n2,:x=>x,:y=>y,:xV=>xV,:yV=>yV }
       [ q.variation( vars ), answer ]
     }
   end
end

if __FILE__ == $0
   if ARGV.empty?
     q = Captcha::Math.create_question
     puts "#{q[ :answer_id ]} : #{q[ :question ]}"
   else
     pieces = {}
     nextarg = nil
     ARGV.each{ |arg|
       case arg
         when /-i|--id/i then nextarg = :id
         when /-a|--answer/i then nextarg = :answer
         else pieces[ nextarg ] = arg
       end
     }

     pieces = { :answer_id => pieces[:id].to_i, :answer => pieces[:answer] }
     puts Captcha.check_answer( pieces )
   end
end
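
As a small illustration of the "reverse-regexp" idea behind String#variation
in the listing above, here is a stand-alone sketch. It is not the original
method: the function name expand_template and the injectable random number
generator are assumptions made for this example, and it leaves out the
:name substitution step.

```ruby
# Hypothetical stand-alone version: "(a|b|c)" becomes one alternative,
# "(a|b)?" becomes one alternative or nothing. Innermost groups go first.
def expand_template(template, rng = Random.new)
  out = template.dup
  loop do
    # Each pass rewrites only groups that contain no nested parentheses.
    changed = out.gsub!(/\(([^()]+)\)(\?)?/) do
      alternatives, optional = $1, $2
      choices = alternatives.split("|")
      (optional && rng.rand < 0.5) ? "" : choices[rng.rand(choices.length)]
    end
    break unless changed   # gsub! returns nil once no groups remain
  end
  out.gsub(/\s{2,}/, " ").strip
end

puts expand_template(
  "How many (wings|legs) does a (normal|regular)? (goose|duck) have?"
)
```

Passing a seeded generator, for example Random.new(42), makes the output
repeatable, which is handy when testing a question template.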

Hello,

It's a bit out of topic, but I wanted to express an idea I had today and that I've never seen anywhere.

From what I've read on the Internet and in various newsgroups, Captcha is considered the ultimate method against computer bots, like the ones that fill your blog or wiki with unrelated links and in that way rank their sites on Google.

The power of captcha is to give some information that is easy for a human to understand and very hard for a computer. That way, you can differentiate between them.

Now, to get to the point. IMHO what was not considered is a fake site that would proxy the target's captcha. It would provide enough content that an inexperienced user would want to reply to the captcha. That way, it uses the human's brain :stuck_out_tongue: to do its "work".

What do you think ?

Cheers,
... zimba

The quiz says that if you read a bit farther down. :wink:

James Edward Gray II

On Sep 23, 2005, at 7:56 AM, Dave Burt wrote:

"What is fifty times 'a', if 'a' is three?" #=> 150

Write a Captcha system that uses english-based math questions to
distinguish
humans from bots.

Solutions to the past quiz entitled English Numerals may be handy for this
one:

Ruby Quiz - English Numerals (#25)

It's been proposed before. See, for example, this:

http://www.boingboing.net/2004/01/27/solving_and_creating.html

regards,
Ed

On Mon, Sep 26, 2005 at 07:21:48AM +0900, zimba wrote:

Now, to get to the point. IMHO what was not considered is a fake site
that would proxy the target's captcha. It would provide enough content
that an inexperienced user would want to reply to the captcha. That
way, it uses the human's brain :stuck_out_tongue: to do its "work".

My solution is below. Here are my random thoughts about it:

1. My code requires Glenn Parker's Ruby Quiz #25 solution. I did make one minor change to it, which was to wrap the quiz specific code in a:

if __FILE__ == $0
     # ...
end

That allowed me to use it as a library.
2. I did the first two extra credits, but couldn't think of good systems for the last two.
3. I'm on a "Write less code" kick this week, so I really tried to pack a lot of punch into minimal code. I'm happy with the results.

Here's the code:

#!/usr/local/bin/ruby -w

require "erb"

# Glenn Parker's code from Ruby Quiz 25...
require "english_numerals"
class Integer
     alias_method :to_en, :to_english
end

class Array
     def insert_at_nil( obj )
         if i = index(nil)
             self[i] = obj
             i
         else
             self << obj
             size - 1
         end
     end
end

module MathCaptcha
     @@captchas = Array.new
     @@answers = Array.new

     def self.add_captcha( template, &validator )
         @@captchas << Array[template, validator]
     end

     def self.create_question
         raise "No captchas loaded." if @@captchas.empty?

         captcha = @@captchas[rand(@@captchas.size)]

         args = Array.new
         class << args
             def arg( value )
                 push(value)
                 value
             end

             def resolve( template )
                 ERB.new(template).result(binding)
             end
         end
         question = args.resolve(captcha.first)
         index = @@answers.insert_at_nil(Array[captcha.first, *args])

         Hash[:question => question, :answer_id => index]
     end

     def self.check_answer( answer )
         raise "Answer id required." unless answer.include? :answer_id

         template, *args = @@answers[answer[:answer_id]]
         raise "Answer not found." if template.nil?

         validator = @@captchas.assoc(template).last
         raise "Unable to match captcha." if validator.nil?

         if validator[answer[:answer], *args]
             @@answers[answer[:answer_id]] = nil
             true
         else
             false
         end
     end

     def self.load_answers( file )
         @@answers = File.open(file) { |answers| Marshal.load(answers) }
     end

     def self.load_captchas( file )
         code = File.read(file)
         eval(code, binding)
     end

     def self.save_answers( file )
         File.open(file, "w") { |answers| Marshal.dump(@@answers, answers) }
     end
end

if __FILE__ == $0
     captchas = File.join(ENV["HOME"], ".math_captchas")
     unless File.exist? captchas
         File.open(captchas, "w") { |file| file << DATA.read }
     end
     MathCaptcha.load_captchas(captchas)

     answers = File.join(ENV["HOME"], ".math_captcha_answers")
     MathCaptcha.load_answers(answers) if File.exist? answers

     if ARGV.empty?
         question = MathCaptcha.create_question
         puts "#{question[:answer_id]} : #{question[:question]}"
     else
         args = Hash.new
         while ARGV.size >= 2 and ARGV.first =~ /^--\w+$/
             key = ARGV.shift[2..-1].to_sym
             value = ARGV.first =~ /^\d+$/ ? ARGV.shift.to_i : ARGV.shift
             args[key] = value
         end

         answer = MathCaptcha.check_answer(args)
         puts answer
     end

     END { MathCaptcha.save_answers(answers) }
end

__END__
add_captcha(
     "<%= arg(rand(10)).to_en.capitalize %> plus <%= arg(2).to_en %>?"
) do |answer, *opers|
     if answer.is_a?(String) and answer =~ /^\d+$/
         answer = answer.to_i.to_en
     elsif answer.is_a?(Integer)
         answer = answer.to_en
     end
     answer == opers.inject { |sum, var| sum + var }.to_en
end

__END__

James Edward Gray II

On Sep 25, 2005, at 2:55 PM, Gavin Kistner wrote:

My solution follows.

Ruby Quiz - English Numerals (#25)

The quiz says that if you read a bit farther down. :wink:

:slight_smile: I got down to Extra Credit #3 before thinking I'm not getting that much
extra credit...

Also I don't think my submitted solution to that quiz had any code relevant
to this exercise, although I have it sitting around in sub-release
condition.

Wish me luck finding time to put something together.

Cheers,
Dave

My solution follows.

My solution is below. Here are my random thoughts about it:

[snip 1 - 3]

4. I add captchas in plain Ruby code. A method is available in the templates called arg(), to ensure an argument is passed on to your validation block. Here's a sample captcha:

add_captcha(
    "<%= arg(rand(10)).to_en.capitalize %> plus <%= arg(2).to_en %>?"
) do |answer, *opers|
    if answer.is_a?(String) and answer =~ /^\d+$/
        answer = answer.to_i.to_en
    elsif answer.is_a?(Integer)
        answer = answer.to_en
    end
    answer == opers.inject { |sum, var| sum + var }.to_en
end
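
To show the arg() mechanism on its own, here is a reduced sketch, assuming
nothing beyond Ruby's standard ERB library. The class name ArgRecorder is
hypothetical, and numbers are rendered as digits because the
english_numerals library is not assumed to be available.

```ruby
require "erb"

# Hypothetical reduction of the idea: every value passed through arg() while
# the template renders is remembered, so a validator can be handed the same
# operands the user saw in the question.
class ArgRecorder
  attr_reader :args

  def initialize
    @args = []
  end

  # Record the value and return it unchanged so it still appears in the output.
  def arg(value)
    @args << value
    value
  end

  # Render the template with this object's binding, making arg() callable.
  def resolve(template)
    ERB.new(template).result(binding)
  end
end

recorder = ArgRecorder.new
puts recorder.resolve("What is <%= arg(3) %> plus <%= arg(4) %>?")
p recorder.args
```

The recorded operands are what a validation block would receive as its
trailing arguments, alongside the user's answer.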

James Edward Gray II

On Sep 25, 2005, at 10:04 PM, James Edward Gray II wrote:

On Sep 25, 2005, at 2:55 PM, Gavin Kistner wrote: