[QUIZ] Genetic Programming (#212)

%w[rubygems charlie].each{|lib| require lib}
class Quiz212 < TreeGenotype([:x,:y,proc{rand(20)-10}], [:-@], [:+,:*,:-])
  DATA = [[8, 16, 20808] , [22, 31, 150847], [5, 16, 20685], [12,
19, 34895], [18, 25, 79349],
          [20, 33, 181525], [31, 1, -119] , [19, 33, 181433], [0,
12, 8640], [13, 12, 9017]]
  SIZE_PENALTY = 100
  def fitness
    -DATA.inject(0){|f,(x,y,z)| f + ( z - eval_genes(:x=>x,:y=>y)
).abs } - size * SIZE_PENALTY
  end
end

Population.new(Quiz212).evolve_on_console(200)

The best result I got using size penalty 100 has fitness -2296 and tree:
[:*, [:term, :y], [:+, [:+, [:term, :y], [:term, :x]], [:*, [:*,
[:term, -5], [:term, :y]], [:-@, [:term, :y]]]]]
Which simplifies to 5y^3 + y^2 + xy.

Not including a penalty for size in the fitness function causes bloat.
However, including such a penalty causes it to never find the lower
order terms as the smaller size is considered more important than
slightly better accuracy. This is a well-known problem in GP.

···

On Fri, Jul 3, 2009 at 8:51 PM, Daniel Moore<yahivin@gmail.com> wrote:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have elapsed from the time this message was
sent.

2. Support Ruby Quiz by submitting ideas and responses
as often as you can!
Visit: http://rubyquiz.strd6.com/suggestions

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion. Please reply to
the original quiz message, if you can.

RSS Feed: http://rubyquiz.strd6.com/quizzes.rss

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

## Genetic Programming (#212)

Buon giorno Rubyists,

Let's say that we have a table of outputs for an unknown function, and
we want to find out what that function is. One possible method of
finding a solution is to use genetic programming[1]. In genetic
programming we begin with a random population of programs, then rank
them according to how well they meet the solution criteria. The top
ranked programs are then modified to create a new generation of
programs. The new generation is ranked by how well the solve the
problem in the same way and the process repeats until, hopefully,
we'll have evolved a strong solution.

The modifications made to the programs are based on techniques
observed in biological evolution: mutation and crossover. In mutation
a piece of one program is altered randomly, for example `x + 3` could
become `x + (y - 1)`. The mutation occurs at one node in the parse
tree and replaces it with a new randomly generated node.

Crossover works in much the same way, except instead of randomly
altering the node it is taken from another node of another program.
For example: `x + (y * x)` and `3 - (y + x*x)` could yield something
like `(y*x) - (y + x*x)`.

In order to mutate our programs it would be nice to work with the
structure of the code directly. The ParseTree[2] gem provides a way of
getting an Abstract Syntax Tree[3] to manipulate. The AST represents
the code as S-expressions[4], arrays containing operators as the first
argument and the operands as the remaining arguments. If you are
familiar with LISP then you know that LISP programmers program
directly in S-expressions. Using S-expressions makes evolution and
manipulation of the programs much easier, as S-expressions are the
essence of programs.

When manipulating your ASTs you may need a deep copy to prevent
unwanted side effects Here is a deep copy idiom that may be useful:

def deep_copy(tree)
Marshal.load(Marshal.dump(tree)
end

In order to get the AST back into executable ruby you'll need
ruby2ruby[5] or something like it. As always, feel encouraged to use
and share other tools if you feel they are good for the task. Also, if
you have any questions don't be afraid to ask, there are many people
on the mailing list who are happy to help!

Here is the table of outputs for the mystery function that we would
like to evolve an algorithm for:
["x", "y", "result"]
[8, 16, 20808]
[22, 31, 150847]
[5, 16, 20685]
[12, 19, 34895]
[18, 25, 79349]
[20, 33, 181525]
[31, 1, -119]
[19, 33, 181433]
[0, 12, 8640]
[13, 12, 9017]

In order to determine how well suited a program is we can use the
following metric function:

# Return how far off from the data the given method is
# A lower score is better, a score of 0 matches the data perfectly
def fitness(program, data)
delta = 0
data.each do |row|
value = program.calculate(row[0], row[1])
delta += (value - row[2]).abs
end
return delta
end

Have Fun!

[1]: Genetic programming - Wikipedia
[2]: http://rubyforge.org/projects/parsetree/
[3]: Abstract syntax tree - Wikipedia
[4]: S-expression - Wikipedia
[5]: http://seattlerb.rubyforge.org/ruby2ruby/

--
-Daniel
http://rubyquiz.strd6.com

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

## Genetic Programming (#212)

Buon giorno Rubyists,

Let's say that we have a table of outputs for an unknown function, and
we want to find out what that function is. One possible method of
finding a solution is to use genetic programming[1]. In genetic
programming we begin with a random population of programs, then rank
them according to how well they meet the solution criteria. The top
ranked programs are then modified to create a new generation of
programs. The new generation is ranked by how well the solve the
problem in the same way and the process repeats until, hopefully,
we'll have evolved a strong solution.

Some time ago, I ran across DRP, Directed Ruby Programming[1], and
thought that might be something to look at later; I thought this quiz
might be a bit of a chance. (Though I only barely scratched the surface
of the library's documentation.) First, here are some sample functions
my programs generated (fitness: function):

89416: 4312 + y * 7 * y * x
82779: ((x * 4 * (93 + y * 6 + x + 9 - x)) - 3) * y / 8 - 62 + 1 * x - 3
       - 80 + 1395 + 0 / 870 + y * 6 + y - y - ((y)) - x - y + y - 16
       - 657 + 880 - 7 * (6 / 6) * (y / 1) - 870 + 19 + y * y * (y) - y
       - (62935) / (((x - x) - y / 1 * y / y) - x) - ((y + 4 - y + y * y
       + x)) + 6 - x * 6 * (((8 + 6 / 2563 * (y) + x / 9871671)) - y - 7
       - x - x - 8) - 3361 + 9 - (43) / 675670 - y - x - 45 + 8 + (y / 1)
       - 870 + 19 + y * y * (y) - y + y * x * (6 / 6) + 870 + 19 + y + x
       * x - ((((10)))) - 8 + y - (x)
17387: x + (y) * 5 * y * y - ((x + y)) / 65 / y / y + y + 2 / (y * x +
       95381 - x + x) - y * 0 / 50 * x * ((x)) * 7 - x * 4 + 4 * 9 * 6 *
       y - 0 * (((y + x + x))) * (x + y) * (y + 78 * ((x / y * (y + 22 -
       9) - ((x + y) - 684) - x - y - x + ((7 + y * ((785 / 4 - x * 8)) +
       (8 - 4) + 8 / 7 + x + x))) / 4) - (8 - 9 * x - 0) * x + 2) * 91 *
       76 * x + 220 + x * 4 - 2 * ((x)) * (y) * 4 - y * ((x / (y * 34 + y
       + 4 - x * 8)) + 24) + y * 1 - (y) * (((y + x + x)))
8570: 5 * y * + + y * y - 29
8424: y * + 5 * + y * y + - x
8360: 5 * y * y * y / y * y - x / - - - y
8346: 5 * y * y * y - - - 1 + x - x
8338: y * ((5 * y)) * ((y))
8102: (8) - y + 60 + y * y * 5 * y
8056: 47 + 5 * y * y * y
7288: 988 + 5 * y * y * y - y * y / y + 0 * 894 / 2 - x * x / - + 1 - - 7
3948: y * 5 * (y + (y) * y - x)

Following the first example in drp's INTRO file, I created a simple class
to generate random initial functions:

require 'drp'

class ProgramGenerator
  extend DRP::RuleEngine

  begin_rules
    def program
      "Class.new{ #{function} }"
    end

    def function
      "def calculate(x, y); #{expression}; end"
    end

    def expression; "#{expression} #{binaryop} #{expression}"; end
    def expression; "#{atom} #{binaryop} #{expression}"; end
    def expression; "#{expression} #{binaryop} #{atom}"; end
    def expression; "(#{expression})"; end
    def expression; "#{atom}"; end

    def binaryop; "+"; end
    def binaryop; "-"; end
    def binaryop; "*"; end
    def binaryop; "/"; end

    def unaryop; "-"; end

    def atom; "#{variable}"; end
    def atom; "#{literal}"; end

    def variable; "x"; end
    def variable; "y"; end

    def literal; "#{rand(10)}"; end
    def literal; "#{rand(9) + 1}#{literal}"; end
  end_rules
end

irb(main):001:0> require 'program_generator'
=> true
irb(main):002:0> pg = ProgramGenerator.new
...
irb(main):003:0> pg.expression
=> "x"
irb(main):004:0> pg.expression
=> "(y * 4 + 924)"
irb(main):006:0> pg.function
=> "def calculate(x, y); 26 / x + (60 - x) + y; end"
irb(main):007:0> pg.program
=> "Class.new{ def calculate(x, y); 136 * x + 14; end }"

I setup a "magic" hash to store my population of programs and a few
helpful methods:

Programs = Hash.new{|hash, program|
  hash[program] = fitness(program)
}

class <<Programs
  def generate_some_more(n = 100, pg = ProgramGenerator.new)
    n.times{ self[pg.expression] }
    self
  end

  def reject_the_unfit
    self.delete_if{|program, fit| !fit}
  end

  def select_the_best(n = 5)
    self.sort_by_fitness[0, n]
  end

  def sort_by_fitness(dont_randomize = true)
    self.reject_the_unfit
    self.sort_by{|program, fitness| dont_randomize ? fitness : rand(fitness)}
  end
end

irb(main):001:0> Programs
=> {}
irb(main):002:0> Programs.generate_some_more(2)
=> {"y"=>687122, "(x - 4 * x / 31 + (0 * 964 / 65 / y - 8 + 4 + (y * ((y / x *
54 - (x / ((986 - y + x - 24 + ((6 / 197 / 29 - 287)) / 9 + y * 2 / 2 * 7)) *
23 - 7) + x - 44 * 659 + (y * 1 - 8 / y / ((5 * x)) - 5) * 7632 / ((y)) / x /
8856))) / x + x) - ((x))) - 7 - 0 - x"=>nil}
irb(main):003:0> Programs.select_the_best(1)
=> [["y", 687122]]

And wrapped that in a loop to generate/mutate a bunch of functions:

pm = ProgramMutator.new

1000.times do
  Programs.generate_some_more.
           sort_by_fitness(true).
           each_cons(2).
           each do |(p1, f1), (p2, f2)|
    new_program = case rand(10)
      when 3..9 then pm.cross_breed(p1, p2)
      when 2 then pm.mutate(p2)
      when 1 then pm.mutate(p1)
      else ProgramGenerator.new.expression
      end
    new_fitness = Programs[new_program]
    #puts "#{f1} X #{f2}\t#{new_fitness}"
  end

  program, fitness = Programs.select_the_best(1).flatten
  puts "Best: #{fitness}\t#{program[0..60]}"
end

ruby1.9 -rubygems 212.rb

Best: 569018 28298
Best: 494580 3170 * x - y
Best: 494580 3170 * x - y
Best: 494580 3170 * x - y
Best: 357554 (4257 * y - 1)
Best: 357554 (4257 * y - 1)
Best: 357554 (4257 * y - 1)
Best: 357554 (4257 * y - 1)
Best: 212330 (y) * - 142 / - 25 * y * x
Best: 212330 (y) * - 142 / - 25 * y * x
Best: 187980 0 / y - - 8 * 25 * y * x
Best: 173870 248 * y * (x)
Best: 104800 26 - 2 + x * 7 * y * (y)
Best: 104800 26 - 2 + x * 7 * y * (y)
Best: 104800 26 - 2 + x * 7 * y * (y)
Best: 92312 8526 - x + y * + 8 * y * x
^C

That leaves the ProgramMutator class:

class ProgramMutator
  def cross_breed(p1, p2)
    #...
  end

  def mutate(program)
    #...
  end
end

The machine I'm using today doesn't have ParseTree ("ParseTree doesn't work
with ruby 1.9.0 (LoadError)"). So, I was just doing string manipulations.
I don't know which functions produced which of the results from above, but
a few samples:

  def cross_breed(p1, p2)
    p1 + %{w + - * /}[rand(4)] + p2
  end

  def cross_breed(p1, p2)
    s1 = p1[0..rand(p1.size)]
    s2 = p2[rand(p2.size)..-1]

    s1 + %w{ + - * / }[rand(4)] + s2
  end

  def mutate(program)
    program.gsub(
      "#{%w{ + - * / }[rand(4)]}",
      "#{%w{ + - * / }[rand(4)]}"
    )
  end

[1] http://drp.rubyforge.org/

Sander Land's solution used Charlie[1] a library for genetic
algorithms and genetic programming. Let's take a look at Sander's
solution:

    %w[rubygems charlie].each{|lib| require lib}
    class Quiz212 < TreeGenotype([:x,:y,proc{rand(20)-10}], [:-@], [:+,:*,:-])
      DATA = [[8, 16, 20808], [22, 31, 150847], [5, 16, 20685], [12,
19, 34895], [18, 25, 79349],
             [20, 33, 181525], [31, 1, -119], [19, 33, 181433], [0,
12, 8640], [13, 12, 9017]]
      SIZE_PENALTY = 100

      def fitness
        -DATA.inject(0) do |f,(x,y,z)|
          f + ( z - eval_genes(:x=>x,:y=>y)).abs
        end - size * SIZE_PENALTY
      end
    end

Here Sander creates a class named `Quiz212`. `Quiz212` extends
TreeGenotype, provided by the Charlie gem. TreeGenotype takes three
arguments: `terminals, unary_ops, binary_ops`. These are used to
construct the tree representing the program. The class also defines a
fitness function which is used to evaluate the programs generated by
TreeGenotype. The data, as well as a size penalty are stored as
constants within the class.

The Charlie gem also provides a `Population` class. The population's
initializer takes a genotype class as a parameter and uses that to
evolve the population. The population also provides an
`evolve_on_console` method that displays the current best program each
generation.

    Population.new(Quiz212).evolve_on_console(200)

Charlie looks like a great way to get started trying out genetic
programming. The TreeGenotype is just one of many possible options
available in the library. Please do check it out.

Brabuhr provided a solution using Directed Ruby Programming[2]. DRP
uses Grammatical Evolution[3] as a foundation for Genetic Programming.
DRP let's you define a grammar to constrain the kinds of programs that
are generated. Here's an example of defining an expression using
`DRP::RuleEngine`:

    def expression; "#{expression} #{binaryop} #{expression}"; end

We can also define a binary operator in a similar way:

    def binaryop; "/"; end

These rules, along with any others that we define, are used by the
rule engine as a grammar to generate candidate programs. brabuhr
collects these generated programs and applies a fitness function to
see how well their output matches the data. These programs are then
sorted and mutated/crossed to create the next generation of programs.
Be sure to take a look at brabuhr's full solution for more details.

This week we saw two interesting options for working with genetic
programming, Charlie, and DRP. Both of these libraries make working
with genetic programming much easier by providing an interface to the
common tools and techniques used.

And now what you've all been waiting for... the mystery function:

    def mystery(x, y)
      3 * x * y + 5 * (y**3) - 7 * x
    end

Thank you Sander and brabuhr for your solutions to this week's quiz!

[1]: http://charlie.rubyforge.org/
[2]: http://drp.rubyforge.org/
[3]: http://en.wikipedia.org/wiki/Grammatical_evolution

212.tar.gz (2.72 KB)

I let it run over night:

3804: y * y + y * + y * + + - + - y * 5

···

On Thu, Jul 9, 2009 at 4:34 PM, <brabuhr@gmail.com> wrote:

some sample functions my programs generated (fitness: function):

89416: ...
8338: y * ((5 * y)) * ((y))
8102: (8) - y + 60 + y * y * 5 * y
8056: 47 + 5 * y * y * y
7288: 988 + 5 * y * y * y - y * y / y + 0 * 894 / 2 - x * x / - + 1 - - 7
3948: y * 5 * (y + (y) * y - x)

And now what you've all been waiting for... the mystery function:

def mystery(x, y)
3 * x * y + 5 * (y**3) - 7 * x
end

Cool. I was not thinking and I rebooted while my script and output
were in /tmp; the machine is set to clear tmp on boot: so, I lost my
lowest-delta functions. But, I recovered these:

cat foo

DATA = [[8, 16, 20808], [22, 31, 150847], [5, 16, 20685], [12, 19, 34895],
        [18, 25, 79349], [20, 33, 181525], [31, 1, -119] , [19, 33, 181433],
        [0, 12, 8640], [13, 12, 9017]]

# An example of one of the closer ones I found:

f = "(y) * + x - - - - + x - x - 3 * 82 / y + - 5 + + + x - y * y + y
- 2 * (x + y) / 459 + + - + - - 5 + (y) - 2 + - + 5 + (y) / + 5 - - -
+ + - + x - 3 * 8 + x - + (y) * 5 - x - - x + (y) * 5 + y * (y) / (y)
* y * (y) * 5 + y * (y) * 5 / 3 / (y) * - - + x - 3 + (y) * y / (y) *
y"
puts DATA.map{|(x, y, ev)| puts "#{x}\t#{y}\t#{ev}\t#{av =
eval(f)}\t#{delta = (ev - av).abs}"; delta}.inject{|s,v| v += s}
puts

# I think this was the second best I found; unfortunately, I did not save
# the original output. The version here was after some hand editing while
# pasting it into a spreadsheet, e.g.:
# "x + + + - + y" => "x - y"; "x + - + - + y" => "x + y"; "(x)" => "x"

f = "-5 * y / 4 - x - y - 5 * 8 + y / 5 * -y / 5 + 5 * y - 8 / -8 - y
/ 5 / 4 * 4 + 5 - 7 - 3 / 6 - y / 5 * -4 / y * y + 3 / 6 * -x + 4 + 5
+ 1 - 3 / 6 / -8 * y + 8 / -8 + 8 * + 1 + y / 5 * -y / 5 * -4 * 4 / +
y + 6 - y - 8 + 8 / -8 + 5 / 8 * -8 + y / 5 * 8 / -8 - y + 7 + y * x +
5 * y * y * y + 3 * -x + y + 8 / 3 / 6 - y - 8 / -8 - y / 5 * -4 + y *
y / y * x + 8 * y / -4 + 5 / 84 * 8 * -3 * 4 - x + y * x"
puts DATA.map{|(x, y, ev)| puts "#{x}\t#{y}\t#{ev}\t#{av =
eval(f)}\t#{delta = (ev - av).abs}"; delta}.inject{|s,v| v += s}
puts

# I added "- 5" to the above:

f = "-5 * y / 4 - x - y - 5 * 8 + y / 5 * -y / 5 + 5 * y - 8 / -8 - y
/ 5 / 4 * 4 + 5 - 7 - 3 / 6 - y / 5 * -4 / y * y + 3 / 6 * -x + 4 + 5
+ 1 - 3 / 6 / -8 * y + 8 / -8 + 8 * + 1 + y / 5 * -y / 5 * -4 * 4 / +
y + 6 - y - 8 + 8 / -8 + 5 / 8 * -8 + y / 5 * 8 / -8 - y + 7 + y * x +
5 * y * y * y + 3 * -x + y + 8 / 3 / 6 - y - 8 / -8 - y / 5 * -4 + y *
y / y * x + 8 * y / -4 + 5 / 84 * 8 * -3 * 4 - x + y * x - 5"
puts DATA.map{|(x, y, ev)| puts "#{x}\t#{y}\t#{ev}\t#{av =
eval(f)}\t#{delta = (ev - av).abs}"; delta}.inject{|s,v| v += s}
puts

ruby foo

8 16 20808 20816 8
22 31 150847 150842 5
5 16 20685 20681 4
12 19 34895 34916 21
18 25 79349 79369 20
20 33 181525 181526 1
31 1 -119 -128 9
19 33 181433 181435 2
0 12 8640 8602 38
13 12 9017 9057 40
148

8 16 20808 20810 2
22 31 150847 150859 12
5 16 20685 20681 4
12 19 34895 34902 7
18 25 79349 79361 12
20 33 181525 181530 5
31 1 -119 -78 41
19 33 181433 181436 3
0 12 8640 8625 15
13 12 9017 9028 11
112

8 16 20808 20805 3
22 31 150847 150854 7
5 16 20685 20676 9
12 19 34895 34897 2
18 25 79349 79356 7
20 33 181525 181525 0
31 1 -119 -83 36
19 33 181433 181431 2
0 12 8640 8620 20
13 12 9017 9023 6
92

···

On Tue, Jul 14, 2009 at 11:49 PM, Daniel Moore<yahivin@gmail.com> wrote: