[SUMMARY] Secret Santas (#2)

Please allow me a quick clarification, before I talk about this week's quiz.
Joe Cheng asked:

  Will you accept solutions that just print out the e-mails
  instead of sending them? :slight_smile:

My answer to this is simply that you may submit whatever you like to a Ruby
Quiz. There's no right and wrong answers here. However, when I define a
specification in a quiz, that's what I'm prepared to benchmark, test and
summarize. Please understand if I do not include altered solutions in those
processes. (This has been added to the Ruby Quiz FAQ.)

Let's move on to the problem at hand.

As many have now pointed out, this problem is trickier than it first appears.
The first solution that came to my mind when playing with this problem was
simple:

  1. Choose a name from the list.
  2. Filter all matching surnames out of the Santa list.
  3. Assign a random Santa from the list in step 2, to the name in step 1.
  4. Repeat until all names are assigned...

As I quickly learned, that doesn't quite work. The easiest way to see why it
doesn't work is to consider three families, one member each:

  Skywalker can be assigned as a Santa for Portokalos.
  Portokalos can then be assigned as the Santa for Skywalker.
  And then we're stuck! There are no matches left for Brigman or whoever.

Depending on how you implement the above, you'll probably get incorrect output
or get stuck in an infinite loop at this point. That was when I personally
began to take this problem seriously, and look at other options. Let's see what
others found.

The submitted solutions are more or less variations on three major themes:

  Random Sorts
  Circular Lists and/or Family Grouping
  "Hill Climbing" Algorithms

Let's examine each of those in turn, starting with random sorts. Here's the
code from my own submission that handles this:

  $players = STDIN.readlines.map { |line| line.chomp }
  $santas = $players.sort_by { rand }
  
  def check_families?
    $players.each_with_index do |who, i|
      return false if who[/\S+ </] == $santas[i][/\S+ </]
    end
    return true
  end
  
  $santas = $players.sort_by { rand } until check_families?

The idea there is simple, keep randomly shuffling $santas until we can verify
that none of the match-ups share the same surname. That ensures no one will
have themself or common family members, of course. This method does give us a
good random shuffle of the match-ups, but unfortunately, the performance is far
from stellar.

Still, in a Secret Santa selection script, how critical is performance? This
type of answer could be a viable solution in this case, though it usually isn't
in the majority of "Real World" problems.

The techniques in the second category of solutions, circular lists/family
groupings, were quite varied. One solution built a circular linked list,
separated family members in that list, and then assigned Santas straight down.
Another tactic, used in more than one submission, was to separate the input into
family groups, choose a member from the most populated family, and assign a
Santa from the next highest populated family, until the list was exhausted.

Algorithms like this are far more efficient that my random sort above,
critically so in pathological cases. They do make a trade off though. These
types of assignments are not unbiased and do not allow for all possible
permutations of assignment. Consider this: in a circular list approach, it is
impossible for two players of the game to be assigned to each other as Santas.
That's not to say this is a minus of these algorithms, some groups may even
prefer this behavior, but if you're looking for something quite random you
probably need to look at little further.

However, Niklas Frykholm submitted what I believe to be a variation of the
family grouping theme that does produce unbiased matches. Here's the matching
portion of his code:

  class Array
    def random_member(&block)
      return select(&block).random_member if block
      return self[rand(size)]
    end
    def count(&block)
      return select(&block).size
    end
  end
  
  class String
    def clean_here_string
      indent = self[/^\s*/]
      return self.gsub(/^#{indent}/, "")
    end
  end
  
  class Person
    attr_reader :first, :family, :mail
    def initialize(first, family, mail)
      @first, @family, @mail = first, family, mail
    end
    def to_s() "#{first} #{family} <#{mail}>" end
  end
  
  class AssignSanta
    def initialize(persons)
      @persons = persons.dup
      @santas = persons.dup
      @families = persons.collect {|p| p.family}.uniq
      @families.each { |f|
        if santa_surplus(f) < 0
          raise "No santa configuration possible"
      }
    end
  
    # Key function -- extra santas available for a family
    # if this is negative -- no santa configuration is possible
    # if this is 0 -- next santa must be assigned to this family
    def santa_surplus(family)
      return @santas.count {|s| s.family != family} -
           @persons.count {|p| p.family == family}
    end
  
    def call
      while @persons.size() > 0
        family = @families.detect { |f|
          santa_surplus(f)==0 and
          @persons.count{|p| p.family == f} > 0
        }
        person = @persons.random_member { |p|
          family == nil || p.family == family
        }
        santa = @santas.random_member{ |s|
          s.family != person.family
        }
        yield(person, santa)
        @persons.delete(person)
        @santas.delete(santa)
      end
    end
  end

The exciting stuff is at the bottom, particularly the santa_surplus() method.
Niklas' code tracks how many Santas are still available to all the families at
each step and uses this information to avoid running out of options for future
selections. But don't take my word for it, he posted a wonderful follow-up
breakdown of exactly how it works to Ruby Talk:

  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/114760

The final type of solution was what is generally known as a "Hill Climbing"
algorithm. Dennis Ranke explains his version nicely:

  I start by assigning a random santa to everyone without caring about the
  constraints. Then I go through the list of people and for each one not
  having a correct santa I swap santas with someone else, so that both have
  correct santas afterwards.
  As far as I can see, this will only fail when no solution is possible and
  should be reasonably unbiased.

Put another way, you start with a random and likely quite incorrect match-up,
then correct it one swap at a time. Here's the code to match the description:

  class Person
     attr_reader :first, :last, :email
     attr_accessor :santa
  
     def initialize(line)
     m = /(\S+)\s+(\S+)\s+<(.*)>/.match(line)
     raise unless m
     @first = m[1].capitalize
     @last = m[2].capitalize
     @email = m[3]
     end
  
     def can_be_santa_of?(other)
     @last != other.last
     end
  end
  
  input = $stdin.read
  
  people = []
  input.each_line do |line|
     line.strip!
     people << Person.new(line) unless line.empty?
  end
  
  santas = people.dup
  people.each do |person|
     person.santa = santas.delete_at(rand(santas.size))
  end
  
  people.each do |person|
     unless person.santa.can_be_santa_of? person
     candidates = people.select { |p|
       p.santa.can_be_santa_of?(person) &&
       person.santa.can_be_santa_of?(p)
     }
     raise if candidates.empty?
     other = candidates[rand(candidates.size)]
     temp = person.santa
     person.santa = other.santa
     other.santa = temp
     finished = false
     end
  end
  
  people.each do |person|
     printf "%s %s -> %s %s\n", person.santa.first, person.santa.last,
                                person.first, person.last
  end

Santas are randomly assigned in the first people.each() iteration above and then
swapped until correct in the following people.each() iteration. I was surprised
at how simple this solution came out, with such effective results.

That's all I have time to cover, but that's certainly not all there is to see in
the submitted solutions. I learn new tricks every time I read the code to put
together one of these summaries. They're certainly worth a look, if you haven't
given them one already. Other highlights include Warren Brown's rules based
approach and Gavin Kistner's Ouroboros class for building circular lists, but
again, I think all the solutions are worth examining.

Before I wrap this up, let me congratulate Robo and Cristi BALAN who both
claimed inexperience and promptly proved it false with working code. I singled
Robo out in the discussion early on to hint at the quiz's complexity, but the
truth is that his solution could work fine for the problem at hand. Who cares
if you sometimes have to run it twice? I sure don't.

My thanks to all who submitted and all those that just tolerate our babble on
Ruby Talk. Look for Gavin's quiz tomorrow morning...