[SUMMARY] Testing DiGraph (#73)

The first step to solving this quiz is to come up with some graphs that you can
test. I started by thinking of some trivial patterns I could easily hand figure
the answers for. A good example is the graph:

  A -> B -> C -> D

Finding the longest simple path for that is not hard, it is always all of the
edges. That's just a straight line basically and it will include all the nodes.
Building these kinds of graphs programatically for any size is also easy:

  >> Array.new(3) { |i| [(?A + i).chr, (?A + i + 1).chr] }
  => [["A", "B"], ["B", "C"], ["C", "D"]]

Himadri ChoudHury took a different approach, generating completely random
graphs. This has the advantage of testing far more cases including isolated
nodes and circular paths. Here's the generation code:

  # @dg is the generated DiGraph. Store the paths in a hash called @paths
  @dg
  # @paths is a hash. Each element of the hash is an array that contains
  # all the paths from that node
  @paths
  # @nodes is an array which contains a list of all the nodes
  @nodes
  
  # Randomly generate @dg and the corresponding @paths
  def generate_dg
      nodes = Array.new
      # 10 nodes total
      10.times do
          nodes << rand(10)
      end
      nodes.uniq!
      @paths = Hash.new
      nodes.each do |n|
          num_paths_from_each_node = rand(3) + 1
          next_nodes = Array.new
          num_paths_from_each_node.times do
              next_nodes << nodes[rand(nodes.length)]
          end
          next_nodes.uniq!
          @paths[n] = next_nodes
      end
      arr = Array.new
      @paths.each do |key,vals|
          @paths[key].each do |val|
              arr << [key,val]
          end
      end
      @dg = DiGraph.new(*arr)
      @nodes = @paths.keys
  end

(Note: those bare instance variables nixed in with the comments reference
variables on the Class object and not the instances themselves. They read and
discard nil values having no effect.)

The code begins by generating up to ten (uniq!() can reduce the count) random
nodes as simple Integers. After that, each node is randomly connected to
between one and three other nodes. A Hash is created during this process to
represent the connections. This turns out to be very helpful later on.
Finally, the paths are converted into the Array representation DiGraph expects
and the graph is created. The last line also stores the nodes for easy access.

Here's a random graph (viewed from the paths Hash) created from this code, so
you can see how they come out:

  { 5 => [8, 6],
    0 => [0, 6],
    6 => [1, 7, 6],
    1 => [8],
    7 => [7, 9, 4],
    8 => [6, 1],
    9 => [4, 8, 0],
    4 => [5] }

Now, in order to use these random graphs, we really need code that can tell us
the right answer for the data. Essentially, this means that we require our own
implementation of the test methods, to prove that we get the same answers as the
DiGraph implementation. Here is one of those methods, used to test
max_length_of_simple_path_including_node():

  # Depth first search for the longest simple path starting from 'node'
  # Simple path means a path that doesn't contain any duplicate edges
  # Note: I'm not using the definition of simply connected based on no
  # duplicate nodes
  def search(node)
      longest_path = 0
      if (@paths[node])
          @paths[node].each_index do |next_idx|
              next_node = @paths[node][next_idx]
              @paths[node].delete_at(next_idx)
              tmp = 1 + search(next_node)
              @paths[node].insert(next_idx,next_node)
              if (longest_path < tmp)
                  longest_path = tmp
              end
          end
      end
      return longest_path
  end

This method is a simple depth-first search, as the comment says, counting the
steps taken to reach the farthest node without crossing a single edge twice.
The implementation is just a recursive search from each node reachable from the
start node. At each step, the last edge crossed is removed from the paths Hash,
the recursive search is performed, and then the edge is restored. The returned
result here is just a count, not the actual path.

The tests using that are pretty trivial:

  def test_03_max_length_of_simple_path_including_node
      generate_dg
      @nodes.each do |node|
          longest_path = search(node)
          # ...
          assert_equal( longest_path,
                        @dg.max_length_of_simple_path_including_node(node) )
      end
  end

Unfortunately, this is where we get to execution problems with the quiz. The
above code passes some tests, due to bugs in the DiGraph implementation, but it
does not correctly implement the max_length_of_simple_path_including_node()
method described in the quiz.

Here's an example of where it gets wrong answers, using my trivial one-way path
described at the beginning of this quiz:

  >> @paths = {"A" => ["B"], "B" => ["C"], "C" => ["D"]}
  => {"A"=>["B"], "B"=>["C"], "C"=>["D"]}
  >> search("A")
  => 3
  >> search("B")
  => 2
  >> search("C")
  => 1
  >> search("D")
  => 0

Those are the longest simple paths *starting* from a given node, but not the
longest given paths *including* the given node (which would all be 3).

These errors made it tricky to correctly evaluate the expected results of the
methods and I apologize for this. I ran into the same issue with my own tests,
as anyone following my posts to Ruby Talk saw in painful detail.

Luckily, it's easy to get to a real solution of the first method from here.
First, we can modify Himadri's search() method to return the path instead of
just a count:

  def search(node, longest_path = [node])
      if (@paths[node])
          @paths[node].each_index do |next_idx|
              next_node = @paths[node][next_idx]
              @paths[node].delete_at(next_idx)
              tmp = search(next_node, longest_path + [next_node])
              @paths[node].insert(next_idx,next_node)
              if (longest_path.size < tmp.size)
                  longest_path = tmp
              end
          end
      end
      return longest_path
  end

Then, using that method, we can exhaustively search all the long paths for the
biggest one containing our node:

  def max_length_of_simple_path_including_node(n)
    all_paths = @paths.keys.map { |node| search(node) }
    all_paths = all_paths.select { |path| path.include? n }
    all_paths = all_paths.sort_by { |path| -path.size }
    all_paths.empty? ? 0 : all_paths.shift.size - 1
  end

Line by line that makes a list of the longest paths starting from each node,
reduces that list to those including the desired node, sorts the remaining paths
biggest to smallest, and finally returns an edge count of the largest path or 0
if there aren't any paths left.

The other method is left as an exercise for the interested reader.

Many, many thanks to the submitters who braved the waters and came up with some
basic tests, right or wrong.

Tomorrow, we will programatically generate 1,000 monkeys in the hopes that they
can recreate the works of Shakespeare...