[SUMMARY] Secret Agent 00111 (#94)

Mission Report

Once again Secret Agent 00111 has saved the world from certain doom. As usual
the damage report was gratuitous: destroyed vehicles worth my salary for the
next five years, the usual string of sexual harassment and unwanted pregnancy
lawsuits we will need to address, and one missing flock of sheep according to a
local farmer. (Don't ask!) Full details to follow under separate cover.

00111's secret to success really was rather brilliant, though please don't let
on that we know that. The agent has more than enough ego as it is. Recognizing
that Q was not going to come through this time, 00111 took the prototype device
to a hacker operating under the alias Boris Prinz. (We're still trying to link
Boris to a real identity, but we suspect him of numerous famous hacker
incidents.) This turned out to be the key choice for 00111.

I'm sure you recall the brutal demise of 00100 when forced to rely on a Java
programmer's abilities while under severe time constraints. Desperate for
quicker results, 00111 gambled on a not-yet-mainstream community of Ruby
programmers. It turns out these Ruby guys are a hidden community of elite
hackers who seem to do this kind of this for fun. It's rather bizarre, but the
results are undeniable.

Naturally, Q's gizmo was utterly destroyed during 00111's daring escape through
the casino's canal drainage system. However, we're now monitoring all Boris
Prinz communications and have managed to intercept an email message where he is
bragging to the Ruby community about his success. The message contains a
prototype of the code 00111 loaded into the gizmo. I'm going to examine that
code in this next section so we have it on file.

  Ruby Code Analysis

We will look at this code slightly out of order, so that we might better
understand the thinking of Boris in his design. First, here is the heart of how
boris managed to keep the communications so short and to the point:

  module SecretAgent00111CommunicationGizmo
    class UndefinedRLE < Exception
    end
  
    def self.rle events
      raise UndefinedRLE.new if events.size == 0
      rl = []
      truevals = 0
      events.each do |result|
        if result
          truevals += 1
        else
          rl << truevals
          truevals = 0
        end
      end
      raise UndefinedRLE.new if truevals != 0
      rl
    end
  end

What you have here is a method that accepts a series of favorable or unfavorable
events, represented by Ruby's boolean values true and false. The method returns
a Run Length Encoding for these events, counting the number of truths that occur
in a row. In order for this to be possible for any size stream, a trailing
false must be appended after each each series of truths.

Here's a sample usage, so you can see what I mean here:

  >> SecretAgent00111CommunicationGizmo.rle(
  ?> [true] * 10 + # 10 favorable events
  ?> [false] + # the trailing false
  ?> [false] + # 0 favorable events and trailing false
  ?> [true] * 10 + # 10 more favorable events
  ?> [false] # the trailing false
  >> )
  => [10, 0, 10]

This method wasn't used in the gizmo's actual solution, though it helped us make
sense of the process. Boriz claimed it was part of some "Test Driven
Development (TDD)" process. Since none of our programmers are familiar with the
concept, we assume he made it up.

This leads us to the closely related unencode method:

  module SecretAgent00111CommunicationGizmo
    def self.unrle rl
      events = []
      rl.each {|n| events += [true]*n + [false]}
      events
    end
  end

Here we are examining the reverse operation, which unencodes the counts and adds
back the trailing false markers.

The gizmo's actual encoding process was driven by the following method:

  require 'stringio'
  
  module SecretAgent00111CommunicationGizmo
    def self.encode events, exponent
      io = StringIO.new
      e = Encoder.new(exponent, io)
      events.each do |result|
        e << result
      end
      e.finish
      io.string
    end
  end

Here again we have the events and you can see that we also receive an exponent,
which we will discuss in a bit. The events are fed to a custom Encoder object
which encodes them and writes them to the specified IO parameter. A StringIO
object is used as a stand-in IO here to collect the results in a Ruby String.

The Encoder is the source of most of the magic. Here is that code:

  module SecretAgent00111CommunicationGizmo
    class Encoder
      def initialize exponent, io
        @exponent = exponent
        @io = io
        @events = []
        @bits = ''
        @io << exponent.chr
      end
  
      def << result
        @events << result
        if result == false
          encode_events
        end
      end
  
      def encode_events
        n = @events.size - 1
        @events = []
        @bits << "1" * (n / 2**@exponent)
        @bits << "%0#{@exponent+1}B" % (n % 2**@exponent)
        flush_bytes
      end
  
      def flush_bytes
        # write every 8 bits to the stream:
        @bits.gsub!(/[01]{8}/) do |byte|
          @io << byte.to_i(2).chr
          ''
        end
      end
  
      def finish
        @events << false
        encode_events
        # fill up remaining byte with "1"s:
        rest = @bits.size % 8
        if rest != 0
          @bits << "1" * (8 - rest)
        end
        flush_bytes
      end
    end
  end

The Encoder begins by setting up variables to hold the exponent, IO stream,
events, and a bit encoding. You can see that it sends the exponent through the
stream right away. That's required by the encoding, which works out to be this:

  1. A byte representing the exponent used to encode events
  2. One or more encoded numbers, each consisting of the following:
      a. A group of one bits number / 2 ** exponent in length
      b. A zero bit
      c. Exponent bits representing number % 2 ** exponent
  3. Any one bits needed to pad the stream length to a multiple of eight

That list, really defines the rest of the methods we are looking at here.
First, <<() is used to add events and it always triggers the call to
encode_events() when a false is encountered. That method turns the event list
into the bits described by 2a, 2b, and 2c. encode_events() then triggers a call
to flush_bytes(), which sends those bits to the stream whenever we have at least
eight (so they can be converted into binary byte data). Finally, finish()
handles step 3 by adding the extra one bits needed.

Let's look at how that turns out when received at the other end. Again, we have
a simple method driving the process:

  require 'stringio'
  
  module SecretAgent00111CommunicationGizmo
    def self.decode data
      events = Decoder.new(StringIO.new(data)).read
      events.pop # remove last false
      events
    end
  end

This method wraps the stream in a custom Decoder object and reads the encoded
events. The last false is removed, because it's just the marker that came with
the final series of true events, and the event list is returned.

The Decoder is the final piece of the puzzle:

  module SecretAgent00111CommunicationGizmo
    class ExponentUndefined < Exception
    end
  
    class Decoder
      def bits(string)
        string.unpack("B*")[0]
      end
  
      def initialize io
        @io = io
        @exponent = nil
        @bits = ''
        @t = @n = nil
      end
  
      def exponent
        return @exponent if @exponent
        e = @io.getc
        if e
          @exponent = e
        else
          raise ExponentUndefined.new
        end
        @exponent
      end
  
      def add_events
        n = @n
        n += @t * 2**exponent if @t
        @t = @n = nil
        @events += SecretAgent00111CommunicationGizmo.unrle([n])
      end
  
      def decode_bits
        loop do
          found = false
          if @bits =~ /^(1+)0/
            @t = $1.size
            @bits.sub!(/^(1+)/, '')
            found = true
          end
          re = Regexp.new("^0([01]{#{exponent}})")
          if @bits =~ re
            @n = $1.to_i(2)
            @bits.sub!(re, '')
            add_events
            found = true
          end
          return unless found
        end
      end
  
      def read
        @events = []
        begin
          exponent
          data = @io.read
          @bits += bits(data)
          decode_bits
        rescue ExponentUndefined
          # exponent not available yet in stream, try again later
        end
        @events
      end
    end
  end

Start with initialize(), which is a setup very similar to Encoder. The @t and
@n variables are used to track the sequence of true events found and the encoded
number itself. Next skip to read() which is the primary work method. It begins
by using exponent() to find the magic number, if present. If the exponent isn't
yet available, read() returns zero events and the code can try the stream again
later. Next, bits() is a trivial wrapper to unpack all of the bytes (most
significant first) into a String of ones and zeros.

The real work is done in decode_bits(). This method loops over the bits looking
first for the run of one bits and then for the exponent length remainder. When
both of those are located, a call to add_events() rebuilds the original encoded
number which unrle() reverts to a series of events.

  Summary

As you can see, 00111 survived thanks the the cunning of the hacker Boris Prinz.
Had a less skilled coder been involved we might be inducting 01000 right now.
Please send a care package to Boris's team: Daniel Martin, Karl Czisch, Kero,
and Pierre Barbier de Reuille. We own them all our gratitude.

Don't let 00111 get too comfortable. We'll be sending him back out on
assignment tomorrow morning, this time into the villainous Lisp domains to
hijack their secrets...