[QUIZ] Bytecode Compiler (#100)

This is my first post to ruby-talk and my first stab at a ruby quiz! Hooray!
I used rparsec because I was too lazy to write a simple parser. And I used some cute monkey patching that I shouldn't have.
But it works!

require 'rparsec/rparsec'

class Fixnum def to_bytes
       if self >= -32768 && self <= 32767
           a = [0x01]
           a << ((self & 0x0000FF00) >> 8) a << ((self & 0x000000FF)) else
           a = [0x02] a << ((self & 0xFF000000) >> 24) a << ((self & 0x00FF0000) >> 16) a << ((self & 0x0000FF00) >> 8) a << ((self & 0x000000FF)) end
   end end class Symbol
   def to_bytes
       case self
       when :+
           [0x0a]
       when :-
           [0x0b]
       when :*
           [0x0c]
       when :**
           [0x0d]
       when :confused:
           [0x0e]
       when :%
           [0x0f]
       end
   end

   def to_proc
       proc { |x| x.send self }
   end
end

class Array
   alias to_bytes to_a
end

class Compiler
   include Parsers
   include Functors
   def self.compile str
       new.parser.parse str
   end

   def func sym
       proc { |x, y| [x,y,sym].map(&:to_bytes).flatten }
   end

   def parser
       ops = OperatorTable.new do |t|
           t.infixl(char(?+) >> func(:+), 20)
           t.infixl(char(?-) >> func(:-), 20)
           t.infixl(char(?*) >> func(:*), 40)
           t.infixl(char(?/) >> func(:/), 40)
           t.infixl(char(?%) >> func(:%), 40)
           t.infixl(string('**') >> func(:**), 60)
           t.prefix(char(?-) >> Neg, 80)
       end
       expr = nil
       term = integer.map(&To_i) | char('(') >> lazy{expr} << char(')')
       delim = whitespace.many_
       expr = delim >> Expressions.build(term, ops, delim)
   end
end

Aaaargh formatting explosion! Here's that first bit all nice and unexploded.

class Fixnum def to_bytes
        if self >= -32768 && self <= 32767
            a = [0x01]
            a << ((self & 0x0000FF00) >> 8) a << ((self & 0x000000FF)) else
            a = [0x02] a << ((self & 0xFF000000) >> 24) a << ((self & 0x00FF0000) >> 16) a << ((self & 0x0000FF00) >> 8) a << ((self & 0x000000FF)) end
    end end

class Symbol
    def to_bytes
        case self
        when :+
            [0x0a]
        when :-
            [0x0b]
        when :*
            [0x0c]
        when :**
            [0x0d]
        when :confused:
            [0x0e]
        when :%
            [0x0f]
        end
    end

    def to_proc
        proc { |x| x.send self }
    end
end

Here is my solution. I tried to keep it short and simple.
Unfortunately, it got a little more complicated when I added code to
pass Marcel's test cases. But now it can handle '--++-++-2+-2'....

module Compiler

  # The whole thing wrapped up into one function....

···

#
  # The first half tokenizes into single chars or fixnums: it uses gsub
  # to convert '**' to '^', binary '-'' to '~', and to remove pointless '+'s.
  # Digits and most remaining '-'s are collected and converted using to_i.
  # A few pesky unary '-'s are converted to the token 'n'
  #
  # The second half uses the shunting yard algorithm to build the RPN,
  # and does the token->bytecode translation inline.
  #
  # The Tokens array contains 4 items: the character token, the bytecode,
  # and two precedence values - one for a token on the stack,
  # and another for a token in the stream. The values are only different
  # if the operator is right-associative - this simplifies the conditional
  # for deciding when to pop operators.
  # The 'n' token is expanded to [0,swap,sub]

Tokens = [['+',10,3,3],['~',11,3,3],['*',12,2,2],['/',14,2,2],['%',15,2,2],
          ['^',13,1,0],['n',[1,0,0,0xa0,11],-1,-2],[nil,'(',9,9]]

def self.compile expr
  num,tokens,output,stack='',[],[],[]

  expr.gsub(/(\d|\))-/,'\1~').gsub(/(^|[^\d)])\++/,'\1').gsub("**","^").each_byte{|b|
    if /\d|-/ =~ c=b.chr
      num+=c
      num.gsub!('--','')
    else
      tokens << (/\d/=~num ? num.to_i : 'n') and num='' unless num.empty?
      tokens << c
    end
  }
  tokens << num.to_i unless num.empty?

  tokens.each{|token|
    case token
      when Integer
        output += (-2**15...2**15)===token ?
          [1]+[token].pack('n').unpack('C*') :
          [2]+[token].pack('N').unpack('C*')
      when '('
        stack << token
     when ')'
        output << stack.pop until stack.last == '('
        stack.pop
      else
        tokdef = Tokens.assoc(token)
        output << stack.pop while t=stack.last and Tokens.rassoc(t)[2]
<= tokdef[3]
        stack << tokdef[1]
    end
  }
  (output+stack.reverse).flatten
end

--Adam