[SUMMARY] Whiteout (#34)

Does this library have any practical value? Probably not. It's been suggested
in the Perl community that hacks like this are a good minor deterrent to those
trying to read source code you would rather keep hidden, but it must be stressed
that this is no form of serious security. Regardless, it's a fun little toy to
play with.

It was mentioned in the discussion that Perl, where ACME::Bleach comes from,
includes a framework for source filtering. It can be used to make modules that
modify source code much as we are doing in this quiz. Perl's Switch.pm is a
good example of this, but ironically ACME::Bleach is not.

That naturally leads to the question, can you build source filters in Ruby?
Clearly we can build ACME::Bleach, but not all source filters are as simple, I'm
afraid. Consider this:

  #!/usr/local/bin/ruby -w

  require "fix_my_broken_syntax"

  invalid++

Now the thought here is that fix_my_broken_syntax.rb will read my source, change
it so that it does something valid, eval() it, and exit() before the invalid
code is an issue. Here's a trivial example of fix_my_broken_syntax.rb:

  #!/usr/local/bin/ruby -w

  puts "Fixed!"
  exit

Does that work? Unfortunately, no:

  $ ruby invalid.rb
  invalid.rb:5: syntax error
  invalid++
           ^

Ruby never gets to loading the library, because it's not happy with the syntax
of the first file. That makes writing a source filter for anything that isn't
valid Ruby syntax complicated, and if it is valid Ruby syntax, you can probably
just code it up in Ruby to begin with.

Except for whiteout.rb, our version of ACME::Bleach.

You can't build Ruby constructs out of whitespace alone, so some form of source
filtering is required. Luckily, we can get away with the approach described
above for this source filter, because a bunch of whitespace (with no code) is
valid Ruby syntax. It just doesn't do anything. Ruby will skip right over our
whitespace and load the library that restores and runs the code.
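
Both halves of that argument can be checked from inside Ruby. The following is a minimal sketch that uses eval() on strings as a stand-in for loading a file; the RAN array and the try_eval helper are names made up for this illustration.

```ruby
# RAN records whether any line of an evaluated snippet actually executed.
RAN = []

# Returns :ok if the string evaluates, or :syntax_error if Ruby rejects
# it while parsing (before running any of it).
def try_eval(source)
  eval(source)
  :ok
rescue SyntaxError
  :syntax_error
end

# A valid first line followed by the broken line from invalid.rb.
puts try_eval("RAN << :first_line\ninvalid++\n")
puts "first line ran: #{!RAN.empty?}"

# Nothing but spaces, tabs and newlines: an empty but valid program.
puts try_eval(" \t  \t \n\t \n")
```

The point of RAN is that the valid first line never runs when a later line fails to parse, which is the same reason the require in invalid.rb never gets a chance to execute.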

Most people took this approach. Let's examine one such example by Robin
Stocker:

  #!/usr/bin/ruby

  #
  # This is my solution for Ruby Quiz #34, Whiteout.
  # Author:: Robin Stocker
  #

  #
  # The Whiteout module includes all functionality like:
  # - whiten
  # - run
  # - encode
  # - decode
  #
  module Whiteout

    @@bit_to_code = { '0' => " ", '1' => "\t" }
    @@code_to_bit = @@bit_to_code.invert
    @@chars_to_ignore = [ "\n", "\r" ]

    #
    # Whitens the content of a file specified by _filename_.
    # It leaves the shebang intact, if there is one.
    # At the beginning of the file it inserts the require 'whiteout'.
    # See #encode for details about how the whitening works.
    #
    def Whiteout.whiten( filename )
      code = ''
      File.open( filename, 'r' ) do |file|
        file.each_line do |line|
          if code.empty?
            # Add shebang if there is one.
            code << line if line =~ /#!\s*.+/
            code << "#{$/}require 'whiteout'#{$/}"
          else
            code << encode( line )
          end
        end
      end
      File.open( filename, 'w' ) do |file|
        file.write( code )
      end
    end
  
    # ...

First, we can see that the module defines some class variables (the @@ names),
which are really used as constants here. Their contents hint at the encoding
algorithm we'll see later.

Then we have a method for managing the transformation of the source into
whitespace. It starts by opening the passed file and reading the code
line-by-line. If the first line is a shebang line, it's saved in the variable
code. Next, a "require 'whiteout'" line is added to code. Finally, all other
lines from the file are appended to code after being passed through an encode()
method we'll examine shortly. With the contents read and transformed, the
method then reopens the source for writing and dumps the modifications into it.
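
To see the shape of the output without touching any files, here is a minimal string-based sketch of the same steps. It is not Robin's code: encode_bytes and whiten_source are hypothetical helpers, and the require line is written without the extra blank line before it. One deliberate difference: the quoted method only checks the first line for a shebang and does not encode it otherwise, while this variant also encodes a first line that is ordinary code.

```ruby
# Hypothetical helper: every byte except CR and LF becomes eight
# spaces/tabs (space for a 0 bit, tab for a 1 bit).
def encode_bytes(text)
  text.each_byte.map do |byte|
    if byte == 10 || byte == 13            # "\n" or "\r": keep as-is
      byte.chr
    else
      byte.to_s(2).rjust(8, "0").tr("01", " \t")
    end
  end.join
end

# Hypothetical string-based variant of whiten.
def whiten_source(source)
  lines = source.lines
  # Keep a leading shebang line readable, if there is one.
  shebang = lines.first.to_s.start_with?("#!") ? lines.shift : ""
  shebang + "require 'whiteout'\n" + lines.map { |line| encode_bytes(line) }.join
end

puts whiten_source("#!/usr/bin/ruby\nputs 'hi'\n").inspect
```

The inspect call is only there so the tabs and spaces show up as visible escapes.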

The next method is the reverse process:

    # ...
  
    #
    # Reads the file _filename_, decodes and runs it through eval.
    #
    def Whiteout.run( filename )
      text = ''
      File.open( filename, 'r' ) do |file|
        decode = false
        file.each_line do |line|
          if not decode
            # We don't want to decode the "require 'whiteout'",
            # so start decoding not before we passed it.
            decode = true if line =~ /require 'whiteout'/
          else
            text << decode( line )
          end
        end
      end
      # Run the code!
      eval text
    end
  
    # ...

This method again reads the passed file. It skips over the "require 'whiteout'"
line, then copies the rest of the file into the variable text, after passing it
through decode() line-by-line. The final line of the method calls eval() on
text, which should now contain the restored program.

On to encode() and decode():

    #
    # Encodes text to "whitecode". It works like this:
    # - Chars in @@chars_to_ignore are ignored
    # - Each byte is converted to its bit representation,
    #   so that we have something like 01100001
    # - Then, it is converted to whitespace according to @@bit_to_code
    #   - 0 results in a " " (space)
    #   - 1 results in a "\t" (tab)
    #
    def Whiteout.encode( text )
      white = ''
      text.scan(/./m) do |char|
        if @@chars_to_ignore.include?( char )
          white << char
        else
          char.unpack('B8').first.scan(/./) do |bit|
            code = @@bit_to_code[bit]
            white << code
          end
        end
      end
      return white
    end

    #
    # Does the inverse of #encode, it takes "white"
    # and returns the decoded text.
    #
    def Whiteout.decode( white )
      text = ''
      char = ''
      white.scan(/./m) do |code|
        if @@chars_to_ignore.include?( code )
          text << code
        else
          char << @@code_to_bit[code]
          if char.length == 8
            text << [char].pack("B8")
            char = ''
          end
        end
      end
      return text
    end

  end
  
  # ...

The comments in there detail the exact process we're looking at here, so I'm not
going to repeat them.
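
As a small worked example of one character's round trip: the letter "a" is byte 0x61, whose bit string is 01100001, so it becomes space, tab, tab, four spaces, tab. The sketch below reuses the same unpack/pack calls as the solution; the constant and method names are made up for the illustration.

```ruby
BIT_TO_CODE = { "0" => " ", "1" => "\t" }
CODE_TO_BIT = BIT_TO_CODE.invert

# One character to eight whitespace characters.
def char_to_white(char)
  bits = char.unpack("B8").first            # bit string, most significant bit first
  bits.gsub(/[01]/, BIT_TO_CODE)            # each bit becomes a space or a tab
end

# Eight whitespace characters back to one character.
def white_to_char(white)
  [white.gsub(/[ \t]/, CODE_TO_BIT)].pack("B8")
end

white = char_to_white("a")
p white
p white_to_char(white)
```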

Note that @@chars_to_ignore contains "\n" and "\r" so they are not translated.
The effect of that is that line-endings are untouched by this conversion. Some
solutions used such characters in their encoding algorithm. The gotcha there is
that any line-ending translation done to the modified source (say FTP through
ASCII mode) will break the hidden code. Robin's solution doesn't have that
problem.
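
A minimal sketch of why that holds, using compact stand-ins for encode() and decode() (white_encode and white_decode are not Robin's methods, just shorter equivalents written for this example). Rewriting every LF as CRLF, as an ASCII-mode transfer might, only changes the line endings of the decoded program.

```ruby
# Compact stand-in for encode: every character except "\r" and "\n"
# becomes eight spaces/tabs per byte.
def white_encode(text)
  text.each_char.map { |c|
    "\r\n".include?(c) ? c : c.unpack("B*").first.tr("01", " \t")
  }.join
end

# Compact stand-in for decode: line endings pass through untouched,
# everything between them is turned back into bytes.
def white_decode(white)
  white.split(/([\r\n]+)/).map { |part|
    part.match?(/\A[\r\n]+\z/) ? part : [part.tr(" \t", "01")].pack("B*")
  }.join
end

source  = "puts 1\nputs 2\n"
white   = white_encode(source)
mangled = white.gsub("\n", "\r\n")          # simulate LF -> CRLF translation

p white_decode(white)
p white_decode(mangled)
```

The second result differs from the first only in its line endings, so it is still the same Ruby program.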

Here's the code that ties all those methods into a solution:

  # ...
  
  #
  # And here's the logic part of whiteout.
  # If it was run directly, whites out the files in ARGV.
  # And if it was required, decodes the whitecode and runs it.
  #
  if __FILE__ == $0
    ARGV.each do |filename|
      Whiteout.whiten( filename )
    end
  else
    Whiteout.run( $0 )
  end

Again, the comment saves me some explaining.

That was Robin's first solution to a Ruby Quiz, but I never would have known
that from looking at the code. Thanks for sharing, Robin!

Obviously, a conversion of this type grossly inflates the size of the source.
Every byte other than a line ending becomes eight whitespace characters, so the
result is close to eight times the original size. A couple of solutions used
zlib to control the expansion, which I thought was clever. By compressing the
source before encoding it (and using a base three conversion), Dominik Bathon
got the inflation down to around three times the original size.
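
The arithmetic behind those numbers: with two symbols a byte needs 8 characters, while with three symbols it needs about log(256)/log(3), roughly 5.05. The sketch below shows one way to combine zlib with a base three encoding. It is an assumption-laden illustration, not a reconstruction of any submission: the three-character alphabet, the big-integer conversion, and the method names are all made up here.

```ruby
require "zlib"

# Assumed three-symbol alphabet; an actual solution may have used others.
BASE3_ALPHABET = " \t\n"

def whiten_base3(source)
  packed = Zlib::Deflate.deflate(source)
  # Prefix "01" so leading zero bytes survive the trip through an Integer.
  number = ("01" + packed.unpack("H*").first).to_i(16)
  number.to_s(3).tr("012", BASE3_ALPHABET)
end

def unwhiten_base3(white)
  hex = white.tr(BASE3_ALPHABET, "012").to_i(3).to_s(16)
  # Drop the leading "1" that the "01" prefix turned into.
  Zlib::Inflate.inflate([hex[1..-1]].pack("H*"))
end

source = "puts 'Hello, whiteout!'\n" * 10
white  = whiten_base3(source)

puts "original:        #{source.bytesize} bytes"
puts "plain 8x scheme: #{source.bytesize * 8} characters"
puts "zlib + base 3:   #{white.length} characters"
puts "round trip ok:   #{unwhiten_base3(white) == source}"
```

The sample input is very repetitive, which flatters zlib; real source will compress less well. Using a newline as one of the three symbols also gives up the protection against line-ending translation described earlier.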

Ara.T.Howard took a different approach, using whiteout.rb as a database to store
the trimmed files. That was a very interesting process, demonstrated well in
the submission email. The advantages to this approach would be no inflation
penalty and the code stays readable (just not in the original location). The
disadvantage I see is that it requires the exact same library to be present both
at encoding and decoding, which probably makes sharing the altered code
impractical.

As always, my thanks to all who gave this little diversion an attempt. I'm sure
we'll see tons of whitespace-only code on RubyForge in the future, thanks to our
efforts.

Tomorrow begins part one of our first two-part Ruby Quiz. Stay tuned...

Ruby Quiz wrote:

That naturally leads to the question, can you build source filters in Ruby? Clearly we can build ACME::Bleach, but not all source filters are as simple I'm
afraid. Consider this:

  #!/usr/local/bin/ruby -w

  require "fix_my_broken_syntax"

  invalid++
[...]
Ruby never gets to loading the library, because it's not happy with the syntax
of the first file. That makes writing a source filter for anything that isn't
valid Ruby syntax complicated and if it is valid Ruby syntax, you can probably
just code it up in Ruby to begin with.

But note that if you load the library from the shebang line instead, with #!/usr/local/bin/ruby -w -r fix_my_broken_syntax, you will be able to make it work.

a hack but:

   harp:~ > cat fix_my_broken_syntax.rb
   src = open($0).read
   src.gsub! %r/([_a-z][_a-zA-Z]*)\+\+/, '((\1+=1;\1 - 1))'
   eval src
   exit

   harp:~ > cat a.rb
   #!/usr/local/bin/ruby -r./fix_my_broken_syntax.rb
   n = 41
   p n++
   p n

   harp:~ > ./a.rb
   41
   42

cheers.

-a

On Thu, 9 Jun 2005, Ruby Quiz wrote:

[snip]

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

[Snip]
Obviously, a conversion of this type grossly inflates the size of the source.
Around eight times the size, to be exact. A couple of solutions used zlib to
control the expansion, which I thought was clever. By compressing the source
and then encoding() (and using a base three conversion) Dominik Bathon got
results around three times the inflation instead.

Using a base eight encoding plus zipping you can even reach a
deflation of source-length. See
http://ruby.brian-schroeder.de/quiz/whiteout/

regards and thanks for the summary,

Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/

What about using __END__ for this?

Klaus
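
A minimal sketch of what the __END__ idea could look like, under several assumptions: the file names are hypothetical, the ++ rewrite is a simplified form of the regexp hack earlier in the thread, and the whole thing is wrapped in a temp directory so it can be run as one file. Ruby stops parsing the main script at __END__, so the invalid code after it never reaches the parser, and the required library is free to read it back from $0.

```ruby
require "tmpdir"
require "rbconfig"

# Hypothetical filter library: reads everything after __END__ in the
# main script, rewrites "x++" and evals the result.
FILTER = <<~'RUBY'
  src = File.read($0).split(/^__END__\n/, 2).last
  eval src.gsub(/(\w+)\+\+/, '(\1 += 1)')
RUBY

# Hypothetical main script: valid Ruby up to __END__, anything after it.
SCRIPT = <<~'RUBY'
  require_relative "filter"
  __END__
  n = 41
  n++
  p n
RUBY

# Writes both files to a temp directory, runs the script with the
# current ruby, and returns whatever it printed.
def run_end_marker_demo
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "filter.rb"), FILTER)
    script = File.join(dir, "script.rb")
    File.write(script, SCRIPT)
    IO.popen([RbConfig.ruby, script], &:read)
  end
end

puts run_end_marker_demo
```

Unlike the -r approach, this needs no special command line, at the cost of one line of real Ruby plus the __END__ marker at the top of every filtered file.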

Ruby Quiz <james@grayproductions.net> wrote:

       #!/usr/local/bin/ruby -w

       require "fix_my_broken_syntax"

       invalid++

[ Fix it ]

Does that work? Unfortunately, no:

       $ ruby invalid.rb
       invalid.rb:5: syntax error
       invalid++
                ^

Ruby never gets to loading the library, because it's not happy with the
syntax of the first file.

--

The Answer is 42. And I am the Answer. Now I am looking for the Question.

[snip]

a hack but:

   harp:~ > cat fix_my_broken_syntax.rb
   src = open($0).read
   src.gsub! %r/([_a-z][_a-zA-Z]*)\+\+/, '((\1+=1;\1 - 1))'
   eval src
   exit

   harp:~ > cat a.rb
   #!/usr/local/bin/ruby -r./fix_my_broken_syntax.rb
   n = 41
   p n++
   p n

   harp:~ > ./a.rb
   41
   42

cheers.

[snip]

I have some suggestions for alternate methods. I haven't actually
tried any of these yet, so take this with a grain of salt.

The more interesting one I think would be to use ParseTree, assuming
it allows (or eventually will allow) you to insert a modified
parse tree back into the interpreter. You could then traverse the tree
and look for items semantically instead of by regexps. There are
disadvantages to this of course. You couldn't add new operators and
such for instance, although I would imagine it would be good for
things like AOP. (It also probably would be impossible to implement
whiteout using this method.) A related option is to write a parser in
ruby for ruby that emits ParseTree sexps that can once again be
inserted into the interpreter. You could then modify this parser to
add whatever syntax constructs you like (new operators etc.) as long
as they could be mapped onto existing ruby syntax. Since this is the
point of source filters usually, I see no problem with that
limitation; any more complicated and it's just another language written
in ruby.
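
To make the "semantically instead of by regexps" point concrete, here is a minimal sketch that swaps in Ripper, the parser interface in Ruby's standard library, for ParseTree. That substitution is mine, and Ripper only reads a tree: it offers no way to insert a modified tree back into the interpreter, so this covers just the inspection half of the idea. The compound_assignment? helper is a made-up name.

```ruby
require "ripper"

# True if the source parses and contains a compound assignment such as
# `n += 1` as actual code (not merely as text inside a string).
def compound_assignment?(source)
  tree = Ripper.sexp(source)        # nil when the source does not parse
  return false if tree.nil?
  tree.flatten.include?(:opassign)
end

puts compound_assignment?("n = 41; n += 1")   # real compound assignment
puts compound_assignment?('s = "n += 1"')     # same text, but inside a string
puts compound_assignment?("n++")              # not valid Ruby, so no tree at all
```

The last call also shows the limitation mentioned above: source that uses a new operator never produces a tree to traverse in the first place.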

The other option to consider is a filter using pipes. Have two files,
one with the filterable source (i.e. written in latin or whitespace or
whatever) and another with the regexp based transformer, and wrap it
up in a script. e.g.:

$ cat illegible.rb
#@#@#@# -- ? : 2
dfsdasdasd
$ cat filter.rb
#!/usr/bin/env ruby
class LineNoise
  def transform(line)
    ....
  end
end

x = LineNoise.new

# Open the child ruby for writing ("w") so the source can be fed to its stdin.
IO.popen("ruby", "w") do |rb|
  File.open("illegible.rb") do |ill|
    ill.each do |line|
      rb.print x.transform(line)
    end
  end
end
$

This gets rid of the eval nastiness but adds its own nastiness (like,
where do I find illegible.rb? etc.).

Just some ideas. Of course we could all write our own languages that
are just ruby with some syntax differences ;)

On 6/9/05, Ara.T.Howard@noaa.gov <Ara.T.Howard@noaa.gov> wrote: