[ANN] Ruby2CExtension 0.1.0

Ruby2CExtension is a Ruby to C extension translator/compiler. It takes any
Ruby source file, parses it using Ruby's builtin parser and then translates
the abstract syntax tree into "equivalent" C extension code.

More information and download at http://ruby2cext.rubyforge.org/.

Example
-------

Let's say you have a Ruby file foo.rb. To translate it to a C extension and
then compile it, just run:

   rb2cx foo.rb

This will produce the files foo.c and foo.so (on Linux). foo.c is the
generated C extension source code and foo.so is the compiled C extension.

Why?
----

Well, like everybody else I wanted a faster Ruby and I also wanted to learn
about Ruby's internals, so I thought translating Ruby to C might be worth a
try...

The results are not as good as I had hoped, but they aren't bad either: the
generated C extension is practically never slower than the Ruby code, and I
found cases where it is more than twice as fast; usually it is somewhere in
between.

Of course Ruby2CExtension can also be used as an obfuscator for Ruby code,
though this was not my main motivation.

Features
--------

Ruby2CExtension supports a very large subset of Ruby's features (it can
translate itself into a C extension and the compiled version works correctly):

* all the basics (classes, methods, ...)
* blocks, closures
* instance_eval, define_method, ... (only when the block is given directly)
* correct constant and class variable lookup
* raise, rescue, retry, ensure
* ...

Of course there are some limitations, please see

http://ruby2cext.rubyforge.org/limitations.html

Requirements
------------

* Ruby 1.8.4
* RubyNode (http://rubynode.rubyforge.org/)

Duplication of effort? Do you know about ParseTree and Ruby2C?

···

On Jun 17, 2006, at 10:31 AM, Dominik Bathon wrote:

Ruby2CExtension is a Ruby to C extension translator/compiler. It takes any
Ruby source file, parses it using Ruby's builtin parser and then translates
the abstract syntax tree into "equivalent" C extension code.

More information and download at http://ruby2cext.rubyforge.org/.

Example
-------

Let's say you have a Ruby file foo.rb. To translate it to a C extension and
then compile it, just run:

  rb2cx foo.rb

This will produce the files foo.c and foo.so (on Linux). foo.c is the
generated C extension source code and foo.so is the compiled C extension.

Why?
----

Well, like everybody else I wanted a faster Ruby and I also wanted to learn
about Ruby's internals, so I thought translating Ruby to C might be worth a
try...

The results are not as good as I had hoped, but they aren't bad either: the
generated C extension is practically never slower than the Ruby code, and I
found cases where it is more than twice as fast; usually it is somewhere in
between.

Of course Ruby2CExtension can also be used as an obfuscator for Ruby code,
though this was not my main motivation.

Features
--------

Ruby2CExtension supports a very large subset of Ruby's features (it can
translate itself into a C extension and the compiled version works correctly):

* all the basics (classes, methods, ...)
* blocks, closures
* instance_eval, define_method, ... (only when the block is given directly)
* correct constant and class variable lookup
* raise, rescue, retry, ensure
* ...

Of course there are some limitations, please see

http://ruby2cext.rubyforge.org/limitations.html

Requirements
------------

* Ruby 1.8.4
* RubyNode (http://rubynode.rubyforge.org/)

I was trying to install RubyNode and found that the following section
of your extconf.rb (in ext/ruby_node_ext) was causing problems:

        unless node_h == IO.read(File.join($hdrdir, "node.h"))
                warn File.join($hdrdir, "node.h")
                warn "is different from"
                warn File.join($rbsrcdir, "node.h")
                warn ""
                warn "Please set RUBY_SOURCE_DIR to the source path of the current ruby!"
                exit 1
        end

According to the README if I'm running 1.8.4 (which I am) I don't need
to set RUBY_SOURCE_DIR. I then downloaded the Ruby source for 1.8.4
and set the RUBY_SOURCE_DIR env variable to point to that and still
got the warning and exit. So finally I just commented out the 'exit
1' and undefined RUBY_SOURCE_DIR and all was fine.

Phil

···

On 6/17/06, Dominik Bathon <dbatml@gmx.de> wrote:

Requirements
------------

* Ruby 1.8.4
* RubyNode (http://rubynode.rubyforge.org/)

Duplication of effort?

Maybe a bit. But as I said, it started as a toy project to learn about Ruby's internals.

Do you know about ParseTree and Ruby2C?

Yes.

But RubyNode was just a by-product of Ruby2CExtension and it has a different interface than ParseTree.

And Ruby2CExtension is not like Ruby2C, it's more like ZenObfuscate.

Dominik

···

On Sat, 17 Jun 2006 19:10:17 +0200, Logan Capaldo <logancapaldo@gmail.com> wrote:

On Jun 17, 2006, at 10:31 AM, Dominik Bathon wrote:

This is a sanity check and it shouldn't fail.

Could you compare the 2 files using diff or something similar? Otherwise you could send me the first file (File.join($hdrdir, "node.h")) by mail (offlist).
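If no diff tool is at hand, the comparison can also be done from Ruby. The following is only a minimal sketch: `first_difference` is a hypothetical helper name, not part of RubyNode, and the two paths are whatever `File.join($hdrdir, "node.h")` and `File.join($rbsrcdir, "node.h")` resolve to on the machine in question.

```ruby
# Return the 1-based number of the first line at which two text files
# differ, or nil if they are identical line for line.
# (Hypothetical helper for illustration; not part of RubyNode.)
def first_difference(path_a, path_b)
  lines_a = File.readlines(path_a)
  lines_b = File.readlines(path_b)
  max = [lines_a.size, lines_b.size].max
  (0...max).each do |i|
    # A missing line (nil) in the shorter file also counts as a difference.
    return i + 1 unless lines_a[i] == lines_b[i]
  end
  nil
end
```

Calling `first_difference(installed_node_h, source_node_h)` with the two paths then gives the first line to look at. Note that the check in extconf.rb compares the complete file contents as strings, so any difference at all makes it fail.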

What platform are you on? How / from where did you install Ruby?

Dominik

···

On Wed, 21 Jun 2006 08:00:03 +0200, Phil Tomson <rubyfan@gmail.com> wrote:

On 6/17/06, Dominik Bathon <dbatml@gmx.de> wrote:

MMm. I'll be honest, I was kind of disappointed. I was hoping for a response like "Yes, but ParseTree has <X> limitation or uses <Y> style, and RubyNode has a <Z> implementation" :). Ah well.

···

On Jun 17, 2006, at 2:58 PM, Dominik Bathon wrote:

On Sat, 17 Jun 2006 19:10:17 +0200, Logan Capaldo <logancapaldo@gmail.com> wrote:

On Jun 17, 2006, at 10:31 AM, Dominik Bathon wrote:

Requirements
------------

* Ruby 1.8.4
* RubyNode (http://rubynode.rubyforge.org/)

Duplication of effort?

Maybe a bit. But as I said, it started as a toy project to learn about Ruby's internals.

Do you know about ParseTree and Ruby2C?

Yes.

But RubyNode was just a by-product of Ruby2CExtension and it has a different interface than ParseTree.

And Ruby2CExtension is not like Ruby2C, it's more like ZenObfuscate.

Dominik

So were we.

*shrug*

···

On Jun 19, 2006, at 10:42 PM, Logan Capaldo wrote:

On Jun 17, 2006, at 2:58 PM, Dominik Bathon wrote:

Requirements
------------

* Ruby 1.8.4
* RubyNode (http://rubynode.rubyforge.org/)

Duplication of effort?

Maybe a bit. But as I said, it started as a toy project to learn about Ruby's internals.

Do you know about ParseTree and Ruby2C?

Yes.

But RubyNode was just a by-product of Ruby2CExtension and it has a different interface than ParseTree.

And Ruby2CExtension is not like Ruby2C, it's more like ZenObfuscate.

Dominik

MMm. I'll be honest, I was kind of disappointed. I was hoping for a response like "Yes, but ParseTree has <X> limitation or uses <Y> style, and RubyNode has a <Z> implementation" :). Ah well.

Then you should have asked for a comparison ;-). Anyway:

When I started working on Ruby2CExtension (around February 2006), I intended to work with Ruby 1.9 and ParseTree didn't support 1.9 back then. So I looked for alternatives.

I first found Ripper, which works in 1.9 but doesn't return the exact node tree. For example, "1+1" produces something like [:program, [[:binary, int(1), :+, int(1)]]], while with RubyNode you get [:call, {:mid=>:+, :recv=>[:lit, {:lit=>1}], :args=>[:array, [[:lit, {:lit=>1}]]]}]. Ripper is also a work in progress and has some bigger problems, for example with here documents.
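For comparison, here is a minimal sketch of asking Ripper for its tree. It assumes a current Ruby in which Ripper ships with the standard library. The Ripper output quoted above is only given as "something like", so the exact shape printed here may differ from it.

```ruby
require "ripper"
require "pp"

# Parse the same expression used in the comparison above.
sexp = Ripper.sexp("1+1")

# The root is a :program node wrapping a list of statements; the single
# statement is a :binary node carrying the operator and both operands.
pp sexp
```

The point of the comparison stands either way: Ripper describes the syntax as written (a binary expression), while the RubyNode tree reflects the interpreter's own node structure (a :call with a receiver and arguments).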

Then I found Nodewrap, which worked really well. But as I progressed with Ruby2CExtension I found that Nodewrap had problems with some node types. I sent some patches to Paul Brannan, and he was actually working on a new release but didn't have enough time.

So I finally wrote my own node tree accessing library. It had the following design goals:

- as simple as possible, easily maintainable
- read only (because I saw that Nodewrap had to jump through lots of hoops to allow write access)
- compatible with different Ruby versions
- low level and high level access
- get as much information about the node types as possible by parsing Ruby source code

Because of the last point RubyNode is not easily gemifyable, but on the other hand it should easily adapt to changes in Ruby even without changing the RubyNode source code (at least it should never segfault).

Some things that RubyNode can do and ParseTree currently cannot:

-Low level access:
ParseTree only gives you the s-exps; with RubyNode you can get the flags field, the line number and the filename of the node. You can also get the raw long value of each union if you really want, and so on.

-Access node trees of procs

-Parse arbitrary strings of Ruby code to node trees without evaling them:
ParseTree only allows evaling code and then only provides access to method node trees, with RubyNode you can just do:

irb(main):001:0> pp "p 1;class A; def foo;42;end;end;p 2".parse_to_nodes.transform
[:block,
  [[:fcall, {:mid=>:p, :args=>[:array, [[:lit, {:lit=>1}]]]}],
   [:class,
    {:body=>
      [:scope,
       {:next=>
         [:defn,
          {:mid=>:foo,
           :defn=>
            [:scope,
             {:next=>
               [:block,
                [[:args, {:rest=>-1, :opt=>false, :cnt=>0}],
                 [:lit, {:lit=>42}]]],
              :rval=>false,
              :tbl=>nil}],
           :noex=>2}],
        :rval=>false,
        :tbl=>nil}],
     :super=>false,
     :cpath=>[:colon2, {:mid=>:A, :head=>false}]}],
   [:fcall, {:mid=>:p, :args=>[:array, [[:lit, {:lit=>2}]]]}]]]

This feature is actually quite simple and I think it should be added to ParseTree.

As you can see, the pretty printed node tree above is a bit verbose, but the hashes are IMO much more flexible and nicer than accessing the attributes by position.
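To illustrate access by key, here is a minimal sketch that works on a hand-written node in the same shape as the "1+1" tree quoted earlier in this message. It is plain Ruby data, so RubyNode is not needed to run it. The local variable names are only for illustration.

```ruby
# A node is a two-element array: [type, attributes].
call_node = [:call, {
  :mid  => :+,
  :recv => [:lit, {:lit => 1}],
  :args => [:array, [[:lit, {:lit => 1}]]]
}]

type, attrs = call_node

# Attributes are looked up by name, so the code does not depend on the
# order in which they appear.
method_name = attrs[:mid]
receiver    = attrs[:recv].last[:lit]
first_arg   = attrs[:args].last.first.last[:lit]

puts "#{type}: #{receiver} #{method_name} #{first_arg}"
```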

RubyNode doesn't have an equivalent to SexpProcessor, but it is easy enough to make your own, as I did for Ruby2CExtension. Example:

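class NodeProcessor
   # Dispatch a node of the form [type, attributes] to process_<type>.
   def process(node)
     case node
     when false
       # absent child nodes show up as false in the transformed tree
       "Qnil"
     else
       begin
         send("process_#{node.first}", node.last)
       rescue
         # handle
       end
     end
   end

   # One handler per node type; it receives the node's attribute hash.
   def process_class(hash)
     # ...
   end

   # ...

end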
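As a runnable illustration of the same dispatch idea, here is a minimal sketch that collects the methods defined in a tree. The class name `MethodNameCollector` and the generic fallback for node types without a handler are assumptions made for this example, not code from Ruby2CExtension. The tree is a trimmed, hand-written version of the pretty-printed output above.

```ruby
class MethodNameCollector
  attr_reader :names

  def initialize
    @names = []
    @class_stack = []
  end

  # Nodes look like [type, attributes_or_children]. Types with a
  # process_<type> handler are dispatched to it; everything else is
  # searched generically for nested nodes.
  def process(node)
    case node
    when Array
      if node.size == 2 && node.first.is_a?(Symbol)
        handler = "process_#{node.first}"
        respond_to?(handler, true) ? send(handler, node.last) : process(node.last)
      else
        node.each { |child| process(child) }
      end
    when Hash
      node.each_value { |child| process(child) }
    end
    # false, nil, symbols and numbers carry no child nodes
  end

  private

  def process_class(hash)
    @class_stack.push(hash[:cpath].last[:mid])
    process(hash[:body])
    @class_stack.pop
  end

  def process_defn(hash)
    @names << "#{@class_stack.last || :Object}##{hash[:mid]}"
    process(hash[:defn])
  end
end

tree = [:block, [
  [:fcall, {:mid => :p, :args => [:array, [[:lit, {:lit => 1}]]]}],
  [:class, {
    :cpath => [:colon2, {:mid => :A, :head => false}],
    :super => false,
    :body  => [:scope, {:next => [:defn, {
      :mid  => :foo,
      :defn => [:scope, {:next => [:lit, {:lit => 42}]}]
    }]}]
  }]
]]

collector = MethodNameCollector.new
collector.process(tree)
p collector.names
```

Checking `respond_to?` before `send` avoids the blanket `rescue`, so errors raised inside a handler are not swallowed along with the "no handler for this node type" case.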

I hope this answers all your questions.

And while I am at it: Ruby2CExtension vs. ZenObfuscate:

I can't really compare them because I don't have access to ZenObfuscate, but from the announcement:

     - Known Limitations
           There are issues with what the obfuscator can translate to C
           and as a result you may need to modify your code in order to
           translate it. Usually this is a pretty straightforward and
           simple task. We do a good job of translating static ruby to
           its equivalent C, but not all ruby has an equivalent in C.
         - Only translates methods in classes and modules, not
           freestanding code.

Ruby2CExtension translates freestanding code.

         - Explicit returns are required in all methods.

Ruby2CExtension doesn't require those.

         - Temporary: Conditional logic (including ?:) may not be on the
           right hand side of an assignment.

No problem in Ruby2CExtension.

         - Temporaryish: Exception handling and generic block closures
           currently don't translate.

They do in Ruby2CExtension (except for some things described at http://ruby2cext.rubyforge.org/limitations.html#section10).

         - Some expressions in ruby we don't currently do, but could
           upon request, where some other ruby expressions will never
           translate.

I am not sure what Ryan means here, but Ruby2CExtension can translate arbitrary Ruby expressions (again except for things described at http://ruby2cext.rubyforge.org/limitations.html).

And Ruby2CExtension is free.

Dominik

···

On Tue, 20 Jun 2006 07:42:31 +0200, Logan Capaldo <logancapaldo@gmail.com> wrote:

On Jun 17, 2006, at 2:58 PM, Dominik Bathon wrote:

On Sat, 17 Jun 2006 19:10:17 +0200, Logan Capaldo <logancapaldo@gmail.com> wrote:

On Jun 17, 2006, at 10:31 AM, Dominik Bathon wrote:

Ryan Davis wrote:

But RubyNode was just a by-product of Ruby2CExtension and it has a different interface than ParseTree.

And Ruby2CExtension is not like Ruby2C, it's more like ZenObfuscate.

Dominik

MMm. I'll be honest, I was kind of disappointed. I was hoping for a response like "Yes, but ParseTree has <X> limitation or uses <Y> style, and RubyNode has a <Z> implementation" :). Ah well.

So were we.

*shrug*

If anyone is interested in working with Ruby's AST then it shouldn't be too hard to look at the source code of both ParseTree and RubyNode. They both depend on some C code and both seem to support Ruby 1.8.4 and 1.9. What other features are you interested in?

I'll be honest, too. I can understand that Ryan and Eric are disappointed, but I think you shouldn't blame Dominik for his answer.

Regards,
Pit

···

On Jun 19, 2006, at 10:42 PM, Logan Capaldo wrote:

On Jun 17, 2006, at 2:58 PM, Dominik Bathon wrote:

Some things that RubyNode can do and ParseTree currently cannot:

-Low level access:
ParseTree only gives you the s-exps; with RubyNode you can get the flags field, the line number and the filename of the node. You can also get the raw long value of each union if you really want, and so on.

ParseTree supports line numbers and filenames of lines of the original ruby code.

We lack getting the flags or the original value of the NODE, but we've never needed these things.

-Access node trees of procs

ParseTree can do this too, but with a little bit of hacking. I forgot where we put it, but I believe this code is in ZenHacks.

I believe we could also do this with some C code, but that's less fun than our hack.

-Parse arbitrary strings of Ruby code to node trees without evaling them:
ParseTree only allows evaling code and then only provides access to method node trees, with RubyNode you can just do:

[...]

This feature is actually quite simple and I think it should be added to ParseTree.

We haven't seriously looked into making this work, we had more pressing issues.

···

On Jun 20, 2006, at 1:53 PM, Dominik Bathon wrote:

--
Eric Hodel - drbrain@segment7.net - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

Pit Capitain wrote:

Ryan Davis wrote:

But RubyNode was just a by-product of Ruby2CExtension and it has a different interface than ParseTree.

And Ruby2CExtension is not like Ruby2C, it's more like ZenObfuscate.

Dominik

MMm. I'll be honest, I was kind of disappointed. I was hoping for a response like "Yes, but ParseTree has <X> limitation or uses <Y> style, and RubyNode has a <Z> implementation" :). Ah well.

So were we.

*shrug*

If anyone is interested in working with Ruby's AST then it shouldn't be too hard to look at the source code of both ParseTree and RubyNode. They both depend on some C code and both seem to support Ruby 1.8.4 and 1.9. What other features are you interested in?

Don't we even have Ripper, which is in the standard library?

···

On Jun 19, 2006, at 10:42 PM, Logan Capaldo wrote:

On Jun 17, 2006, at 2:58 PM, Dominik Bathon wrote:

--

blog en: http://www.riffraff.info
blog it: http://riffraff.blogsome.com
jabber : rff.rff at gmail dot com

I wasn't "blaming" him, I just like hearing people smarter than me talk about why they chose to do something. I was getting all ready to learn and stuff ;).

···

On Jun 20, 2006, at 4:37 AM, Pit Capitain wrote:

If anyone is interested in working with Ruby's AST then it shouldn't be too hard to look at the source code of both ParseTree and RubyNode. They both depend on some C code and both seem to support Ruby 1.8.4 and 1.9. What other features are you interested in?

I'll be honest, too. I can understand that Ryan and Eric are disappointed, but I think you shouldn't blame Dominik for his answer.

Some things that RubyNode can do and ParseTree currently cannot:

-Low level access:
ParseTree only gives you the s-exps; with RubyNode you can get the flags field, the line number and the filename of the node. You can also get the raw long value of each union if you really want, and so on.

ParseTree supports line numbers and filenames of lines of the original ruby code.

Okay, I missed that ParseTree supports that through NODE_NEWLINE, but NODE_NEWLINE is no longer available in 1.9.

···

On Tue, 20 Jun 2006 23:37:25 +0200, Eric Hodel <drbrain@segment7.net> wrote:

On Jun 20, 2006, at 1:53 PM, Dominik Bathon wrote:

We lack getting the flags or the original value of the NODE, but we've never needed these things.

-Access node trees of procs

ParseTree can do this too, but with a little bit of hacking. I forgot where we put it, but I believe this code is in ZenHacks.

I believe we could also do this with some C code, but that's less fun than our hack.

-Parse arbitrary strings of Ruby code to node trees without evaling them:
ParseTree only allows evaling code and then only provides access to method node trees, with RubyNode you can just do:

[...]

This feature is actually quite simple and I think it should be added to ParseTree.

We haven't seriously looked into making this work, we had more pressing issues.