[ANN] Grammar 0.8

Since I was giving a presentation about Grammar at the Lonestar Ruby
Conference (which ended today), I thought it was about time to release
another version of Grammar. I posted it yesterday here:

http://rubyforge.org/projects/grammar/

and the presentation is here (before a few modifications):

http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf

Here are some release notes:

* second complete rewrite since Grammar 0.5
* BNF-like grammar written directly in Ruby
* LL(1) parser
* separation of Grammar and parser generator ("engine")
* engines could generate a parser in a particular target language or do
something else with the Grammar tree
* current engines: Ruby (generate stand-alone very flat Ruby parser), Ruby0
(directly parse while traversing Grammar), Ruby2CExt (Ruby + convert to C
using ruby2cext)
* lexer and parser grammars are specified in exactly the same way - one
parses characters and the other tokens
* can write a lexer-free parser
* may define tokens however is appropriate
* methods for a lexer (or just token) Grammar to a parser Grammar -
including multi-threading
* arbitrary lookahead can be achieved with the backtrack method
* tail-call optimization of recursion (right recursion)
* directly handles left recursion (historically only doable with (LA)LR
parsers only).
* very very fast - on par with the best hand crafted/tuned recursive descent
LL(1) or Regexp (LL(*)) parsers. 100X faster than many other parser
generators. And that is before Ruby2CExt.

Comments and feedback are quite welcome.

Eric

Here is a benchmark for parsing JSON (JSON->ruby). "size" is the size of
the parser specification (stripped out insignificant spacing). "bandwidth"
is the size of the JSON / runtime. hand* and json* are manually created
(well optimized) JSON parsers. The others are from general parser
generators. *ext and *Ext have underlying C code to help (Regexp could be
considered like that also). 3 engines are shown with Grammar 0.8 along with
Grammar 0.5 (5X pure ruby improvement from 0.5->0.8). The benchmark code
along with the JSON parser specs are in the gem (benchmark dir).

parser generator size bandwidth
---------------- ----- ---------
json/ext 12041 4.25M/s
Grammar/Ruby2CExt 1083 1.36M/s
hand/RE 1482 619K/s
hand/LL1 3342 483K/s
Grammar/Ruby 1083 417K/s
RACC/ext 1389 296K/s
json/pure 3979 283K/s
Grammar/0.5 1242 83.7K/s
RACC/ruby 1389 77.1K/s
Treetop 1121 8.98K/s
GhostWheel 1076 6.88K/s
Grammar/Ruby0 1083 2.59K/s
Peggy 2975 1.7K/s

Also, for any that looked at the presentation, the demos referred to are in
test/test_demo.rb.

···

On Sat, Sep 6, 2008 at 9:20 PM, Eric Mahurin <eric.mahurin@gmail.com> wrote:

Since I was giving a presentation about Grammar at the Lonestar Ruby
Conference (which ended today), I thought it was about time to release
another version of Grammar. I posted it yesterday here:

http://rubyforge.org/projects/grammar/

and the presentation is here (before a few modifications):

http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf

Here are some release notes:

* second complete rewrite since Grammar 0.5
* BNF-like grammar written directly in Ruby
* LL(1) parser
* separation of Grammar and parser generator ("engine")
* engines could generate a parser in a particular target language or do
something else with the Grammar tree
* current engines: Ruby (generate stand-alone very flat Ruby parser), Ruby0
(directly parse while traversing Grammar), Ruby2CExt (Ruby + convert to C
using ruby2cext)
* lexer and parser grammars are specified in exactly the same way - one
parses characters and the other tokens
* can write a lexer-free parser
* may define tokens however is appropriate
* methods for a lexer (or just token) Grammar to a parser Grammar -
including multi-threading
* arbitrary lookahead can be achieved with the backtrack method
* tail-call optimization of recursion (right recursion)
* directly handles left recursion (historically only doable with (LA)LR
parsers only).
* very very fast - on par with the best hand crafted/tuned recursive
descent
LL(1) or Regexp (LL(*)) parsers. 100X faster than many other parser
generators. And that is before Ruby2CExt.

Comments and feedback are quite welcome.

Eric

i look forward to using this - looks like a very nice piece of work!

a @ http://codeforpeople.com/

···

On Sep 6, 2008, at 8:20 PM, Eric Mahurin wrote:

Since I was giving a presentation about Grammar at the Lonestar Ruby
Conference (which ended today), I thought it was about time to release
another version of Grammar. I posted it yesterday here:

http://rubyforge.org/projects/grammar/

and the presentation is here (before a few modifications):

http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Thanks!

I believe I finally have infrastructure I'm happy with that is very
extensible. At this point, it would be nice to have help with development.
Here are some of the ideas I'm thinking:

* more engines (the things that traverse Grammar trees)
++ direct Ruby C (target: on par with Florian Frank's Ragel generated JSON C
extension)
++ engines call lambda's for actions instead of inlining (slower, but more
usable/flexible)
++ generation in other languages using native (non-ruby) objects: C++, Java,
python, ...
++ generation of Graphiz dot diagram or other visualization formats
++ LALR or packrat parser generation and/or add some features of those to
current engines (i.e. memoization)
* conversion mechanisms
++ from ANTLR
++ from YACC/RACC/RACC
++ possibly layer these so that these formats could be used directly
* specific language grammars
++ *ML formats - should be simple
++ parse Ruby itself - quite difficult, but should be possible with Grammar
++ other programming languages - ANTLR grammars are probably a good start
* better documentation and examples
++ especially a web page - not my forte
* additional functionality at the Grammar level
++ external object references (not just clones or Marshaled as now)
++ additional common Grammar patterns
* many other things I'm just not seeing

Contact me with any interest.

Eric

···

On Sun, Sep 7, 2008 at 11:22 PM, ara.t.howard <ara.t.howard@gmail.com>wrote:

On Sep 6, 2008, at 8:20 PM, Eric Mahurin wrote:

Since I was giving a presentation about Grammar at the Lonestar Ruby

Conference (which ended today), I thought it was about time to release
another version of Grammar. I posted it yesterday here:

http://rubyforge.org/projects/grammar/

and the presentation is here (before a few modifications):

http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf

i look forward to using this - looks like a very nice piece of work!