Which library to write a parser

thomas_carlier · 16 January 2012 07:48

Hi,

I need to write a simple parser and I don't know whether I can use an existing
library.

Thanks

--
Posted via http://www.ruby-forum.com/.

Peter_Zotov · 16 January 2012 08:07

Try Treetop:
http://treetop.rubyforge.org/

You may find one of my blog posts useful:
http://whitequark.org/blog/2011/09/08/treetop-typical-errors/

--
WBR, Peter Zotov.

Benedikt_Huber · 31 January 2012 12:05

Hello again,
I found one combinator-based parser library which seems to provide quite
decent performance:

rsec-ext (http://rsec.heroku.com/)

Here is the code implementing parslet's MiniP parser:

  require 'rubygems'
  require "rsec"
  include Rsec::Helpers

  id = /[a-z]+/.r.fail 'id'
  int = /[0-9]+/.r.fail 'int'
  op = one_of_('+')
  comma = /\s*,\s*/.r
  sum = seq(int, op, lazy{expr})
  arglist= '('.r >> lazy{expr}.join(comma).even << ')'
  funcall= seq_(id, arglist)
  expr = funcall | sum | int
  parser = expr.eof

  File.readlines(ARGV.first).each { |line| parser.parse!(line) }

I do not claim that the parsers are equivalent (they produce different data
structures for the result), so the comparison is a little unfair and only
shows a trend. I'd like to share it anyway. Parsing 10000 lines, each
containing

  puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 23532 + 11, 2, 3, 5, 7, 11, 19)

the parsers need:

  parslet / ruby 1.8: 162.8s
  parslet / ruby 1.9: 49.5s
  rsec-ext / ruby 1.9: 0.7s
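
For readers who want to reproduce timings like these, a minimal harness could look like the sketch below. It only uses Ruby's standard Benchmark module. The `parse_line` lambda is a hypothetical stand-in (a single regex check), not any of the parsers measured above, and `time_parser` is a helper name invented for this example; swap in the real call, such as `parser.parse!(line)` from the rsec code above.

```ruby
require 'benchmark'

# The test line from this post, repeated 10,000 times.
LINE  = 'puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 23532 + 11, 2, 3, 5, 7, 11, 19)'
LINES = Array.new(10_000) { LINE }

# Hypothetical stand-in for a real parser: it only checks the overall
# "name(...)" shape. Replace the body with the parser under test.
parse_line = lambda do |line|
  raise ArgumentError, "cannot parse: #{line}" unless line.match?(/\A[a-z]+\(.*\)\z/)
  true
end

# Run the parser over every line and return [seconds, lines_per_second].
def time_parser(lines, parser)
  seconds = Benchmark.realtime { lines.each { |line| parser.call(line) } }
  [seconds, lines.size / seconds]
end

seconds, rate = time_parser(LINES, parse_line)
printf("%.2fs, %.0f lines/s\n", seconds, rate)
```

Wall-clock numbers from a harness like this depend heavily on the machine and Ruby version, so they are only comparable when all parsers are timed in the same run.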

Kind Regards,
Benedikt

GD1 · 31 January 2012 23:47

Hi Thomas,

I use ANTLR to generate C++ code which I subsequently extend into Ruby. It'd be absolute overkill if you're not already extending/embedding for your project though!

I believe ANTLR is LL(*). It also does lexers.

There is apparently a Ruby target for ANTLR as well. I've never tried it myself.

You'll need Java to build, but not to run; that's the case in my current project, but I'm not using the latest ANTLR.

Some links:

http://www.antlr.org/

http://www.antlr.org/wiki/display/ANTLR3/Antlr3RubyTarget

Another thing to explore, maybe.

Good luck!

Garth

PS. If someone has already suggested ANTLR, my apologies. I skimmed the replies but it was hardly a detailed search.

Kaspar_Schiess · 16 January 2012 08:33

Try Treetop:
http://treetop.rubyforge.org/

And then try parslet:
http://kschiess.github.com/parslet/

k

thomas_carlier · 16 January 2012 09:40

Peter Zotov wrote in post #1041059:

Try Treetop:
http://treetop.rubyforge.org/

You may find one of my blog posts useful:
http://whitequark.org/blog/2011/09/08/treetop-typical-errors/

Thanks, and nice blog, very useful information.
I'll try Treetop.

11142 · 31 January 2012 17:30

For heavy lifting, there's always Racc. ruby_parser uses it, and it's
pretty fast.

http://i.loveruby.net/en/projects/racc/
http://rubygems.org/gems/racc
http://rubydoc.info/gems/racc/1.4.7/frames

(Didn't do benchmarks.)

-- Matma Rex

Tony_Arcieri3 · 16 January 2012 08:47

Gotta give a nod to kpeg:

GitHub - evanphx/kpeg: A simple PEG library for ruby

--
Tony Arcieri

Ryan_Davis1 · 31 January 2012 22:20

I did, but with all the complexity of ruby_parser, not the grammar in this thread. I did both the 10k testcase above as well as 10k lines of puts(2 + 3) on both ruby 1.8 and 1.9:

1.8:

116.05s: 86.17 l/s: 6.23 Kb/s: 722 Kb:10000 loc:../dev/blah1_10k.rb
8.68s: 1152.15 l/s: 13.50 Kb/s: 117 Kb:10000 loc:../dev/blah2_10k.rb

1.9:

84.80s: 117.92 l/s: 8.52 Kb/s: 722 Kb:10000 loc:../dev/blah1_10k.rb
5.48s: 1825.23 l/s: 21.39 Kb/s: 117 Kb:10000 loc:../dev/blah2_10k.rb

Not an entirely fair comparison by using ruby_parser instead of an incredibly restrained grammar... but there you have it.

That said, I will say that I only barely tolerate LR based parser generators. I would love to have a fully conformant LL-based parser for ruby. I'm not convinced it is possible as ruby's grammar is seriously fucked up.

Peter_Zotov · 16 January 2012 10:31

Tony Arcieri wrote on 16.01.2012 12:47:

Gotta give a nod to kpeg:

GitHub - evanphx/kpeg: A simple PEG library for ruby

Is there a PEG parser around here which does not keep all the symbols each in
its own node? Or, maybe, any one which is faster than a dying snail? Just
wondering.

--
WBR, Peter Zotov.

Benedikt_Huber · 1 February 2012 00:57

Ryan Davis wrote in post #1043324:

I did, but with all the complexity of ruby_parser, not the grammar in
this thread.

Thanks for the numbers. I wrote a small grammar for the MiniP language
in racc, and repeated the experiments (on a different machine, ruby
1.9.3). racc's speed is ok, but it seems to be slower than rsec-ext.

  parslet:  53.10s
  racc:      2.05s
  rsec-ext:  0.72s

That said, I will say that I only barely tolerate LR based parser
generators. I would love to have a fully conformant LL-based parser for
ruby.

I believe neither racc nor ANTLR will be faster than rsec-ext, as they
generate Ruby code. For many less convoluted grammars (Ruby is not a
good example ;)), a PEG-style parser library is a good, pleasant-to-use
alternative to a parser generator.

For my prototype, I ported a constraint file parser to rsec-ext,
which works great. I can't use it at the moment, because it is 1.9
only, but that's a different story.

Kind Regards,
Benedikt

Kaspar_Schiess · 16 January 2012 15:58

Is there a PEG parser around here which does not keep all the symbols
each in
its own node? Or, maybe, any one which is faster than a dying snail? Just
wondering.

As one of the authors of one of these libraries which are slow as a dying snail, I am wondering: How do you measure? Would you like to contribute your benchmark, so that our (quite extensive) optimization efforts can go your way? And finally, what is the ground speed of a dying snail?

My measurements (to contribute to the thread as well) are here: press play on tape – Parslet and its friends

k

thomas_carlier · 18 January 2012 03:18

Hi,

Which one would you recommend for a simple grammar like this?

expressions : expression+
;
expression : [{]content+[}]
;
content : token
> TOKEN[|]content
>TOKEN[|]expression
;
TOKEN : .*
;

The grammar is not correct, but you'll get the main idea.

Example : some text {text {text|text}} some text {text|text|text}

input size [100 - 5000] chars

Ryan_Davis1 · 18 January 2012 04:07

Write your own by hand. It'll be faster and smaller than anything mentioned above.

http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf
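
To illustrate the hand-written route for the brace grammar quoted in this thread, here is a minimal recursive-descent sketch built on Ruby's standard StringScanner. The class name `BraceParser`, its `ParseError`, and the `[:choice, alternatives]` tree shape are assumptions made for this example; they come neither from the posts nor from the linked book.

```ruby
require 'strscan'

# Hand-written recursive-descent parser for input such as
#   "some text {text {text|text}} some text {text|text|text}"
# Result: an array of Strings and [:choice, alternatives] nodes, where
# each alternative is itself such an array (an assumed tree shape).
class BraceParser
  class ParseError < StandardError; end

  def initialize(input)
    @s = StringScanner.new(input)
  end

  def parse
    items = sequence
    raise ParseError, "unexpected #{@s.peek(1).inspect} at #{@s.pos}" unless @s.eos?
    items
  end

  private

  # sequence := (TEXT | group)*
  def sequence
    items = []
    loop do
      if (text = @s.scan(/[^{}|]+/))
        items << text
      elsif @s.check(/\{/)
        items << group
      else
        break
      end
    end
    items
  end

  # group := '{' sequence ('|' sequence)* '}'
  def group
    @s.scan(/\{/)
    alternatives = [sequence]
    alternatives << sequence while @s.scan(/\|/)
    @s.scan(/\}/) or raise ParseError, "expected '}' at #{@s.pos}"
    [:choice, alternatives]
  end
end

p BraceParser.new('a {b {c|d}} e').parse
```

Each grammar rule maps to one private method, so the whole parser is a single pass over the input with no backtracking, which suits inputs of a few thousand characters.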

Kaspar_Schiess · 25 January 2012 08:20

And for another definition of faster (faster to code): parslet.

greetings,
k

Benedikt_Huber · 30 January 2012 22:58

Hello,

I tried out parslet today, and I really appreciate its design (i.e.,
what parsers look like) as well as the beautiful webpage you built.

I could not use it for my current project, however, as it was way too
slow. I need to be able to parse millions of constraint specifications,
each based on a fairly simple grammar (20 rules or so). I found my
original implementation (regular expression matching and scanning) to
be ugly, inefficient (scanning a string a few times) and rather
slow (~15K constraints per second). The goal was to get both more
beautiful and faster by using a parser framework. It got more beautiful,
but too slow to be useful to me.

If I had benchmarked the MiniP parser posted on the homepage of parslet,
it would have been obvious that parslet is too slow for my purposes: On
my machine, it parses 180 lines / second, given the test string
"puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 11, 2, 3, 5, 7, 11,19)"
So while I appreciate the neat interface of those fancy new parsers,
people should know that they are slow.

But, I think it would be awesome to have a beautiful parser lib that is
fast, and given that the performance of the regular expression engine is
quite decent, it should be feasible to build such a parser.
a) Do you think there is any chance to get a faster (say, factor 100)
implementation for parslet with the same (or a similar) interface?
b) If not, is there any maintained ruby parsing library or parser
generator (no need to be in pure ruby) which is fast enough? How fast is
antlr for ruby?
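
As a rough illustration of the idea that a parser leaning directly on the regular expression engine can still be readable, here is a hand-rolled sketch for the MiniP-style test string, using Ruby's standard StringScanner. The class name `MiniPScanner` and the nested-array node layout are hypothetical and not taken from parslet or any other library mentioned here; only `+` is supported, and no performance claim is made.

```ruby
require 'strscan'

# Recursive-descent parser for strings like "puts(3 + 2, 7)".
# Every token is matched by a regex through StringScanner.
class MiniPScanner
  def initialize(src)
    @s = StringScanner.new(src)
  end

  def parse
    tree = expr
    skip_ws
    raise ArgumentError, "trailing input at #{@s.pos}" unless @s.eos?
    tree
  end

  private

  def skip_ws
    @s.skip(/\s*/)
  end

  # expr := funcall | int ('+' expr)?
  def expr
    skip_ws
    if (name = @s.scan(/[a-z]+/))
      [:funcall, name, arglist]
    elsif (digits = @s.scan(/[0-9]+/))
      left = [:int, digits.to_i]
      skip_ws
      @s.scan(/\+/) ? [:sum, left, expr] : left
    else
      raise ArgumentError, "expected identifier or integer at #{@s.pos}"
    end
  end

  # arglist := '(' expr (',' expr)* ')'
  def arglist
    skip_ws
    @s.scan(/\(/) or raise ArgumentError, "expected '(' at #{@s.pos}"
    args = [expr]
    args << expr while comma?
    skip_ws
    @s.scan(/\)/) or raise ArgumentError, "expected ')' at #{@s.pos}"
    args
  end

  # Consume an optional comma and report whether one was present.
  def comma?
    skip_ws
    !@s.scan(/,/).nil?
  end
end

p MiniPScanner.new('puts(3 + 2, 7)').parse
```

Whether something like this gets anywhere near the speed needed for millions of constraints would have to be measured; the sketch only shows the shape of such a parser.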

Kind Regards,
Benedikt
