Ruby language parser in ruby

Brian_Candler · 29 November 2009 11:23

I'm looking for a ruby language parser written in ruby, that I can hack
to play about with generating other ruby-like languages.

I've found RubyParser. Are there any other options I should be looking
at?

* I see ruby 1.9 has ripper, but it's written as a C extension. I want
something in pure ruby.

* ParseTree is also in C I believe.

* Possibly could look at the internals of rubinius?

* Anything else?

Thanks,

Brian.


--
Posted via http://www.ruby-forum.com/.

Ryan_Davis1 · 29 November 2009 12:17

I'm looking for a ruby language parser written in ruby, that I can hack
to play about with generating other ruby-like languages.

I've found RubyParser. Are there any other options I should be looking
at?

* I see ruby 1.9 has ripper, but it's written as a C extension. I want
something in pure ruby.

well to be fair. ripper is yacc + C. ruby_parser is racc + ruby. Until someone writes a recursive descent parser, you'll never have PURE ruby (and it wouldn't be quite as maintainable). I'm actually working on that, but I don't yet know if I'll succeed.

* ParseTree is also in C I believe.

and doesn't parse. It just uses ruby to parse and grabs the internal ast.

* Possibly could look at the internals of rubinius?

Was using ruby_parser, but now is also C... you know... ruby in ruby.

I think your best bet at this time is ruby_parser.


Marnen_Laibow-Koser · 29 November 2009 15:53

Brian Candler wrote:

I'm looking for a ruby language parser written in ruby, that I can hack
to play about with generating other ruby-like languages.

I've found RubyParser. Are there any other options I should be looking
at?

Rubinius, perhaps? Or make your own with Treetop?

Best,


--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org

Roger_Pack4 · 30 November 2009 18:17

Brian Candler wrote:

I'm looking for a ruby language parser written in ruby, that I can hack
to play about with generating other ruby-like languages.

* Anything else?

redparse

-r


Mason_Kelsey · 30 November 2009 23:38

Since I am taking some graduate level course in artificial intelligence at
UCF, under Dr. Fernando Gomez, this is also a topic of interest to me. The
only languages I've seen that are currently being used are Lisp, Java, and
Python. If you Google Natural Language Tool Kit, NLTK, you will find the
Brill tagger and parsers written in Python. Fortunately, if you know Ruby,
Python is just a step away, not a big transition. No sense in reinventing
the wheel again.

If you go with Python, I recommend the book *"Natural Language Processing
with Python"* by Bird, Klein, and Loper. The book tells you how to use the
Natural Language Tool Kit, which you can download from
http://www.nltk.org/download, and http://www.nltk.org/getting-started tells
you how to get started. Guides for the code are at
http://nltk.googlecode.com/svn/trunk/doc/howto/index.html, and
http://nltk.googlecode.com/svn/trunk/doc/howto/tag.html shows you how to use
the taggers. You will also need to download and import numpy, although the
descriptions for the tagger only suggest it.

And as a double plus you can read the twitters from Dr. Hugo Lui. @dochugo

No Sam


daz · 1 December 2009 12:40

Brian Candler wrote:
> I'm looking for a ruby language parser written in ruby, that I can hack
> to play about with generating other ruby-like languages.
>
> I've found RubyParser. Are there any other options I should be looking
> at?
>
> * I see ruby 1.9 has ripper, but it's written as a C extension. I want
> something in pure ruby.
>
> * ParseTree is also in C I believe.
>
> * Possibly could look at the internals of rubinius?
>
> * Anything else?
>
> Thanks,
>
> Brian.

irb does quite a good job.

daz

Brian_Candler · 29 November 2009 12:55

Ryan Davis wrote:

well to be fair. ripper is yacc + C. ruby_parser is racc + ruby. Until
someone writes a recursive descent parser, you'll never have PURE ruby

That's fine; if it's using a standard parser generator I don't need to
hack that, just the language itself.

I think your best bet at this time is ruby_parser.

Sounds good to me, thank you.


Rick_DeNatale1 · 29 November 2009 15:46

That would be cool if it's possible. Old Smalltalkers love to see
recursive descent parsers written in the target language! <G>


On Sun, Nov 29, 2009 at 7:17 AM, Ryan Davis <ryand-ruby@zenspider.com> wrote:

On Nov 29, 2009, at 03:23 , Brian Candler wrote:

I'm looking for a ruby language parser written in ruby, that I can hack
to play about with generating other ruby-like languages.

I've found RubyParser. Are there any other options I should be looking
at?

* I see ruby 1.9 has ripper, but it's written as a C extension. I want
something in pure ruby.

well to be fair. ripper is yacc + C. ruby_parser is racc + ruby. Until someone writes a recursive descent parser, you'll never have PURE ruby (and it wouldn't be quite as maintainable). I'm actually working on that, but I don't yet know if I'll succeed.

--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

Brian_Candler · 29 November 2009 18:26

Marnen Laibow-Koser wrote:

Rubinius, perhaps? Or make your own with Treetop?

I'm pretty familiar with CFGs and tools like yacc, or at least, I was
some years ago. What bothers me about ruby syntax is that there are a
number of aspects that I'm not sure how to map to a CFG, for example

puts (a).abs

being different from

puts(a).abs

... or,

a = b
+ c

being treated differently to

a = b +
c

So basically I was looking to steal ideas how to deal with these sorts
of cases.
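
One way to see how the reference parser treats these cases is to ask Ripper for the parse tree and compare the results. This is only a sketch: Ripper is the C-backed parser mentioned earlier in the thread (bundled with Ruby 1.9 and later), so it serves here as an oracle rather than as the pure-Ruby parser being sought, and the helper name `top_level_node_types` is made up for illustration.

```ruby
require "ripper"

# Hypothetical helper: return the node type (first element) of each
# top-level statement that Ripper finds in `source`.
def top_level_node_types(source)
  sexp = Ripper.sexp(source)
  raise ArgumentError, "could not parse: #{source.inspect}" if sexp.nil?
  # The result has the shape [:program, [stmt1, stmt2, ...]].
  sexp[1].map(&:first)
end

{
  "puts (a).abs" => "space before the paren",
  "puts(a).abs"  => "no space before the paren",
  "a = b\n+ c"   => "line break before the operator",
  "a = b +\nc"   => "line break after the operator",
}.each do |source, label|
  puts "#{label}: #{top_level_node_types(source).inspect}"
end
```

The line break before the operator yields two top-level statements, while the line break after it yields one, which is exactly the distinction a plain CFG has trouble expressing without help from the lexer.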

Indeed, an existing grammar for ruby would be a good starting point, and
googling for this I see there's a project "rubygrammar", with Charles
Nutter as one of the admins. Anyone know what state this is in? I see 61
commits in the repo, nothing in the last three years, and the grammar
looks too simple to be true.


Ryan_Davis1 · 30 November 2009 21:35

Last time I looked at redparse I couldn't even get the tests to run. Only through lots of hacking did I get it executing and that was just to see how fast it was. Caleb found a _horrible_ ruby package that both redparse and ruby_parser choke on (think: it'll finish right around the time of heat death of the universe... and will probably contribute greatly to it). Otherwise, it didn't seem to be faster or better in any area (esp given how much work it was to get it to work at all).

Things may have changed since then.


On Nov 30, 2009, at 10:17 , Roger Pack wrote:

Brian Candler wrote:

I'm looking for a ruby language parser written in ruby, that I can hack
to play about with generating other ruby-like languages.

* Anything else?

redparse

Brian_Candler · 1 December 2009 08:06

Mason Kelsey wrote:

Since I am taking some graduate level course in artificial intelligence
at
UCF, under Dr. Fernando Gomez, this is also a topic of interest to me.

Although many people find programming in Ruby more natural than
programming in other languages, I've never heard it lumped in with
'natural languages' before...


Brian_Candler · 1 December 2009 13:31

daz wrote:

Brian Candler wrote:
> I'm looking for a ruby language parser written in ruby

irb does quite a good job.

Interesting. I'd forgotten that irb would have to parse multi-line input
before handing it off to eval. Thanks!
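
The idea of buffering lines until they form a complete program can be sketched as follows. This is not how irb itself is implemented; it assumes Ripper (bundled with Ruby 1.9 and later) as the parser, and both helper names are hypothetical.

```ruby
require "ripper"

# Hypothetical helper: true when Ripper can parse `source` as a whole
# program. Ripper.sexp returns nil when it hits a parse error.
def parses_completely?(source)
  !Ripper.sexp(source).nil?
end

# Feed lines one at a time and return the first buffer that parses,
# or nil if the input never becomes a complete program.
def first_complete_chunk(lines)
  buffer = String.new
  lines.each do |line|
    buffer << line << "\n"
    return buffer if parses_completely?(buffer)
  end
  nil
end

p first_complete_chunk(["a = b +", "c", "d"])
```

Note that this simple check cannot tell "incomplete so far" apart from "invalid and never going to parse"; a real REPL needs more than a yes/no answer from the parser.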


Brian_Candler · 4 December 2009 21:45

Ryan Davis wrote:

I think your best bet at this time is ruby_parser.

Question for you: as far as I can make out, the handling of embedded
linebreaks such as

a = b +
c

is handled via state kept in the lexer. Can I ask why you did it this
way? I am thinking it ought to be possible to do this in the grammar,
e.g.

  expr: expr '+' opt_nl expr
      | expr '-' opt_nl expr

  opt_nl:
      | nl

but if you've found out the hard way that it isn't, it could save me
following a dead end.

Also: can you summarise how you're handling expressions nested within
string literals, e.g. "abc #{foo} def" ?

Thanks,

Brian.


Marc_Heiler · 29 November 2009 18:29

Was using ruby_parser, but now is also C... you know... ruby in ruby.

Awww...

I still thought rubinius was going with the pure ruby approach.


Roger_Pack4 · 30 November 2009 21:40

Last time I looked at redparse I couldn't even get the tests to run.
Only through lots of hacking did I get it executing and that was just to
see how fast it was. Caleb found a _horrible_ ruby package that both
redparse and ruby_parser choke on (think: it'll finish right around the
time of heat death of the universe... and will probably contribute
greatly to it). Otherwise, it didn't seem to be faster or better in any
area (esp given how much work it was to get it to work at all).

Things may have changed since then.

LOL.

I know he's working on making it 1.9 compatible, but I think it's stable
for 1.8. Its niche is that it's written in "human generated" ruby.
-r


Caleb_Clausen1 · 4 December 2009 13:54

redparse

Last time I looked at redparse I couldn't even get the tests to run. Only

I wish you had/would let me know about whatever problem(s) it was that
you had, so I could/can fix it. I've definitely fixed a great many
problems since the first release, but I can't fix bugs that I don't
know about.

through lots of hacking did I get it executing and that was just to see how
fast it was.

I'll be the last to claim that redparse is (right now) a fast parser.
It's probably (still) the slowest ruby parser in the universe, tho it
should be somewhat faster than it was originally.

Caleb found a _horrible_ ruby package that both redparse and
ruby_parser choke on (think: it'll finish right around the time of heat
death of the universe... and will probably contribute greatly to it).

Ah, yes, japanese/zipcodes.rb. A 65MB ruby file.... it's an
interesting test case, since any reasonable parser ought to be able to
parse it all before the cows come home. It wasn't until I talked to
you about it that I realized that my parser (and yours too? still?)
slows down as N**2 (or is it e**N), where N is the number of tokens in
input. Fundamentally, both parsing algorithms are O(K*N) (for a fairly
large K), but both build up a parse tree as they go, and the garbage
collector sometimes has to run in the background and visit everything
in that partial parse tree, so the time taken by each garbage
collection run increases with the length of the input. Overall this
makes for unreasonably super-linear
slowdowns of the algorithm. I had never thought that just adding
garbage collection to an algorithm could take it from O(N) to O(N**2).

Usually, this isn't an issue since the parser finishes for any
non-ridiculously sized input (eg <1MB) before the garbage collector
needs to run. I suppose that using a generational garbage collector
would return things to O(N) land. Reducing the garbage production rate
of the parser might help a lot too.
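
The arithmetic behind that O(N) to O(N**2) jump can be checked with a toy model. This is an illustration under stated assumptions, not a measurement of redparse or ruby_parser: it assumes each token adds one live node to the partial parse tree, that a non-generational collector runs every `gc_interval` tokens, and that each run visits every live node. The function name is hypothetical.

```ruby
# Toy model: count how many node visits the collector performs while
# a parser consumes `token_count` tokens.
def simulated_gc_visits(token_count, gc_interval)
  live_nodes = 0
  visits = 0
  token_count.times do |i|
    live_nodes += 1                      # the parse tree only grows
    visits += live_nodes if ((i + 1) % gc_interval).zero?
  end
  visits
end

[1_000, 2_000, 4_000].each do |n|
  puts "#{n} tokens -> #{simulated_gc_visits(n, 100)} node visits"
end
```

With an interval of 100 the model gives 5500, 21000 and 82000 visits: doubling the input roughly quadruples the collector's work, even though the parsing itself is linear.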

On the other hand, why does this particular data set need to be
expressed as executable ruby code, when something like CSV seems so
much more reasonable....

Otherwise, it didn't seem to be faster or better in any area (esp given how
much work it was to get it to work at all).

I will readily admit that your parser has (for now) the advantage in
terms of speed, ability to run in MRI 1.9, and ability to parse 1.9
expressions. However, it also presents a fairly ugly interface to the
user. parse_tree/ruby_parser trees are these yucky lisp-like things
which behave in various unexpected ways. Whereas redparse trees are a
great deal nicer, IMNSHO. I specifically designed redparse to have
pleasant parsetrees as its output; as I see it, redparse's tree format
has a number of advantages over the lisp-like output of
parse_tree/ruby_parser or the smalltalk-like output of RubyNode:

1) it's object oriented
2) it closely mirrors structure of original source code
3) it has better positioning information (line #/byte offset)
4) it behaves in predictable, expectable ways in most cases
5) things like begin..end and def..end have action unified in one
node, instead of spread out over multiple nodes nested in strange ways
6) operators are actually a separate type of node, instead of being a
weirdly named method call

The first of those points is really the key one. Having an
object-oriented interface makes the parse trees much nicer to deal
with programmatically than anything else. However, the user need not be
limited to that; he can use redparse's parse trees in a list-oriented or
hash-oriented way if for some reason those prove to be better. So really,
redparse presents the best of all 3 worlds; its interface is
object-oriented, list-oriented, or hash-oriented depending on what
best suits the user.
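
To make the contrast concrete, here is a minimal sketch of a node that is object-oriented but can still be read as a list or as a hash. The class name `BinaryOpNode` and its accessors are invented for illustration and are not redparse's actual API.

```ruby
# Hypothetical node type: an Array subclass, so list-style access still
# works, with named accessors layered on top.
class BinaryOpNode < Array
  def initialize(left, op, right)
    super([left, op, right])
  end

  def left
    self[0]
  end

  def op
    self[1]
  end

  def right
    self[2]
  end

  # Hash-style view of the same data.
  def to_h
    { left: left, op: op, right: right }
  end
end

node = BinaryOpNode.new("b", :+, "c")
puts node.op          # object-oriented access
puts node[2]          # list-oriented access
puts node.to_h[:left] # hash-oriented access
```

By comparison, a lisp-like tree carries the same information purely positionally, so the caller has to remember which slot holds what.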

Now, all of these things have been true about redparse basically from
the beginning, altho the interface has changed slightly (I think for
the better) over time. Perhaps you prefer the parse_tree style
interface; in fact I'm certain that you do. But when I'm doing the
kind of deep metaprogramming which requires access to a parse tree, I
want a tree format that's going to be as simple, regular, and
predictable as possible. Ruby's syntax is fairly complex, and any type
of parse tree representing that syntax is going to inevitably reflect
a fair amount of that complexity as a result. But there is a virtue to
be had in making the resulting trees as simple as it's possible for
them to be, rather than squirrelly and convoluted in unnecessary extra
ways.


Ryan_Davis1 · 4 December 2009 22:52

Ryan Davis wrote:

I think your best bet at this time is ruby_parser.

Question for you: as far as I can make out, the handling of embedded
linebreaks such as

a = b +
c

is handled via state kept in the lexer. Can I ask why you did it this
way?

My default answer to nearly any of these types of questions is going to be "because that's how MRI does it".

In this case, yes, the lexer and parser are keeping a shared variable called lex_state that knows whether it is in the middle of an expression (and many other states) so here the trailing '+' keeps the expression open.
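
As a rough illustration of lexer-side state, here is a toy statement splitter in which a single flag decides whether a newline ends the statement. It is a sketch only: MRI's and ruby_parser's real lex_state has many more states, and the names `split_statements`, `OPERATORS` and `expecting_operand` are made up for this example.

```ruby
OPERATORS = %w[+ - * / =].freeze

# Split `source` into statements (arrays of tokens). A newline ends the
# current statement unless the previous token was an operator, in which
# case the expression is still open and the newline is ignored.
def split_statements(source)
  statements = []
  current = []
  expecting_operand = false # stand-in for a tiny slice of lex_state

  source.scan(/\n|[A-Za-z_]\w*|\d+|[-+*\/=]/) do |token|
    if token == "\n"
      next if expecting_operand || current.empty?
      statements << current
      current = []
    else
      current << token
      expecting_operand = OPERATORS.include?(token)
    end
  end
  statements << current unless current.empty?
  statements
end

p split_statements("a = b +\nc")
p split_statements("a = b\n+ c")
```

The first call yields a single statement and the second yields two, matching the two cases discussed in this thread. The grammar never sees the ignored newline, which is why the decision lives in the lexer.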

lex_state needs to die. It basically only exists because the language is a tangled mess and it was designed with LR parsing in mind (AFAICT, lex_state is a symptom of not really knowing where you are contextually, because you're parsing bottom up). I'd like to not have lex_state and many of the complications that come with it. I'm not sure if it'll still be ruby at that point, but I'm giving it a go to see.

I am thinking it ought to be possible to do this in the grammar,
e.g.

  expr: expr '+' opt_nl expr
      | expr '-' opt_nl expr

  opt_nl:
      | nl

you're going to have those EVERYWHERE... but yeah, it should be possible. Write good tests from the beginning.

Also: can you summarise how you're handling expressions nested within
string literals, e.g. "abc #{foo} def" ?

There is no way to summarize that. It is horrible and I hate it, but I'm not in a position to make it work better with the current architecture.
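
For a feel of what the lexer has to juggle, the token stream for that literal can be inspected with Ripper. This shows Ripper's (C-backed) tokenization only, not ruby_parser's internals, and the helper name `token_types` is hypothetical.

```ruby
require "ripper"

# Hypothetical helper: list the token types Ripper.lex reports for
# `source`. Each entry from Ripper.lex has the token type at index 1.
def token_types(source)
  Ripper.lex(source).map { |token| token[1] }
end

# Single quotes here so that the inner #{foo} reaches the lexer as text.
p token_types('"abc #{foo} def"')
# => [:on_tstring_beg, :on_tstring_content, :on_embexpr_beg, :on_ident,
#     :on_embexpr_end, :on_tstring_content, :on_tstring_end]
```

The lexer has to leave string mode at `#{`, tokenize ordinary Ruby code, and then resume the same string at `}`, so string literals and expressions nest inside each other.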


Jason_R · 29 November 2009 18:47

Maglev's got one:

Jason

Ryan_Davis1 · 29 November 2009 21:25

yeah. well. They left that idea FAR behind a long time ago. Sad really, and I don't think there are any plans to push towards a more pure ruby approach.


On Nov 29, 2009, at 10:29 , Marc Heiler wrote:

Was using ruby_parser, but now is also C... you know... ruby in ruby.

Awww...

I still thought rubinius was going with the pure ruby approach.

Brian_Candler · 4 December 2009 15:07

A minor problem with the redparse gem is that it gives most of the files
root-only permissions. I just did 'sudo gem install redparse'

$ pwd
/usr/lib/ruby/gems/1.8/gems/redparse-0.8.3
$ ls -l lib
total 96
drwxr-xr-x 2 root root 4096 2009-12-04 14:07 redparse
-rwx------ 1 root root 89730 2009-12-04 14:07 redparse.rb
$ ls -l lib/redparse
total 188
-rwx------ 1 root root 2542 2009-12-04 14:07 babynodes.rb
-rwx------ 1 root root 6348 2009-12-04 14:07 babyparser.rb
-rwx------ 1 root root 10545 2009-12-04 14:07 decisiontree.rb
-rw-r--r-- 1 root root 13385 2009-12-04 14:07 generate.rb
-rwxr-xr-x 1 root root 134917 2009-12-04 14:07 node.rb
-rwxr--r-- 1 root root 2041 2009-12-04 14:07 problemfiles.rb
-rwx------ 1 root root 2664 2009-12-04 14:07 reg_more_sugar.rb
-rwxr--r-- 1 root root 40 2009-12-04 14:07 version.rb

Easily fixable of course:

sudo bash
find . -type f | xargs chmod +r
find . -type d | xargs chmod +rx

Most of those .rb files don't need +x either, since they don't have a
shebang line.

Looking at this code - I don't think I would dare hack it. I think
ruby_parser is more what I was looking for.

