Ruby language parser in ruby

Ryan_Davis1 · 29 November 2009 21:24

also ruby_parser... WAY WAY WAY hacked up, but ruby_parser.

I'd like to incorporate some of their ideas back into ruby_parser, but it is sooo different, that merging is nearly impossible.


On Nov 29, 2009, at 10:47 , Jason Roelofs wrote:

Maglev's got one:

maglev/src/kernel/parser at master · MagLev/maglev · GitHub

David_Masover · 30 November 2009 09:17

I'm not sure they wanted to do that for everything, from the parsing to the
VM, though that would've been cool.

Probably the thing I love most about Rubinius, as it was when I last looked,
is that it fulfills some of the original promise of Ruby. When I first started
looking at Ruby, people said "Most of Ruby is written in Ruby." I took this to
mean a very lightweight language with a huge standard library. Well, much of
the standard library, and most of the core libraries, are written in C, for
performance reasons, significantly limiting what I can do in Ruby.

For example, back when I was writing autoloader, I really wanted (I can't
remember why) Kernel#autoload to actually use Kernel#require when the module
is accessed. Ideally, I wanted something like this:

    autoload :FooBar do |name|
      require 'foo_bar'
    end

Unfortunately, Kernel#autoload doesn't do that. Last I checked, it directly
calls the C code that Kernel#require maps onto -- at the time, this
effectively bypassed Rubygems as a whole.

It was to the point where to create an alternate way to autoload stuff, I was
going to have to hack the Ruby source, in C.

...really?

Contrast this to Rubinius. The source was clean and readable, and I was easily
able to figure out how to accomplish what I wanted, in pure Ruby, without
modifying any of the source.
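Kernel#autoload does not accept a block, but for comparison, a rough pure-Ruby approximation of the block form can be built on the Module#const_missing hook. This is a sketch under assumptions: `LazyAutoload` and `lazy_autoload` are hypothetical names, and the approach does not reuse autoload's own machinery (it is not thread-safe and only fires on lookups that fall through to const_missing).

```ruby
# Hypothetical sketch: a block-style autoload built on Module#const_missing.
# Not Kernel#autoload; `lazy_autoload` is a made-up name for illustration.
module LazyAutoload
  # Register a loader block for constant `name` in this module.
  def lazy_autoload(name, &loader)
    (@lazy_loaders ||= {})[name.to_sym] = loader
  end

  # Ruby calls this when a constant lookup in this module fails.
  def const_missing(name)
    loader = @lazy_loaders && @lazy_loaders.delete(name)
    return super unless loader  # nothing registered: raise NameError as usual
    loader.call(name)           # e.g. require 'foo_bar', or any custom loading
    return const_get(name) if const_defined?(name, false)
    super                       # the loader ran but did not define the constant
  end
end

# Usage: the block decides how FooBar gets loaded.
module Plugins
  extend LazyAutoload
  lazy_autoload(:FooBar) do |name|
    # A real loader might call require 'foo_bar' here; this one defines it inline.
    const_set(name, Class.new)
  end
end

p Plugins::FooBar
```

Because the loader is an ordinary block, it could go through Kernel#require (and therefore Rubygems) rather than any C-level shortcut.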

By the way: it doesn't have to be slower. Remember Google's JavaScript engine,
V8? It implements the JavaScript standard library in JavaScript, yet V8
_still_ wins in the benchmarks against other, more conservative JavaScript
implementations.


On Sunday 29 November 2009 03:25:17 pm Ryan Davis wrote:

On Nov 29, 2009, at 10:29 , Marc Heiler wrote:
>> Was using ruby_parser, but now is also C... you know... ruby in ruby.
>
> Awww...
>
> I still thought rubinius was going with the pure ruby approach.

yeah. well. They left that idea FAR behind a long time ago.

Caleb_Clausen1 · 4 December 2009 16:31

A minor problem with the redparse gem is that it gives most of the files
root-only permissions. I just did 'sudo gem install redparse'

Yes, that is indeed most unfortunate and due to be fixed in the next
release, which is coming Real Soon Now. I'm not sure how those
permissions got all weird (again); it's not something I usually give a
lot of thought or attention to, I'm afraid. (Frankly, I wish rubygems
would have warned me when I created the gem that some of the files
were not world-readable.)

Looking at this code - I don't think I would dare hack it. I think
ruby_parser is more what I was looking for.

Would you care to elaborate on that? What didn't you like and/or find
hard to understand? What could I do better? Actually, redparse is a
fairly normal LALR-based parser; internally, the code divides into
definitions, rules, and actions. I did invent my own LR language, tho,
since I'm so unhappy with yacc and friends.

I'm interested to hear, to tell the truth, what sorts of things you
want to change in the existing ruby grammar; I very much like
playing around with extending the language in various ways. I have a
variety of ideas I want to pursue myself, but I'm always interested to
hear what new features other people might want.


On 12/4/09, Brian Candler <b.candler@pobox.com> wrote:

Ryan_Davis1 · 30 November 2009 21:33

Was using ruby_parser, but now is also C... you know... ruby in ruby.

Awww...

I still thought rubinius was going with the pure ruby approach.

yeah. well. They left that idea FAR behind a long time ago.

I'm not sure they wanted to do that for everything, from the parsing to the
VM, though that would've been cool.

That was my understanding when I started working professionally on it.

Probably the thing I love most about Rubinius, as it was when I last looked,
is that it fulfills some of the original promise of Ruby. When I first started looking at Ruby, people said "Most of Ruby is written in Ruby."

I've never ever heard such a thing uttered about ruby. It is beyond obvious that it isn't so when you look at any version of the tarball.


On Nov 30, 2009, at 01:17 , David Masover wrote:

On Sunday 29 November 2009 03:25:17 pm Ryan Davis wrote:

On Nov 29, 2009, at 10:29 , Marc Heiler wrote:

Brian_Candler · 4 December 2009 16:42

Caleb Clausen wrote:

A minor problem with the redparse gem is that it gives most of the files
root-only permissions. I just did 'sudo gem install redparse'

Yes, that is indeed most unfortunate and due to be fixed in the next
release, which is coming Real Soon Now. I'm not sure how those
permissions got all weird (again); it's not something I usually give a
lot of thought or attention to, I'm afraid. (Frankly, I wish rubygems
would have warned me when I created the gem that some of the files
were not world-readable.)

Looking at this code - I don't think I would dare hack it. I think
ruby_parser is more what I was looking for.

Would you care to elaborate on that? What didn't you like and/or find
hard to understand?

Well, all of it really.

I presume that lib/redparse.rb is the "real" parser (I also found
lib/redparse/babyparser.rb and babynodes.rb).

It's a monolithic file, and it was hard even to see where the grammar
began. I believe it's here:

    [
    -[UNOP, Expr, lower_op]>>UnOpNode,
    -[DEFOP, ParenedNode]>>UnOpNode,
    -[Op(/^(?:unary|lhs|rhs)\*$/), ValueNode, lower_op]>>UnaryStarNode,
    ... etc

and an example of a larger rule is like this:

    -[NumberToken&-{:negative=>true}, Op('**').la]>>
      stack_monkey("fix_neg_exp",2,Op("-@",true)){|stack|
        #neg_op.unary=true
        num=stack[-2]
        op=OperatorToken.new("-@",num.offset)
        # op.startline=num.startline
        stack[-2,0]=op
        num.ident.sub!(/\A-/,'')
        num.offset+=1
      },

I have no idea how to (a) understand, or (b) modify that. I can see
there is quite a lot of Ruby operator abuse going on, but without
defined semantics.

At least with racc, it's extremely well documented, in the sense that I
have a printout of the yacc manual to refer to.

So no doubt RedParse is a fine ruby parser, and generates a fine object
tree as its output. But it's not so good for me as a starting point for
building languages which inherit some of ruby flavour, but are
significantly different.


On 12/4/09, Brian Candler <b.candler@pobox.com> wrote:

--
Posted via http://www.ruby-forum.com/.

Caleb_Clausen1 · 4 December 2009 20:27

Caleb Clausen wrote:

Would you care to elaborate on that? What didn't you like and/or find
hard to understand?

Well, all of it really.

I presume that lib/redparse.rb is the "real" parser (I also found
lib/redparse/babyparser.rb and babynodes.rb).

Yes, that's right. babyparser.rb is an example of a minimal (3 rule)
parser using this system, and can make a good place to start trying to
understand the full-blown parser.

It's a monolithic file, and it was hard even to see where the grammar
began. I believe it's here:

Unfortunately, there's a lot of experimental stuff which shouldn't be
in there, related to my attempt to write a proper parser compiler.

    [
    -[UNOP, Expr, lower_op]>>UnOpNode,
    -[DEFOP, ParenedNode]>>UnOpNode,
    -[Op(/^(?:unary|lhs|rhs)\*$/), ValueNode, lower_op]>>UnaryStarNode,
    ... etc

These are indeed the rules. The general form of a rule is

    -[ patterns to search for on the top of the parse stack ] >> NodeTypeTheyGetReplacedWith,

These 3 rules deal respectively with (most) unary operators, the
defined? operator, and unary star operators. (What's found to the
right of the >> is generally a pretty good clue as to what a
particular rule is doing, though not always in the case of stack
monkeys...) UNOP, DEFOP, Expr, lower_op and the like are defined
above in the definitions section.
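To make the `-[ ... ] >> Node` notation a little less opaque, here is a minimal sketch of how Ruby operator overloading could give it a meaning: unary minus on an Array (`Array#-@`, monkey-patched here purely for illustration) wraps the pattern list, and `>>` pairs it with a node class to form a rule. `Rule`, `PatternSeq`, `parse`, and the node classes are all hypothetical; this is not how redparse or Reg actually implement it, and the driver is a naive greedy shift-reduce loop, not an LALR parser.

```ruby
# Hypothetical sketch of a "-[patterns] >> NodeClass" rule DSL.
# NOT redparse's or Reg's real implementation.

# A rule: patterns to find on top of the parse stack, plus the node
# class that replaces the matched entries.
Rule = Struct.new(:patterns, :node_class) do
  # True if the top patterns.size stack entries match pairwise via ===.
  def match?(stack)
    return false if stack.size < patterns.size
    stack.last(patterns.size).zip(patterns).all? { |item, pat| pat === item }
  end

  # Pop the matched entries and push one node built from them.
  def reduce!(stack)
    stack.push(node_class.new(*stack.pop(patterns.size)))
  end
end

# What unary minus on an Array returns; ">>" turns it into a Rule.
PatternSeq = Struct.new(:patterns) do
  def >>(node_class)
    Rule.new(patterns, node_class)
  end
end

class Array
  # Monkey patch for illustration only: -[a, b] wraps the array as a pattern sequence.
  def -@
    PatternSeq.new(self)
  end
end

# Greedy shift-reduce driver (no lookahead, no conflict handling):
# after each shift, reduce with the first matching rule until none matches.
def parse(tokens, rules)
  stack = []
  tokens.each do |tok|
    stack.push(tok)
    while (rule = rules.find { |r| r.match?(stack) })
      rule.reduce!(stack)
    end
  end
  stack
end

# Usage: Integer and String literals stand in for real token classes.
AddNode = Struct.new(:left, :op, :right)
NegNode = Struct.new(:op, :operand)
RULES = [
  -[Integer, "+", Integer] >> AddNode,
  -["-", Integer] >> NegNode,
]
p parse([1, "+", 2], RULES)
p parse(["-", 5], RULES)
```

Since each pattern is tested with `===`, classes, literal strings, and regexps can all serve as patterns, which is one plausible reason such a DSL reads the way it does.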

However, the action taken on finding a pattern on the parse stack is
not always to reduce the matched portion of the stack into a Node. For
instance, here:

and an example of a larger rule is like this:

    -[NumberToken&-{:negative=>true}, Op('**').la]>>
      stack_monkey("fix_neg_exp",2,Op("-@",true)){|stack|
        #neg_op.unary=true
        num=stack[-2]
        op=OperatorToken.new("-@",num.offset)
        # op.startline=num.startline
        stack[-2,0]=op
        num.ident.sub!(/\A-/,'')
        num.offset+=1
      },

That's not really what I'd call a larger rule, merely (alas) a longer
one... This is an example of one of many relatively unimportant rules
with which the parser must unfortunately be littered. This particular
example fixes up the precedence of expressions like -2**10. '-2' is
normally lexed as one single numeric token, as is normal in most
languages. In this one special case, however, the -@ must actually be
made lower precedence than **. The implementation of this fixup can't
be neatly shoehorned into a Node constructor, however, so special
imperative code (a 'stack monkey') had to be written to fiddle with
the parse stack directly.
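The fixup described above can be mirrored on a toy token stack. `Token` and `fix_neg_exp!` are hypothetical stand-ins (redparse's real NumberToken, OperatorToken, and stack_monkey machinery differ); the sketch only repeats the steps of the quoted stack monkey: insert a `-@` operator before the number, strip the sign from the literal, and bump its offset. The last line shows the precedence Ruby itself gives such an expression.

```ruby
# Hypothetical token type; not redparse's NumberToken/OperatorToken.
Token = Struct.new(:kind, :text, :offset)

# Sketch of the "fix_neg_exp" idea: the stack's top two entries are a negative
# number literal and a lookahead "**". Split the literal into a unary "-@"
# operator followed by the positive number, so that ** binds tighter.
def fix_neg_exp!(stack)
  num, la = stack[-2], stack[-1]
  return stack unless num && la && num.kind == :number &&
                      num.text.start_with?("-") && la.text == "**"
  stack[-2, 0] = [Token.new(:op, "-@", num.offset)] # insert "-@" before the number
  num.text = num.text.sub(/\A-/, "")                 # drop the sign from the literal
  num.offset += 1                                    # the digits start one column later
  stack
end

stack = [Token.new(:number, "-2", 0), Token.new(:op, "**", 3)]
fix_neg_exp!(stack)
p stack.map(&:text) # prints ["-@", "2", "**"]

# Ruby itself parses -2 ** 10 as -(2 ** 10):
p(-2 ** 10)         # prints -1024
```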

I have no idea how to (a) understand, or (b) modify that. I can see
there is quite a lot of Ruby operator abuse going on, but without
defined semantics.

I would call that a 'DSL'. Most of the special operators and other
unusual syntax are defined in my pattern matching language, Reg, which
is a different project. Reg is moderately well documented, but that's
in a whole other directory.

At least with racc, it's extremely well documented, in the sense that I
have a printout of the yacc manual to refer to.

I just couldn't ever get yacc to do what I wanted it to do,
personally. Lots of other people have had more luck....

I would like some day (if I ever have time) to split out the parser
construction tool aspects of redparse from the actual ruby parser
itself, and package and document the parser compiler/interpreter
better. For now, altho I have made an effort to make the interface to
RedParse fairly clear and well described, the internals I simply
didn't even try to explain....

So no doubt RedParse is a fine ruby parser, and generates a fine object
tree as its output. But it's not so good for me as a starting point for
building languages which inherit some of ruby flavour, but are
significantly different.

I can explain more IF you're interested, but it does seem like you
know where you want to go right now.

I'd still like to hear more about this language(s) you're trying to
make, if you want to tell.


On 12/4/09, Brian Candler <b.candler@pobox.com> wrote:

Brian_Candler · 4 December 2009 21:41

Caleb Clausen wrote:

This particular
example fixes up the precedence of expressions like -2**10. '-2' is
normally lexed as one single numeric token, as is normal in most
languages. In this one special case, however, the -@ must actually be
made lower precedence than **

Ugh. No doubt that sort of thing will bite me.

I'd still like to hear more about this language(s) you're trying to
make, if you want to tell.

I'm going to try to implement a 'ruby flavoured erlang', a front-end
which spits out regular erlang which is compiled in the normal way, to
see what such a language might look like.
