…or better yet LL(1)? Can a recursive descent parser work with all of
Ruby’s grammar?
Last time I checked, “The Parrot Guys”(tm were attempting to parse
Ruby with Parse::RecDescent… Can it possibly work? Doesn’t Ruby need
the full power of yacc’s LALR(1)?
At Thu, 26 Sep 2002 21:21:56 +0900, Mauricio =?unknown-8bit?Q?Fern=E1ndez?= wrote:
Last time I checked, “The Parrot Guys”(tm were attempting to parse
Ruby with Parse::RecDescent… Can it possibly work? Doesn’t Ruby need
the full power of yacc’s LALR(1)?
Perhaps, it really wants LALR(k) or possibly LL(k), where k is
3 or more.
But right now Ruby’s main implementation uses bison, ie LALR(1). I’m not
sure if you can actually parse things outside LALR(1) (w/ dirty hacks I
suppose) while staying with yacc, but Ruby cannot indeed be too far from
LALR(1). But I do have my doubts about LL(k)…
Last time I checked, “The Parrot Guys”(tm were attempting to parse
Ruby with Parse::RecDescent… Can it possibly work? Doesn’t Ruby need
the full power of yacc’s LALR(1)?
Perhaps, it really wants LALR(k) or possibly LL(k), where k is
3 or more.
Perhaps, it really wants LALR(k) or possibly LL(k), where k is
3 or more.
Is this due to the use of yield and blocks? Or do you think
that there are other parts that won’t be accessible to LL(1)?
Why should yield and blocks be any harder to parse?
I was rather thinking of
STMT := …
> STMT if EXPR
> STMT while EXPR
> STMT unless EXPR
> STMT until EXPR
I’m not sure whether it can be left-factored or not… but I feel it
cannot. Gotta think about it a little more.
···
On Fri, Sep 27, 2002 at 10:09:49PM +0900, Mark Probert wrote:
“Mauricio Fernández” batsman.geo@yahoo.com wrote in message
news:20020927180455.GB535@kodos…
But right now Ruby’s main implementation uses bison, ie LALR(1). I’m not
sure if you can actually parse things outside LALR(1) (w/ dirty hacks I
suppose) while staying with yacc, but Ruby cannot indeed be too far from
LALR(1). But I do have my doubts about LL(k)…
There is a difference between grammar and language.
The same language can be defined by multiple grammars and many LALR(1)
grammars can be rewritten into LL(1) grammars. Most importantly recursion
must change from right to left and common prefixes must be collected into a
common rule. Likewise, many LL(k) grammars can be rewritten to LL(1) by
adding more rules.
LL grammars can be easier to hack with various conditions that must be true
to make a match. So in praxis this shouldn’t be a major worry unless you
have a LL(1) tool that doesn’t provide the option for tweaking. (ANTLR has
plenty of possibilities to add conditions and is LL(something)).
Most languages need some sort of tweaking in any case - the dangling else is
a popular example.
Since it is tedious to rewrite grammars to conform to dumb tools, we prefer
stronger tools. RockIt has, AFAIK a fairly strong parallel parsing algorithm
that make it possible to handle many conflicts that LALR(1) wouldn’t -
useful for natural language parsing. Such powerful parsing features may come
at a time and space performance cost and it does pay off to make the grammar
parser friendly.
I agree with that. LL(1) has more of a issue, I think,
with not knowing the context of a statement. There are
quite a few ruby-isms that are context dependant.
For example:
while gets
next if /^#/ # Skip comments
parseLine unless /^$/ # Don’t parse empty lines end
end
may be tricky in LL(1). I have to think about that…
-mark.
···
At 09:26 AM 9/28/2002 +0900, you wrote:
“Yukihiro Matsumoto” matz@ruby-lang.org wrote in message
news:1033171762.477103.17040.nullmailer@picachu.netlab.jp…
“Mauricio Fernández” batsman.geo@yahoo.com wrote in message
news:20020927180455.GB535@kodos…
But right now Ruby’s main implementation uses bison, ie LALR(1). I’m not
sure if you can actually parse things outside LALR(1) (w/ dirty hacks I
suppose) while staying with yacc, but Ruby cannot indeed be too far from
LALR(1). But I do have my doubts about LL(k)…
There is a difference between grammar and language.
The same language can be defined by multiple grammars and many LALR(1)
grammars can be rewritten into LL(1) grammars. Most importantly recursion
must change from right to left and common prefixes must be collected into a
common rule. Likewise, many LL(k) grammars can be rewritten to LL(1) by
adding more rules.
So I understand we can possibly create a LL(2) [according to matz]
grammar to parse the Ruby language (considered as the set of all the
possible valid program source codes), but the semantics of the direct
interpretation of the resulting parse-tree wouldn’t match the “real” ones.
I do however have some suspicions about statement modifiers (if,
unless…) rendering Ruby unparseable w/ LL(k). I feel the corresponding
productions cannot be left-factored.
LL grammars can be easier to hack with various conditions that must be true
to make a match. So in praxis this shouldn’t be a major worry unless you
have a LL(1) tool that doesn’t provide the option for tweaking. (ANTLR has
plenty of possibilities to add conditions and is LL(something)).
Most languages need some sort of tweaking in any case - the dangling else is
a popular example.
Since it is tedious to rewrite grammars to conform to dumb tools, we prefer
stronger tools. RockIt has, AFAIK a fairly strong parallel parsing algorithm
that make it possible to handle many conflicts that LALR(1) wouldn’t -
useful for natural language parsing. Such powerful parsing features may come
at a time and space performance cost and it does pay off to make the grammar
parser friendly.
Thanks for the explanation. I was curious about Parrot’s languages/ruby
recursive descent parser; now I can see what’s going on.
···
On Sat, Sep 28, 2002 at 08:26:15AM +0900, MikkelFJ wrote:
parseLine unless /^$/ # Don't parse empty lines end
This is a real problem in Ruby parsing I guess:
You can easily handle many of Rubys linebreak rules if you include linebreak
as a token, but if you do that, you will have to deal with them a lot of
places where you would rather ignore them as whitespace. I think this can be
handled by having the parser cooperating with the lexer. Obviously it is
doable since Ruby parsers exist.
It is true that Ruby parsers exist, the question was more related to LL(1)
parsers for Ruby. In this case, the challenge isn’t the linebreak (I just
yanked that code from the Pickaxe book), rather it is the if and unless
conditions.
As Mauricio points out in another post, these statements can exist in
a variety of forms:
if <cond> then
<statements>
end
if <cond>
<statements>
end
<statement> if <cond>
with or without multiple elsif and else clauses. The optional then
is going to cause some LL(1) problems because I don’t know that 1 token
lookahead is going to be enough. The same is true of the unless, while
and until loops, I think.
It is going to be fun to try, though
-mark.
···
At 06:29 AM 9/29/2002 +0900, MikkelFJ wrote:
parseLine unless /^$/ # Don't parse empty lines end
This is a real problem in Ruby parsing I guess:
You can easily handle many of Rubys linebreak rules if you include linebreak
as a token, but if you do that, you will have to deal with them a lot of
places where you would rather ignore them as whitespace. I think this can be
handled by having the parser cooperating with the lexer. Obviously it is
doable since Ruby parsers exist.
It is true that Ruby parsers exist, the question was more related to LL(1)
parsers for Ruby. In this case, the challenge isn’t the linebreak (I just
yanked that code from the Pickaxe book), rather it is the if and unless
conditions.
It has very much to do with linebreaks:
if <cond> then
<statements>
end
if <cond>
<statements>
end
The above are the same form - you can have the rule
::= (then [linebreak] | linebreak)
::= if end
<statement> if <cond>
This variant of “if” can only happen after an unterminated statement. So
therefore we define two statement types: terminated and unterminated
statements.
::= (linebreak | semicolon
EOF)
::= if
The above isn’t exact becuase you also need to include statements terminated
by end in .
So by having linebreak as a token, it should be possible to deal with.
The problem is that when you have the linebreak, it becomes annoying in many
places such as
::= plus [linebreak]
But then I guess linebreak is so important in Ruby, that you generally need
to be aware of it. You can’t just ignore it becuase:
::= [linebreak] plus [linebreak]
is not possible.
…
It is going to be fun to try, though
So altogether it should be possible - but perhaps the above is what you
meant by fun
Of course I haven’t covered everything, and I don’t know the Ruby syntax
well enough. But I don’t see the examples as being particularly impossible
to parse in LL(1).