Is Ruby's grammar LL(k)?

…or better yet LL(1)? Can a recursive descent parser work with all of
Ruby’s grammar?

Last time I checked, “The Parrot Guys”(tm :slight_smile: were attempting to parse
Ruby with Parse::RecDescent… Can it possibly work? Doesn’t Ruby need
the full power of yacc’s LALR(1)?

···


_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |
__ \ | | | | | (| | | | |
.__/ _,
|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Yes I have a Machintosh, please don’t scream at me.
– Larry Blumette on linux-kernel

Hi,

···

At Thu, 26 Sep 2002 21:21:56 +0900, Mauricio =?unknown-8bit?Q?Fern=E1ndez?= wrote:

Last time I checked, “The Parrot Guys”(tm :slight_smile: were attempting to parse
Ruby with Parse::RecDescent… Can it possibly work? Doesn’t Ruby need
the full power of yacc’s LALR(1)?

Perhaps, it really wants LALR(k) or possibly LL(k), where k is
3 or more.


Nobu Nakada

Hi, Nobu.

Perhaps, it really wants LALR(k) or possibly LL(k), where k is
3 or more.

Is this due to the use of yield and blocks? Or do you think
that there are other parts that won’t be accessible to LL(1)?

-mark.

But right now Ruby’s main implementation uses bison, ie LALR(1). I’m not
sure if you can actually parse things outside LALR(1) (w/ dirty hacks I
suppose) while staying with yacc, but Ruby cannot indeed be too far from
LALR(1). But I do have my doubts about LL(k)…

Guess I’ll have to check parse.y again.

···

On Fri, Sep 27, 2002 at 07:31:33PM +0900, nobu.nokada@softhome.net wrote:

Hi,

At Thu, 26 Sep 2002 21:21:56 +0900, > Mauricio =?unknown-8bit?Q?Fern=E1ndez?= wrote:

Last time I checked, “The Parrot Guys”(tm :slight_smile: were attempting to parse
Ruby with Parse::RecDescent… Can it possibly work? Doesn’t Ruby need
the full power of yacc’s LALR(1)?

Perhaps, it really wants LALR(k) or possibly LL(k), where k is
3 or more.


_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |
__ \ | | | | | (| | | | |
.__/ _,
|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Yes I have a Machintosh, please don’t scream at me.
– Larry Blumette on linux-kernel

Hi,

···

In message “Re: Is Ruby’s grammar LL(k)?” on 02/09/27, nobu.nokada@softhome.net nobu.nokada@softhome.net writes:

Perhaps, it really wants LALR(k) or possibly LL(k), where k is
3 or more.

I think the current implementation requires two look ahead for the
pattern like “…”. So it should be LALR(2), and possibly LL(2).

						matz.

Hi, Nobu.

Perhaps, it really wants LALR(k) or possibly LL(k), where k is
3 or more.

Is this due to the use of yield and blocks? Or do you think
that there are other parts that won’t be accessible to LL(1)?

Why should yield and blocks be any harder to parse?
I was rather thinking of
STMT := …
> STMT if EXPR
> STMT while EXPR
> STMT unless EXPR
> STMT until EXPR

I’m not sure whether it can be left-factored or not… but I feel it
cannot. Gotta think about it a little more.

···

On Fri, Sep 27, 2002 at 10:09:49PM +0900, Mark Probert wrote:


_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |
__ \ | | | | | (| | | | |
.__/ _,
|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Linux is addictive, I’m hooked!
– MaDsen Wikholm’s .sig

“Mauricio Fernández” batsman.geo@yahoo.com wrote in message
news:20020927180455.GB535@kodos…

But right now Ruby’s main implementation uses bison, ie LALR(1). I’m not
sure if you can actually parse things outside LALR(1) (w/ dirty hacks I
suppose) while staying with yacc, but Ruby cannot indeed be too far from
LALR(1). But I do have my doubts about LL(k)…

There is a difference between grammar and language.

The same language can be defined by multiple grammars and many LALR(1)
grammars can be rewritten into LL(1) grammars. Most importantly recursion
must change from right to left and common prefixes must be collected into a
common rule. Likewise, many LL(k) grammars can be rewritten to LL(1) by
adding more rules.
LL grammars can be easier to hack with various conditions that must be true
to make a match. So in praxis this shouldn’t be a major worry unless you
have a LL(1) tool that doesn’t provide the option for tweaking. (ANTLR has
plenty of possibilities to add conditions and is LL(something)).

Most languages need some sort of tweaking in any case - the dangling else is
a popular example.

Since it is tedious to rewrite grammars to conform to dumb tools, we prefer
stronger tools. RockIt has, AFAIK a fairly strong parallel parsing algorithm
that make it possible to handle many conflicts that LALR(1) wouldn’t -
useful for natural language parsing. Such powerful parsing features may come
at a time and space performance cost and it does pay off to make the grammar
parser friendly.

Mikkel

“Yukihiro Matsumoto” matz@ruby-lang.org wrote in message
news:1033171762.477103.17040.nullmailer@picachu.netlab.jp…

Hi,

Perhaps, it really wants LALR(k) or possibly LL(k), where k is
3 or more.

I think the current implementation requires two look ahead for the
pattern like “…”. So it should be LALR(2), and possibly LL(2).

Can’t this be written with

value ::= token_integer | token_integer token_dotdot

This doesn’t require a 2 lookahead, but it does require that you inspect the
type of value subsequently.

Mikkel

···

In message “Re: Is Ruby’s grammar LL(k)?” > on 02/09/27, nobu.nokada@softhome.net nobu.nokada@softhome.net writes:

I agree with that. LL(1) has more of a issue, I think,
with not knowing the context of a statement. There are
quite a few ruby-isms that are context dependant.

For example:

while gets
next if /^#/ # Skip comments
parseLine unless /^$/ # Don’t parse empty lines end
end

may be tricky in LL(1). I have to think about that…

-mark.

···

At 09:26 AM 9/28/2002 +0900, you wrote:

“Yukihiro Matsumoto” matz@ruby-lang.org wrote in message
news:1033171762.477103.17040.nullmailer@picachu.netlab.jp…

Hi,

In message “Re: Is Ruby’s grammar LL(k)?” > > on 02/09/27, nobu.nokada@softhome.net nobu.nokada@softhome.net >writes:

Perhaps, it really wants LALR(k) or possibly LL(k), where k is
3 or more.

I think the current implementation requires two look ahead for the
pattern like “…”. So it should be LALR(2), and possibly LL(2).

Can’t this be written with

value ::= token_integer | token_integer token_dotdot

“Mauricio Fernández” batsman.geo@yahoo.com wrote in message
news:20020927180455.GB535@kodos…

But right now Ruby’s main implementation uses bison, ie LALR(1). I’m not
sure if you can actually parse things outside LALR(1) (w/ dirty hacks I
suppose) while staying with yacc, but Ruby cannot indeed be too far from
LALR(1). But I do have my doubts about LL(k)…

There is a difference between grammar and language.

The same language can be defined by multiple grammars and many LALR(1)
grammars can be rewritten into LL(1) grammars. Most importantly recursion
must change from right to left and common prefixes must be collected into a
common rule. Likewise, many LL(k) grammars can be rewritten to LL(1) by
adding more rules.

So I understand we can possibly create a LL(2) [according to matz]
grammar to parse the Ruby language (considered as the set of all the
possible valid program source codes), but the semantics of the direct
interpretation of the resulting parse-tree wouldn’t match the “real” ones.

I do however have some suspicions about statement modifiers (if,
unless…) rendering Ruby unparseable w/ LL(k). I feel the corresponding
productions cannot be left-factored.

LL grammars can be easier to hack with various conditions that must be true
to make a match. So in praxis this shouldn’t be a major worry unless you
have a LL(1) tool that doesn’t provide the option for tweaking. (ANTLR has
plenty of possibilities to add conditions and is LL(something)).

Most languages need some sort of tweaking in any case - the dangling else is
a popular example.

Since it is tedious to rewrite grammars to conform to dumb tools, we prefer
stronger tools. RockIt has, AFAIK a fairly strong parallel parsing algorithm
that make it possible to handle many conflicts that LALR(1) wouldn’t -
useful for natural language parsing. Such powerful parsing features may come
at a time and space performance cost and it does pay off to make the grammar
parser friendly.

Thanks for the explanation. I was curious about Parrot’s languages/ruby
recursive descent parser; now I can see what’s going on.

···

On Sat, Sep 28, 2002 at 08:26:15AM +0900, MikkelFJ wrote:


_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |
__ \ | | | | | (| | | | |
.__/ _,
|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Not only Guinness - Linux is good for you, too.
– Banzai on IRC

I am sorry, I know the code below is just an example. But will I be
correct if I write

while gets
    next if /^\s*#/             # Skip comments
    parseLine unless /^\s*$/    # Don't parse blank lines
end

? Or are there other cases, beside the =begin and =end blocks?

Regards,

Bill

···

===========================================================================
Mark Probert probertm@nortelnetworks.com wrote:

For example:

while gets
next if /^#/ # Skip comments
parseLine unless /^$/ # Don’t parse empty lines end
end

“Mark Probert” probertm@nortelnetworks.com wrote in message
news:5.1.0.14.2.20020928094656.00b0fe90@zcard04k.ca.nortel.com

     parseLine unless /^$/ # Don't parse empty lines end

This is a real problem in Ruby parsing I guess:

You can easily handle many of Rubys linebreak rules if you include linebreak
as a token, but if you do that, you will have to deal with them a lot of
places where you would rather ignore them as whitespace. I think this can be
handled by having the parser cooperating with the lexer. Obviously it is
doable since Ruby parsers exist.

Mikkel

Working on enhancing SRuby? :wink:

···

On Sun, Sep 29, 2002 at 12:07:53AM +0900, William Djaja Tjokroaminata wrote:

I am sorry, I know the code below is just an example. But will I be
correct if I write

while gets
    next if /^\s*#/             # Skip comments
    parseLine unless /^\s*$/    # Don't parse blank lines
end

? Or are there other cases, beside the =begin and =end blocks?

Regards,

Bill

Mark Probert probertm@nortelnetworks.com wrote:

For example:

while gets
next if /^#/ # Skip comments
parseLine unless /^$/ # Don’t parse empty lines end
end


_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |
__ \ | | | | | (| | | | |
.__/ _,
|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Whoa…I did a ‘zcat /vmlinuz > /dev/audio’ and I think I heard God…
– mikecd on #Linux

It is true that Ruby parsers exist, the question was more related to LL(1)
parsers for Ruby. In this case, the challenge isn’t the linebreak (I just
yanked that code from the Pickaxe book), rather it is the if and unless
conditions.

As Mauricio points out in another post, these statements can exist in
a variety of forms:

 if <cond> then
     <statements>
 end

 if <cond>
     <statements>
 end

 <statement> if <cond>

with or without multiple elsif and else clauses. The optional then
is going to cause some LL(1) problems because I don’t know that 1 token
lookahead is going to be enough. The same is true of the unless, while
and until loops, I think.

It is going to be fun to try, though :wink:

-mark.

···

At 06:29 AM 9/29/2002 +0900, MikkelFJ wrote:

     parseLine unless /^$/ # Don't parse empty lines end

This is a real problem in Ruby parsing I guess:

You can easily handle many of Rubys linebreak rules if you include linebreak
as a token, but if you do that, you will have to deal with them a lot of
places where you would rather ignore them as whitespace. I think this can be
handled by having the parser cooperating with the lexer. Obviously it is
doable since Ruby parsers exist.

No, no, that is what I already have in SRuby :slight_smile:

Cheers,

Bill

···

==========================================================================
Mauricio Fern?ndez batsman.geo@yahoo.com wrote:

Working on enhancing SRuby? :wink:

“Mark Probert” probertm@nortelnetworks.com wrote in message
news:5.1.0.14.2.20020930095528.00b0ffc8@zcard04k.ca.nortel.com

It is true that Ruby parsers exist, the question was more related to LL(1)
parsers for Ruby. In this case, the challenge isn’t the linebreak (I just
yanked that code from the Pickaxe book), rather it is the if and unless
conditions.

It has very much to do with linebreaks:

 if <cond> then
     <statements>
 end

 if <cond>
     <statements>
 end

The above are the same form - you can have the rule
::= (then [linebreak] | linebreak)
::= if end

 <statement> if <cond>

This variant of “if” can only happen after an unterminated statement. So
therefore we define two statement types: terminated and unterminated
statements.

::= (linebreak | semicolon

EOF)
::= if

The above isn’t exact becuase you also need to include statements terminated
by end in .

So by having linebreak as a token, it should be possible to deal with.
The problem is that when you have the linebreak, it becomes annoying in many
places such as

::= plus [linebreak]

But then I guess linebreak is so important in Ruby, that you generally need
to be aware of it. You can’t just ignore it becuase:

::= [linebreak] plus [linebreak]

is not possible.

It is going to be fun to try, though :wink:

So altogether it should be possible - but perhaps the above is what you
meant by fun :slight_smile:
Of course I haven’t covered everything, and I don’t know the Ruby syntax
well enough. But I don’t see the examples as being particularly impossible
to parse in LL(1).

Mikkel

···

At 06:29 AM 9/29/2002 +0900, MikkelFJ wrote: