Ruby Parser Combinator

Hello all,

this is my first post on the ruby-talk list. I've been using ruby for a while though :slight_smile:

Anyway, just horsing around I tried my hand at writing a ruby parser combinator. It (along with an explanation) is at

http://twelvestone.com/forum_thread/view/29249

(the explanation and link to the gem and source tar are most of the way down the page (the post with the catapult in it)).

Anyway I'm still a bit of a noob when it comes to fp and parsing stuffs so if anyone wants to correct me I'd be grateful for the help :slight_smile:

Cheers,
Scott

Scott Weeks <weeksie@twelvestone.com> writes:

Hello all,

this is my first post on the ruby-talk list. I've been using ruby for
a while though :slight_smile:

Anyway, just horsing around I tried my hand at writing a ruby parser
combinator. It (along with an explanation) is at

http://twelvestone.com/forum_thread/view/29249

(the explanation and link to the gem and source tar are most of the
way down the page (the post with the catapult in it)).

Anyway I'm still a bit of a noob when it comes to fp and parsing
stuffs so if anyone wants to correct me I'd be grateful for the help
:slight_smile:

You may be interested in
http://chneukirchen.org/blog/archive/2005/10/the-rightyear-parsing-library.html

路路路

Cheers,
Scott

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Minor correction your example calculator has a prefix syntax, not an infix syntax. Otherwise I likes dem der combinators. and cor lets you do scanner forking, which is always fun. Neat.

路路路

On Jan 20, 2006, at 2:59 AM, Scott Weeks wrote:

Anyway, just horsing around I tried my hand at writing a ruby parser combinator. It (along with an explanation) is at

http://twelvestone.com/forum_thread/view/29249

Take a look here:

http://rubyforge.org/projects/grammar/

I think this is the same concept that you are talking about. I just
took it to the next level.

Sorry I haven't been working on this for a while. I just haven't had
time with a new job where I'm working too much. I have lots of
updates, but haven't had time to get back to it.

路路路

On 1/20/06, Scott Weeks <weeksie@twelvestone.com> wrote:

Hello all,

this is my first post on the ruby-talk list. I've been using ruby for a
while though :slight_smile:

Anyway, just horsing around I tried my hand at writing a ruby parser
combinator. It (along with an explanation) is at

http://twelvestone.com/forum_thread/view/29249

(the explanation and link to the gem and source tar are most of the way
down the page (the post with the catapult in it)).

Anyway I'm still a bit of a noob when it comes to fp and parsing stuffs
so if anyone wants to correct me I'd be grateful for the help :slight_smile:

Cheers,
Scott

You may be interested in
leah blogs: The Rightyear parsing library

Cool! That sure is a lot simpler than mine. Thanks for the link :slight_smile:

ha! thanks for the correction, I have a tendency to do stuff like that. I'll update the examples/code :slight_smile:

Cheers

路路路

On 21/01/2006, at 6:49 AM, Logan Capaldo wrote:

Minor correction your example calculator has a prefix syntax, not an infix syntax. Otherwise I likes dem der combinators. and cor lets you do scanner forking, which is always fun. Neat.

Take a look here:

http://rubyforge.org/projects/grammar/

Nice! It's a bit different since it seems to be a classic style parser library and not quite in the same vein but still very cool. I especially like how you overrode | and + to make expressing production rules more natural

...

Hey Christian, I couldn't find the code for your parser combinator in that blog entry, could you post a link? I'd love to look at your implementation because it seems like you've done it in a more elegant way (at least from the examples you gave).

Cheers,
Scott

路路路

On 21/01/2006, at 3:01 AM, Christian Neukirchen wrote:

Scott Weeks <weeksie@twelvestone.com> writes:

Hello all,

this is my first post on the ruby-talk list. I've been using ruby for
a while though :slight_smile:

Anyway, just horsing around I tried my hand at writing a ruby parser
combinator. It (along with an explanation) is at

http://twelvestone.com/forum_thread/view/29249

(the explanation and link to the gem and source tar are most of the
way down the page (the post with the catapult in it)).

Anyway I'm still a bit of a noob when it comes to fp and parsing
stuffs so if anyone wants to correct me I'd be grateful for the help
:slight_smile:

You may be interested in
leah blogs: The Rightyear parsing library

Cheers,
Scott

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Scott Weeks <weeksie@twelvestone.com> writes:

Hey Christian, I couldn't find the code for your parser combinator in
that blog entry, could you post a link? I'd love to look at your
implementation because it seems like you've done it in a more elegant
way (at least from the examples you gave).

I attached all I have. It's not really practical because flow-control
with throw/catch isn't as fast as I wished... but it could be
interesting nevertheless.

rightyear.tar.gz (4.11 KB)

路路路

Cheers,
Scott

I think what I did is closer to what your blog talks about than
classic parsers. Basically, you start with leaf "Grammar" objects and
build complex Grammar productions by combining Grammars.

If want to see the very basics of this technique, here is a stripped
down version of it with a calculator example (self contained and
executable):

class SimpleGrammar
    def initialize(grammar)
        @grammar = grammar
    end
    def scan(cursor,buffer)
        @grammar.scan(cursor,buffer)
    end
    def |(other); Code.new { |cursor,buffer|
        scan(cursor,buffer) || other.scan(cursor,buffer)
    } end
    def +(other); Code.new { |cursor,buffer|
        scan(cursor,buffer) && other.scan(cursor,buffer)
    } end
    def filter(buf0,&block); Code.new { |cursor,buffer|
        buf = buf0.clone
        scan(cursor,buf) && buffer.concat(block[buf])
    } end
    def discard; Code.new { |cursor,buffer|
        scan(cursor,)
    } end
    class Code < SimpleGrammar
        def initialize(&block)
            @block = block
        end
        def scan(cursor,buffer)
            @block[cursor,buffer]
        end
    end
    class Recurse < SimpleGrammar
        def initialize(&block)
            @grammar = block[self]
        end
        def scan(cursor,buffer)
            @grammar.scan(cursor,buffer)
        end
    end
    class Element < SimpleGrammar
        def initialize(pattern)
            @pattern = pattern
        end
        def scan(cursor,buffer)
            c = cursor.read1after
            if @pattern===c
                buffer << c
                cursor.skip1next
                true
            end
        end
    end
    # Make methods for our classes that call new for us
    constants.each { |klass|
        eval("
            def #{klass}(*args,&block)
                #{klass}.new(*args,&block)
            end
            def self.#{klass}(*args,&block)
                #{klass}.new(*args,&block)
            end
        ")
    }
    NULL = Code.new { true }
end

class IO
    # implement just the methods we need to look like a cursor
    def read1after;c=getc;ungetc(c);c;end
    def skip1next;getc&&true;end
end

class Expression < SimpleGrammar::Recurse
    def initialize; super() { |expr|
        digit = Element(?0..?9)
        int = Recurse { |int| digit+(int|NULL) }
        number =
            (int + (
                Element(?.)+int |
                NULL
            )).filter("") { |n| [n.to_f] }
        primary = Recurse { |primary|
            number |
            Element(?-).discard + primary + Code { |_,b| b[-1]=-b[-1] } |
            Element(?().discard + expr + Element(?)).discard
        }
        product = Recurse { |product|
            primary + (
                Element(?*).discard + product + Code { |_,b|
b[-2]*=b[-1];b.pop } |
                Element(?/).discard + product + Code { |_,b|
b[-2]/=b[-1];b.pop } |
                NULL
            )
        }
        sum = Recurse { |sum|
            product + (
                Element(?+).discard + sum + Code { |_,b| b[-2]+=b[-1];b.pop } |
                Element(?-).discard + sum + Code { |_,b| b[-2]-=b[-1];b.pop } |
                NULL
            )
        }
    } end
end

Expression.new.scan(STDIN,buf=) && p(buf[0])

In the grammar package on rubyforge, the API is very similar to above,
but I added a bunch of stuff to improve performance, usability, and
features. Notice in the above, I'm not really tied to using a
character stream (IO). You could be parsing a token stream or
whatever. You just need to implement a subset of the Cursor (another
package of mine) API. This way you can build lexers, parsers,
preprocessors, etc in the same way.

路路路

On 1/21/06, Scott Weeks <weeksie@twelvestone.com> wrote:

>
> Take a look here:
>
> http://rubyforge.org/projects/grammar/
>

Nice! It's a bit different since it seems to be a classic style parser
library and not quite in the same vein but still very cool. I
especially like how you overrode | and + to make expressing production
rules more natural

Cool! Thanks for that, I love reading other people's code :slight_smile:

I'm really more interested in this as an exercise to get a better understanding of functional programming techniques by seeing how they could/can be implemented in Ruby. I love this sort of stuff.

路路路

On 22/01/2006, at 1:42 AM, Christian Neukirchen wrote:

Scott Weeks <weeksie@twelvestone.com> writes:

Hey Christian, I couldn't find the code for your parser combinator in
that blog entry, could you post a link? I'd love to look at your
implementation because it seems like you've done it in a more elegant
way (at least from the examples you gave).

I attached all I have. It's not really practical because flow-control
with throw/catch isn't as fast as I wished... but it could be
interesting nevertheless.

Cheers,
Scott

<rightyear.tar.gz>--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Cool, Thanks for the examples! One of the things I was thinking about was bringing the example I wrote further inline with Parsec where it can handle a list of tokens of any sort rather than just an IO stream as you've done with the Cursor library.

My main aim with this was to understand combinators more intimately and I feel that it's been quite good for doing that and has certainly given me a lot of perspective in regards to a Haskell project that I'm working on.

Cheers,
Scott

路路路

On 22/01/2006, at 11:50 AM, Eric Mahurin wrote:

On 1/21/06, Scott Weeks <weeksie@twelvestone.com> wrote:

Take a look here:

http://rubyforge.org/projects/grammar/

Nice! It's a bit different since it seems to be a classic style parser
library and not quite in the same vein but still very cool. I
especially like how you overrode | and + to make expressing production
rules more natural

I think what I did is closer to what your blog talks about than
classic parsers. Basically, you start with leaf "Grammar" objects and
build complex Grammar productions by combining Grammars.

If want to see the very basics of this technique, here is a stripped
down version of it with a calculator example (self contained and
executable):

class SimpleGrammar
    def initialize(grammar)
        @grammar = grammar
    end
    def scan(cursor,buffer)
        @grammar.scan(cursor,buffer)
    end
    def |(other); Code.new { |cursor,buffer|
        scan(cursor,buffer) || other.scan(cursor,buffer)
    } end
    def +(other); Code.new { |cursor,buffer|
        scan(cursor,buffer) && other.scan(cursor,buffer)
    } end
    def filter(buf0,&block); Code.new { |cursor,buffer|
        buf = buf0.clone
        scan(cursor,buf) && buffer.concat(block[buf])
    } end
    def discard; Code.new { |cursor,buffer|
        scan(cursor,)
    } end
    class Code < SimpleGrammar
        def initialize(&block)
            @block = block
        end
        def scan(cursor,buffer)
            @block[cursor,buffer]
        end
    end
    class Recurse < SimpleGrammar
        def initialize(&block)
            @grammar = block[self]
        end
        def scan(cursor,buffer)
            @grammar.scan(cursor,buffer)
        end
    end
    class Element < SimpleGrammar
        def initialize(pattern)
            @pattern = pattern
        end
        def scan(cursor,buffer)
            c = cursor.read1after
            if @pattern===c
                buffer << c
                cursor.skip1next
                true
            end
        end
    end
    # Make methods for our classes that call new for us
    constants.each { |klass|
        eval("
            def #{klass}(*args,&block)
                #{klass}.new(*args,&block)
            end
            def self.#{klass}(*args,&block)
                #{klass}.new(*args,&block)
            end
        ")
    }
    NULL = Code.new { true }
end

class IO
    # implement just the methods we need to look like a cursor
    def read1after;c=getc;ungetc(c);c;end
    def skip1next;getc&&true;end
end

class Expression < SimpleGrammar::Recurse
    def initialize; super() { |expr|
        digit = Element(?0..?9)
        int = Recurse { |int| digit+(int|NULL) }
        number =
            (int + (
                Element(?.)+int |
                NULL
            )).filter("") { |n| [n.to_f] }
        primary = Recurse { |primary|
            number |
            Element(?-).discard + primary + Code { |_,b| b[-1]=-b[-1] } |
            Element(?().discard + expr + Element(?)).discard
        }
        product = Recurse { |product|
            primary + (
                Element(?*).discard + product + Code { |_,b|
b[-2]*=b[-1];b.pop } |
                Element(?/).discard + product + Code { |_,b|
b[-2]/=b[-1];b.pop } |
                NULL
            )
        }
        sum = Recurse { |sum|
            product + (
                Element(?+).discard + sum + Code { |_,b| b[-2]+=b[-1];b.pop } |
                Element(?-).discard + sum + Code { |_,b| b[-2]-=b[-1];b.pop } |
                NULL
            )
        }
    } end
end

Expression.new.scan(STDIN,buf=) && p(buf[0])

In the grammar package on rubyforge, the API is very similar to above,
but I added a bunch of stuff to improve performance, usability, and
features. Notice in the above, I'm not really tied to using a
character stream (IO). You could be parsing a token stream or
whatever. You just need to implement a subset of the Cursor (another
package of mine) API. This way you can build lexers, parsers,
preprocessors, etc in the same way.