When little languages grow

I seem to have run into my parsing problem again. Whatever I'm
doing I usually end up having to parse non-simplistic input, and I'm
still not happy about the apparently available solutions to this.
So I'm wondering what other people do.

The application is immaterial at the moment, but the problem is that
I need to do more than can be done with a simple case statement, and
if I were to use case statements managing the problem would get too
big.

The conventional wisdom is to use some form of parser
generator (Yacc, Bison, Racc, Rockit,...) but I don't have
confidence in my ability to get these working well [1].
I have had great difficulty in the past, certainly.

Other possibilities I have considered and tried are to lash together
some form of Lisp [cf Greenspun's 10th rule of programming] or Forth,
but I don't consider myself fluent in either of those languages, and
they are not as easy a user interface for other people as Ruby would
be. I can get something working, but find it hard to maintain or
improve. [2]

So the next possibility is to use something like

         input = nil
         File.open("input.txt"){|f|
           input = f.read
         }
         Thread.new(input){|source|
           $SAFE=5
           instance_eval source
         }.value

or something, and actually make the commands in the language methods
of some Ruby object.

It is often observed that it is difficult to add security to a
system, compared to building it in from the start. Can I do this
and still have a good level of security? Should I make the parser
object (whose methods I'm using) a subclass of Nil, to limit it as
much as possible? I need to give people enough rope to hold their
input together, but not enough to hang themselves (or me). I don't
want people to be able to execute arbitrary code, or fiddle with
objects they should not need to touch.
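A minimal sketch of the "commands are methods of some Ruby object" idea, assuming a current Ruby where BasicObject (rather than a subclass of Nil) is the usual way to get a nearly empty object. CommandSet, sphere and brick are hypothetical names chosen for illustration. This narrows what a script can call by bare name; it is not a security boundary, because a script can still reach constants explicitly, for example ::File.

```ruby
# Hypothetical command object: the only methods a script can call by bare
# name are the ones defined here (plus the few BasicObject provides).
class CommandSet < BasicObject
  def initialize
    @shapes = []
  end

  def sphere(radius)
    @shapes << [:sphere, radius]
  end

  def brick(x, y, z)
    @shapes << [:brick, x, y, z]
  end

  # Evaluate the script with self set to this object, then return the
  # shapes that the commands collected.
  def run(source)
    instance_eval(source)
    @shapes
  end

  private

  # Anything that is not a defined command is reported as a script error.
  # Kernel methods such as puts are not mixed into BasicObject, so they
  # land here as well.
  def method_missing(name, *_args)
    ::Kernel.raise ::NameError, "unknown command: #{name}"
  end
end

shapes = CommandSet.new.run("sphere 2\nbrick 1, 2, 3")
# shapes is now [[:sphere, 2], [:brick, 1, 2, 3]]
```

Because the script is still full Ruby syntax, this only covers the "easy user interface" half of the question; the "not enough rope" half needs something more.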

Is there another way to handle input flexibly that I have completely
missed? I've googled for things to do with little languages and parsing, but have found nothing enlightening.

         Thank you,
         Hugh

[1] I find that thinking in the manner of a shift/reduce parser is
particularly unnatural to me. This might just be a weakness on my
part or may have something to do with people's difficulties in
handling modal interfaces: it is hard to switch contexts rapidly.
Maybe there is something I can read which will turn the problem
around, so it becomes easy to handle?

[2] Immensely powerful and fast systems have been written in Forth,
and Lisp is very powerful in the right hands. I just don't have the
experience with these to be effective, yet.

To make sure that the rest of the list and I are hearing your question correctly: are you interested in the Ruby way to write a code generator?

And are you looking for input on other parsing approaches or implemented solutions that others may have experience with, using languages and tools such as Lisp, Forth, Yacc, Bison, Racc, etc.?

Zach

Hugh Sasse Staff Elec Eng wrote:

         Thread.new(input){|source|
           $SAFE=5
           instance_eval source
         }.value

Sorry to say this, but this is the most common error that I see when
someone tries to eval some code with $SAFE >= 4.

The code will be eval'ed with $SAFE >= 4 but the result (#value) will be
used with $SAFE = 0 and you can have problems.

The result of #eval must be cleaned with $SAFE >= 4, before it's
returned.
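A sketch of the ordering described here: the result is reduced to plain data inside the evaluating thread, before #value hands it to the caller. It assumes a Ruby without $SAFE levels (they were removed in Ruby 3.0), so it shows only where the cleaning happens and not a sandbox. clean_result is a hypothetical helper, and a script that can run arbitrary Ruby could still work around it.

```ruby
# Hypothetical helper: reduce an arbitrary result to plain, inert data,
# and refuse anything that is not on the short list of allowed types.
def clean_result(value)
  case value
  when nil, true, false, Integer, Float, Symbol
    value
  when String
    String.new(value).freeze          # copy into a plain String
  when Array
    value.map { |item| clean_result(item) }.freeze
  else
    raise TypeError, "refusing to return a #{value.class} from the script"
  end
end

source = "[1, 'two', 3.0]"
result = Thread.new(source) do |src|
  # Clean inside the thread, so the caller never sees the raw object.
  clean_result(Object.new.instance_eval(src))
end.value
```

If the script returns anything outside the allowed types, the TypeError is raised inside the thread and re-raised by #value.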

Guy Decoux

Hugh,

the one thing I didn't see in your posting is a statement about the language. What capabilities should it have? If it's just assigning constants to vars (like often needed for configurations) then Regexp is probably fine. From what you write I'm guessing that your envisioned language is more complex - but how complex? Maybe it's a special case for which someone somewhere has a solution already.

Regards

    robert

"Hugh Sasse Staff Elec Eng" <hgs@dmu.ac.uk> schrieb im Newsbeitrag news:Pine.GSO.4.60.0501261622400.24999@brains.eng.cse.dmu.ac.uk...

The application is immaterial at the moment, but the problem is that
I need to do more than can be done with a simple case statement, and
if I were to use case statements managing the problem would get too
big.

...

So the next possibility is to use something like

        input = nil
        File.open("input.txt"){|f|
          input = f.read
        }
        Thread.new(input){|source|
          $SAFE=5
          instance_eval source
        }.value

or something, and actually make the commands in the language methods
of some Ruby object.

I hear ya. I wish there was a way to open a jailed namespace. It'd be
like chroot'ing into a module. The sandbox module would be addressable
from ::Object, but would only include a limited set of modules when
offered under chroot.

_why

···

Hugh Sasse Staff Elec Eng (hgs@dmu.ac.uk) wrote:

Hi ..

I seem to have run into my parsing problem again. Whatever I'm
doing I usually end up having to parse non-simplistic input, and I'm
still not happy about the apparently available solutions to this.
So I'm wondering what other people do.

My personal solution to this is to use Coco/R, an LL(1) scanner/generator.
You can find more information at:

  http://www.scifac.ru.ac.za/coco

The primary advantage of this approach, IMHO, is that all of the grammar /
scanning rules are in a single file (rather than the lex/yacc approach).
This makes the grammar quite easy to read and extend, once you are familiar
with the process. Ryan Davis has a pure ruby version, and I have a ruby
extension version. Both seem to work well for little languages.

[1] I find that thinking in the manner of a shift/reduce parser is
particularly unnatural to me. ... Maybe there is something I can read which
will turn the problem around, so it becomes easy to handle?

Pat Terry has a book "Compilers and Compiler Generators" that covers LL(1)
(and other) topics very well. You can find it at:

  http://www.scifac.ru.ac.za/compilers/

The primary disadvantage of Coco/R is the LL(1) part. This means that your
grammar needs to be fairly well formed and not arbitrarily complex. As an
example, Ruby can not, as far as I have tried, be converted into an LL(1)
grammar, though C can.

A simple example of a grammar for my extension library follows (this is for
the famous four function calculator). Note that this will generate a Ruby
extension. Once you compile and link it, you can use it in Ruby like this:

# ---( test.rb )-------------
require 'Calc'

f = File.readlines("calc.inp")
t = Calc.new
t.run(f)

if t.success
   puts "parsed ok!"
   t.capture.each { |ans| puts " ans==#{ans}" }
else
   puts "Errors ::"
   t.errs.each { |err| puts " --> #{err}" }
end

# ---( calc.inp )-----------
var a,b,c,d;

write 1+(2*3)+4;
write 100/10;
a := 37-12-(4*5);

write a;
b := a*16;
write b*2

# ---( calc.atg )-----------
$C /* Generate Main Module */
COMPILER Calc

#define upcase(c) ((c >= 'a' && c <= 'z')? c-32:c)
int VARS[10000];

int get_spix()
{
  char name[20];
  LEX_S(name, sizeof(name) - 1);
  if (strlen(name) >= 2)
    return 26*(upcase(name[1])-'A')+(upcase(name[0])-'A');
  else
    return (upcase(name[0])-'A');
}

int get_number()
{
  char name[20];
  LEX_S(name, sizeof(name) - 1);
  return atoi(name);
}

void new_var(int spix)
{
  VARS[spix] = 0;
}

int get_var(int spix)
{
  return VARS[spix];
}

void write_val(int val)
{
  char tmp[20];

  sprintf(tmp, "%d", val);
  t_capture_output(tmp);
}

void set_var(int spix, int val)
{
  VARS[spix] = val;
}

IGNORE CASE

CHARACTERS
  letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
  digit = "0123456789".
  eol = CHR(13) .
  lf = CHR(10) .

COMMENTS
  FROM '--' TO eol

IGNORE eol + lf

TOKENS
  ident = letter {letter | digit} .
  number = digit {digit} .

PRODUCTIONS
  Calc =
    [Declarations] StatSeq .

  Declarations
    = (. int spix; .)
       'VAR'
       Ident <&spix> (. new_var(spix); .)
       { ',' Ident <&spix> (. new_var(spix); .)
       } ';'.

  StatSeq =
    Stat {';' Stat}.

  Stat
    = (. int spix, val; .)
      | "WRITE" Expr <&val> (. write_val(val); .)
      | Ident <&spix> ":=" Expr <&val> (. set_var(spix, val); .) .

  Expr <int *exprVal>
    = (. int termVal; .)
      Term <exprVal>
      { '+' Term <&termVal> (. *exprVal += termVal; .)
      | '-' Term <&termVal> (. *exprVal -= termVal; .)
      } .

  Term <int *termVal>
    = (. int factVal; .)
      Fact <termVal>
      { '*' Fact <&factVal> (. *termVal *= factVal; .)
      | '/' Fact <&factVal> (. *termVal /= factVal; .)
      } .

  Fact <int *factVal>
    = (. int spix; .)
         Ident <&spix> (. *factVal = get_var(spix); .)
      | number (. *factVal = get_number(); .)
      | '(' Expr <factVal> ')' .

  Ident <int *spix>
    = ident (. *spix = get_spix(); .) .

END Calc.

I hope that this helps.

Regards,

···

On Wednesday 26 January 2005 09:08, Hugh Sasse Staff Elec Eng wrote:
a := 37-12-(4*5);

--
-mark. (probertm at acm dot org)

Perhaps you should take a look at Lua:

http://www.lua.org/about.html

It started life as a "data entry language" of sort:

http://www.lua.org/history.html

Then read the book:

http://www.lua.org/pil/

Cheers

···

On Jan 26, 2005, at 18:08, Hugh Sasse Staff Elec Eng wrote:

I've googled for things to do with little languages and parsing, but have found nothing enlightening.

--
PA
http://alt.textdrive.com/

Hi Hugh,

What are you trying to parse exactly? Is it full Ruby source or a small
subset?

If full source, have you considered what it is exactly that might be
unsafe? Maybe there's a way to create your own "Safety Net" by making
certain parts of Ruby inaccessible. Not sure how exactly, seems like
namespaces would be needed, but it would be an interesting challenge.
Also you might try ParseTree.

OTOH, if only a subset (or something completely different), I have a
general purpose and easy to use Parser class I've been working on for
Carats. Perhaps you'd like to try it and see if it can help? Doing so
could also help me test/improve it for everyone.

T.

Quoting hgs@dmu.ac.uk, on Thu, Jan 27, 2005 at 02:08:04AM +0900:

I seem to have run into my parsing problem again. Whatever I'm
doing I usually end up having to parse non-simplistic input, and I'm
still not happy about the apparently available solutions to this.
So I'm wondering what other people do.

The application is immaterial at the moment, but the problem is that
I need to do more than can be done with a simple case statement, and
if I were to use case statements managing the problem would get too
big.

The conventional wisdom is to use some form of parser
generator (Yacc, Bison, Racc, Rockit,...) but I don't have
confidence in my ability to get these working well [1].
I have had great difficulty in the past, certainly.

snip

[1] I find that thinking in the manner of a shift/reduce parser is
particularly unnatural to me. This might just be a weakness on my
part or may have something to do with people's difficulties in
handling modal interfaces: it is hard to switch contexts rapidly.
Maybe there is something I can read which will turn the problem
around, so it becomes easy to handle?

Hi Hugh.

Two suggestions:

Most of the docs for parser-generators assume some (lots?) knowledge of
compiler theory. An exception is the O'Reilly Lex&Yacc book. If any of
the generators you mention above is anything like Yacc, I think you
might like the book. It's at a practical level, it assumes you are a
competent programmer, you just haven't taken a compiler course. The
first few chapters got me up to speed very quickly (I think you develop
a calculator). After that, the example used is generating a
mini-language to specify curses screen layouts. I don't do any curses
stuff, but the examples were used to good effect.

I think that if you do this often, mastering one of these tools will be
a life skill you'll never regret! And yacc (or one of the others) is a
mini-language in its own right, always good to learn another.

On another track, I have deliberately NOT used these grammars when
parsing the BNF for internet email. In my experience, if you can write
the grammar down in BNF, it's usually pretty easy to write a recursive
descent parser for it. It's a little tedious though, which is why all
the tools exist to generate the code for you...
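To illustrate how directly BNF maps onto hand-written code, here is a minimal recursive descent sketch for four-function calculator expressions. The grammar in the comment and the CalcParser class are assumptions made for this example (not the email grammar mentioned above); each grammar rule becomes one method.

```ruby
require 'strscan'

# Grammar (EBNF) assumed for this sketch:
#   expr = term { ("+" | "-") term } .
#   term = fact { ("*" | "/") fact } .
#   fact = number | "(" expr ")" .
class CalcParser
  class ParseError < StandardError; end

  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  def parse
    value = expr
    skip_space
    raise ParseError, "unexpected input at offset #{@scanner.pos}" unless @scanner.eos?
    value
  end

  private

  def skip_space
    @scanner.skip(/\s+/)
  end

  # Consume the pattern if it comes next; return the matched text or nil.
  def accept(pattern)
    skip_space
    @scanner.scan(pattern)
  end

  def expr
    value = term
    while (op = accept(/[+-]/))
      rhs = term
      value = op == '+' ? value + rhs : value - rhs
    end
    value
  end

  def term
    value = fact
    while (op = accept(%r{[*/]}))
      rhs = fact
      value = op == '*' ? value * rhs : value / rhs
    end
    value
  end

  def fact
    if (digits = accept(/\d+/))
      digits.to_i
    elsif accept(/\(/)
      value = expr
      accept(/\)/) or raise ParseError, "expected ')' at offset #{@scanner.pos}"
      value
    else
      raise ParseError, "expected a number or '(' at offset #{@scanner.pos}"
    end
  end
end

CalcParser.new("1+(2*3)+4").parse   # => 11
```

The tedium shows up as the grammar grows: every new rule is another method, and error messages have to be written by hand at each point where input can be rejected.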

Have fun,
Sam

(In response to news:Pine.GSO.4.60.0501261622400.24999
@brains.eng.cse.dmu.ac.uk by Hugh Sasse Staff Elec Eng)

I seem to have run into my parsing problem again. Whatever I'm
doing I usually end up having to parse non-simplistic input, and I'm
still not happy about the apparently available solutions to this.
So I'm wondering what other people do.

Just a few random thoughts:

- All of this depends on how complex your mini-language is. Should it be
Turing complete or just really something like fstab ? See
http://www.faqs.org/docs/artu/ch08s01.html (a chapter from the Art of
Unix Programming, including a taxonomy of languages).

- If your language needs loopiness and control constructs like Ruby has
(or should look like Ruby), then you could help me think about a cut-down
version of Ruby for configuration files: This version of Ruby should be
accessible from within Ruby and be totally safe (= cut off from the external
world). Such a thing would also need execution timeouts (to stop endless
loops and stack overflows)...

- Your problem was essentially the topic of one of my semester projects.
I tried out a lot of different parser generators, for different
languages. None really could score in the category of simplicity.
Probably just a complex subject. But the problem of having to create a
minilanguage pops up all the time, as you say.

- Parse error reporting: A lot of programming languages have bad parse
error reporting, which essentially comes from the fact that the parser
generator does not help you in this. ANTLR is a bit better there, but IMO
still sucks.
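The execution-timeout point above can be sketched with the standard timeout library. run_limited is a hypothetical wrapper, and this covers only time limits and runaway recursion, not isolation from the outside world. It may also fail to interrupt code that is blocked inside a C extension.

```ruby
require 'timeout'

# Hypothetical wrapper: run a block of script evaluation with a time limit
# and turn runaway loops or recursion into an ordinary error value.
def run_limited(seconds)
  Timeout.timeout(seconds) { [:ok, yield] }
rescue Timeout::Error
  [:error, 'script took too long']
rescue SystemStackError
  [:error, 'script recursed too deeply']
end

run_limited(0.2) { 1 + 1 }      # => [:ok, 2]
run_limited(0.2) { loop { } }   # => [:error, "script took too long"]
```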

I think there are two really good ideas in this thread:
a) Create and Integrate Grammars as first class objects in Ruby.
(Clifford Heath)
b) Use a cut-down version of Ruby that is designed with security in mind
(various people, this has been done).

I can't help feeling that we're missing a really simple way to do all of
this ...

kaspar

hand manufactured code - www.tua.ch/ruby

To make sure that the rest of the list and I are hearing your question correctly: are you interested in the Ruby way to write a code generator?

There's probably more than one way, but yes. I don't need to
generate an executable for later use, so interpreting my input is
fine. I need to manage the complexity so I can cope with future
expansion if any.

And are you looking for input on other parsing approaches or implemented solutions that others may have experience with, using languages and tools such as Lisp, Forth, Yacc, Bison, Racc, etc.?

I'm looking to do this in Ruby. Experience of things that simplify
this, whether they come from other languages or not, is what I am
after. I mention the other languages because I have tried their
styles of handling this problem. My success has been limited. So,
what can I do for more success? :-) Look at the problem differently?
Use another technique?

Zach

         Hope that is clearer,
         Thank you,
         Hugh

···

On Thu, 27 Jan 2005, Zach Dennis wrote:

> Thread.new(input){|source|
> $SAFE=5
> instance_eval source
> }.value

Sorry to say this, but this is the most common error that I see when
someone tries to eval some code with $SAFE >= 4.

The code will be eval'ed with $SAFE >= 4 but the result (#value) will be
used with $SAFE = 0 and you can have problems.

Yes, that's a good point.

The result of #eval must be cleaned with $SAFE >= 4, before it's
returned.

Thank you. That would be better than cleaning it afterwards, I'd
not really considered that risk.

Guy Decoux

         Hugh

···

On Thu, 27 Jan 2005, ts wrote:

Hugh,

the one thing I didn't see in your posting is a statement about the language. What capabilities should it have? If it's just assigning constants to vars

I was trying to keep this general because I run into the problem of
parsing non-simplistic grammars so often. It's easy to do the
  <verb> <direct object> type grammars with lots of verbs[1], but...

My present example is that I want to parse Constructive Solid
Geometry descriptions, at the moment limited to cones, spheres,
bricks with the co-ordinates specified, and I need to specify
material types as well.

I'd also like to be able to declare new objects so they can be
placed. Silly example: Get two small spheres to cap the ends of
a cylinder and call the result a Sausage. Then place several
Sausages in the space at different points.

Later I'd have to extend the language to be able to rotate them into
position. Lots of creeping featurism is likely, I suspect. I
didn't have material types to deal with before.
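One way the Sausage example could look if the commands were plain Ruby methods is sketched below. Every name here (Scene, define, place, sphere, cylinder, the keyword arguments) is invented for illustration, and materials and rotation are left out; they would be further keyword arguments or commands in the same style.

```ruby
# Hypothetical scene description language built from ordinary methods.
class Scene
  Placed = Struct.new(:kind, :params, :at)

  attr_reader :objects

  def initialize
    @definitions = {}
    @objects = []
    @offset = [0, 0, 0]
  end

  def self.load(source)
    scene = new
    scene.instance_eval(source)
    scene
  end

  # Primitives record themselves relative to the current offset.
  def sphere(radius:, at: [0, 0, 0])
    add(:sphere, { radius: radius }, at)
  end

  def cylinder(radius:, length:, at: [0, 0, 0])
    add(:cylinder, { radius: radius, length: length }, at)
  end

  # Remember a block of commands under a name.
  def define(name, &body)
    @definitions[name] = body
  end

  # Replay a named definition with every position shifted by `at`.
  def place(name, at:)
    body = @definitions.fetch(name)
    saved = @offset
    @offset = translate(saved, at)
    instance_eval(&body)
  ensure
    @offset = saved if saved
  end

  private

  def add(kind, params, at)
    @objects << Placed.new(kind, params, translate(@offset, at))
  end

  def translate(a, b)
    a.zip(b).map { |x, y| x + y }
  end
end

scene = Scene.load(<<~SCRIPT)
  define :sausage do
    cylinder radius: 1, length: 4
    sphere radius: 1
    sphere radius: 1, at: [0, 0, 4]
  end

  place :sausage, at: [0, 0, 0]
  place :sausage, at: [10, 0, 0]
SCRIPT
```

Here scene.objects ends up holding six primitives, three per placed sausage. New features become new methods, which keeps the growth manageable, but the input is still arbitrary Ruby, so the security question remains open.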

(like often needed for configurations) then Regexp is probably fine. From

Agreed

what you write I'm guessing that your envisioned language is more complex - but how complex? Maybe it's a special case for which someone somewhere has a solution already.

Regards

  robert

         Thank you
         Hugh

[1] some years back I got Arthur Secret's Agora (Perl, web by email)
program working and extended it considerably. I used regexps for
that.

···

On Thu, 27 Jan 2005, Robert Klemme wrote:

[...]

So the next possibility is to use something like

        input = nil
        File.open("input.txt"){|f|
          input = f.read
        }
        Thread.new(input){|source|
          $SAFE=5
          instance_eval source
        }.value

or something, and actually make the commands in the language methods
of some Ruby object.

I hear ya. I wish there was a way to open a jailed namespace. It'd be
like chroot'ing into a module. The sandbox module would be addressable
from ::Object, but would only include a limited set of modules when
offered under chroot.

That's a nice metaphor...

_why

         [I'll have to explore that RedHanded site a bit more too! :-)]

         Hugh

···

On Thu, 27 Jan 2005, why the lucky stiff wrote:

Hugh Sasse Staff Elec Eng (hgs@dmu.ac.uk) wrote:

I've googled for things to do with little languages and parsing, but have found nothing enlightening.

Perhaps you should take a look at Lua:

I really like Lua, (bought the book), but I need to keep this as pure Ruby.
Given time, energy, etc I'd love to provide Lua tables to Ruby.... I
don't think they'd solve my problem here though.

--
PA
http://alt.textdrive.com/

         Thank you,
         Hugh

···

On Thu, 27 Jan 2005, PA wrote:

On Jan 26, 2005, at 18:08, Hugh Sasse Staff Elec Eng wrote:

Hi ..

I seem to have run into my parsing problem again. Whatever I'm
doing I usually end up having to parse non-simplistic input, and I'm
still not happy about the apparently available solutions to this.
So I'm wondering what other people do.

My personal solution to this is to use Coco/R, an LL(1) scanner/generator.
You can find more information at:

http://www.scifac.ru.ac.za/coco

Thank you. I'd seen this on RAA but not explored it...

The primary advantage of this approach, IMHO, is that all of the grammar /
scanning rules are in a single file (rather than the lex/yacc approach).
This makes the grammar quite easy to read and extend, once you are familiar
with the process. Ryan Davis has a pure ruby version, and I have a ruby
extension version. Both seem to work well for little languages.

I'll probably stay with the pure ruby one, but I'll certainly look
at this. I suspect that the 1 might be the problem for what I'm
trying to do. I can't remember how LL differs from LR now, but I'll
find that pretty easily.

         [...]

Pat Terry has a book "Compilers and Compiler Generators" that covers LL(1)
(and other) topics very well. You can find it at:

http://www.scifac.ru.ac.za/compilers/

         Thank you.
         Hugh

···

On Thu, 27 Jan 2005, Mark Probert wrote:

On Wednesday 26 January 2005 09:08, Hugh Sasse Staff Elec Eng wrote:

In article <20050126175623.GB35232@topi.cc>,

The application is immaterial at the moment, but the problem is that
I need to do more than can be done with a simple case statement, and
if I were to use case statements managing the problem would get too
big.

...

So the next possibility is to use something like

        input = nil
        File.open("input.txt"){|f|
          input = f.read
        }
        Thread.new(input){|source|
          $SAFE=5
          instance_eval source
        }.value

or something, and actually make the commands in the language methods
of some Ruby object.

I hear ya. I wish there was a way to open a jailed namespace. It'd be
like chroot'ing into a module. The sandbox module would be addressable
from ::Object, but would only include a limited set of modules when
offered under chroot.

_ This is the one place that Tcl ( at least circa 1994 Tcl )
absolutely fits like a glove. There was a client/server program
called sysctl written by some guys at IBM that allowed you to
assign ACL's to every command in the language. I've never seen
any other secure distributed scripting system that comes close.

_ Of course it was vast overkill for 99% of what you need to
do with that kind of system. As much as I dislike Tcl for other
reasons, it is a very good language for extending applications
via simple scripting. Adding your own specialized commands is
fairly straightforward. I know quite a few scientific groups
that are using Python to do similar things these days.

_ Booker C. Bense

···

why the lucky stiff <ruby-talk@whytheluckystiff.net> wrote:

Hugh Sasse Staff Elec Eng (hgs@dmu.ac.uk) wrote:

Hi Hugh,

What are you trying to parse exactly? Is it full Ruby source or a small
subset?

Not full ruby source, just a mini-language, maybe using Ruby's
parser instead of botching(!) my own...

If full source, have you considered what it is exactly that might be

I don't know what would be unsafe yet: I'm assuming that there are
some smart people out there who like doing heap overruns and other
fiendish tricks that I'd not envisage.... So, I'd like to be safe
by default rather than be safe for the cases I have considered.

unsafe? Maybe there's a way to create your own "Safety Net" by making
certain parts of Ruby inaccessible. Not sure how exactly, seems like
namespaces would be needed, but it would be an interesting challenge.
Also you might try ParseTree.

I'll have a look for that, thanks.

OTOH, if only a subset (or something completely different), I have a
general purpose and easy to use Parser class I've been working on for
Carats. Perhaps you'd like to try it and see if it can help? Doing so
could also help me test/improve it for everyone.

OK, bowl the URL in my direction! Thank you.

T.

         Hugh

···

On Thu, 27 Jan 2005, Trans wrote:

I hear ya. I wish there was a way to open a jailed namespace. It'd be
like chroot'ing into a module. The sandbox module would be addressable
from ::Object, but would only include a limited set of modules when
offered under chroot.

Sounds like capability security, where - to super-simplify - you can
only access the objects/functionality you can name. It's used by the E
language - see a good description at

http://www.skyhunter.com/marcs/ewalnut.html#SEC42

My guess though is that Ruby is too 'sloppy' for such a thing to work
fully securely. I'd be very happy to be proved wrong!

-- George

Mark Probert wrote:

My personal solution to this is to use Coco/R, an LL(1) scanner/generator.

I haven't used Coco, but I'd second Mark's recommendation to stay with
an LL(1) parser if possible. If not, then LL(n) ala ANTLR, but not LALR
ala yacc. The LL parsers are easy to write manually using recursive
descent, which I've done a few times.

[1] I find that thinking in the manner of a shift/reduce parser is
particularly unnatural to me. ... Maybe there is something I can read which will turn the problem around, so it becomes easy to handle?

Shift just means "delay a decision about what I've just seen". Reduce is
the operation you do when you do decide. If you explore the ambiguity
in your grammar rules, these start to make more sense.
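A toy illustration of that reading, using an operator-precedence evaluator (a simplified relative of what yacc generates, not an LALR parser). Tokens are shifted onto stacks until the next operator shows that an earlier operation can no longer be affected, and only then is it reduced. The function names and the pre-tokenized input are assumptions for the sketch.

```ruby
PRECEDENCE = { '+' => 1, '-' => 1, '*' => 2, '/' => 2 }.freeze

# Reduce: the decision is made, so replace "left op right" with its value.
def reduce(values, ops)
  op = ops.pop
  right = values.pop
  left = values.pop
  values.push(left.public_send(op, right))
end

def shift_reduce_eval(tokens)
  values = []
  ops = []
  tokens.each do |tok|
    if tok.is_a?(Integer)
      values.push(tok)   # shift: no decision yet about how this is used
    else
      # The incoming operator settles every pending operation that binds
      # at least as tightly, so those can be reduced now.
      reduce(values, ops) while !ops.empty? && PRECEDENCE[ops.last] >= PRECEDENCE[tok]
      ops.push(tok)      # shift the operator itself
    end
  end
  reduce(values, ops) until ops.empty?
  values.first
end

shift_reduce_eval([1, '+', 2, '*', 3])   # => 7
```

In 1 + 2 * 3 the "+" stays on the stack when "*" arrives, because the parser cannot yet know what the right operand of "+" will be; that delay is the shift.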

The primary disadvantage of Coco/R is the LL(1) part.

ANTLR does LL(n) for arbitrary n I believe - though you should avoid
n > 3 or humans start to have trouble parsing your language :-).

It's a shame that the ANTLR folk at Purdue went Java-only when they
dropped their old C-based implementation. A multi-lingual ANTLR would
be super-cool, especially if it would generate Ruby.

As an example, Ruby can not, as far as I have tried, be converted into an LL(1) grammar, though C can.

Not without a tie-in to the lexical analyser to help recognise goto
labels, which require LL(2). Such a tie-in is commonly used however.

A simple example of the ruby grammar

Good example, thanks Mark.

I should point out that the major reason for the success of XML
(contrary to most of the hyped claims about it) is that it allows
people to create languages without having to create parsers. Or
rather, they use an XML parser which yields a DOM, and can process
the AST at will.

If you can live with the ugliness of XML and the size and speed of REXML,
you should consider it.
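A small sketch of that approach, assuming the rexml library is installed (it ships with standard Ruby installs as a gem). The element and attribute names (scene, sphere, cylinder, radius, length) are invented for the example; the point is that the "parser" is just a walk over the tree REXML returns.

```ruby
require 'rexml/document'

# A hypothetical scene description written as XML instead of a custom syntax.
xml = <<~XML
  <scene>
    <sphere radius="1"/>
    <cylinder radius="1" length="4"/>
  </scene>
XML

doc = REXML::Document.new(xml)

# Walk the child elements of the root and turn each into [name, attributes].
shapes = []
doc.root.elements.each do |element|
  attrs = {}
  element.attributes.each { |name, value| attrs[name] = value.to_f }
  shapes << [element.name, attrs]
end
```

Validation of element names and attribute values still has to be written by hand, but tokenizing, nesting and syntax error reporting come from the XML parser.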

There's no good reason why a language like Ruby shouldn't have
grammar rules as first-class objects (as Regexp's are), yielding
Ruby objects that reflect the AST, allowing attribute-grammar
parsers to be written and integrated directly within a program.

Such a tool, integrated into the Ruby interpreter itself, would
allow extension modules to define *Ruby syntax extensions*, so
that the language itself becomes plastic.

I haven't thought much about what these last two features would
look like in Ruby's case.

Clifford Heath.