Parsing C++ with Ruby

Context:

My company uses a directed acyclic graph (DAG) implementation of the
Observer/Notifier pattern. I would love to see automatic generation of
diagrams representing those relationships.

Challenge:

It seems to me the easiest way to attain my goal is to generate a dot
file and have www.graphviz.org do the rest. I imagine the dot file would
have to be generated by parsing C++ (yuck!).

How do I go about doing that?

Do I simply parse the code looking for multi-line patterns? (See
example below.) Or do I use existing parsers to generate a full-blown
semantic tree? (And then what?)

Looking forward to your experiences/ideas,
Simon

Sample code:

The code I want to transform into dot edges looks like:

singleDirection (
    componentA->port("out"),
    componentB->port("input")
);

doubleDirection ( componentB->port("setting1"), componentD->port("setting1"));
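
For call sites as regular as the two above, the multi-line pattern approach might be sketched in Ruby roughly as follows. This is only an illustration under assumptions: the regex CALL, the helpers dot_edges and to_dot, the graph name and the edge labels are all made up here, and the pattern covers nothing beyond the exact call shape in the sample (no comments, macros or nested expressions).

```ruby
# Sketch: pull singleDirection/doubleDirection calls out of C++ source text
# and turn them into Graphviz dot edges. The regex only understands the call
# shape from the sample; \s* also matches newlines, so multi-line calls work.
CALL = /
  \b(singleDirection|doubleDirection) \s* \( \s*
  (\w+) \s* -> \s* port \s* \( \s* "([^"]*)" \s* \) \s* , \s*
  (\w+) \s* -> \s* port \s* \( \s* "([^"]*)" \s* \) \s*
  \) \s* ;
/x

# Returns one dot edge line per matched call (hypothetical helper).
def dot_edges(source)
  source.scan(CALL).map do |kind, from, from_port, to, to_port|
    attrs = [%(label="#{from_port} -> #{to_port}")]
    attrs << "dir=both" if kind == "doubleDirection"
    %(  #{from} -> #{to} [#{attrs.join(", ")}];)
  end
end

# Wraps the edges in a digraph block (the graph name is an assumption).
def to_dot(source, name = "observers")
  (["digraph #{name} {"] + dot_edges(source) + ["}"]).join("\n")
end

SAMPLE = <<~CPP
  singleDirection (
      componentA->port("out"),
      componentB->port("input")
  );

  doubleDirection ( componentB->port("setting1"), componentD->port("setting1"));
CPP

puts to_dot(SAMPLE)
```

Saving the printed text to a file and running it through `dot -Tpng` should render the graph. Anything less regular than the sample is where this approach starts to hurt.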

There are 10 types of people in the world…
those who understand binary and those who don’t.

A recent Dr. Dobb’s Journal article described extensions made to gcc so
that it dumps the parsed C or C++ syntax as XML. They did this because
existing non-compiler parsers were all approximations of the C++ syntax,
and didn’t work with complex templated C++ code. They used it to build
wrappers in some other language (SWIG-like), but you could use this to
get an XML representation of the code, and then your dot file generator
would only have to parse XML, for which there are many tools.

I think if you search for XML and gcc, you will find the home page of
the gcc port (I think it will be merged, eventually).

Sam

Quoting deliriousNOSPAM@atchoo.be, on Sat, Apr 26, 2003 at 03:01:58AM +0900:

···

Context:

My company uses a directed acyclic graph (DAG) implementation of the
Observer/Notifier pattern. I would love to see automatic generation of
diagrams representing those relationships.


I had never heard about this; very interesting.

Am I to gather from your answer that you do not recommend trying to
match multi-line C++ expressions with Ruby?

Simon

···

On Sat, 26 Apr 2003 at 16:48 GMT, Sam Roberts wrote:

A recent Dr. Dobb’s Journal article described extensions made to gcc so
that it dumps the parsed C or C++ syntax as XML. They did this because
existing non-compiler parsers were all approximations of the C++ syntax,
and didn’t work with complex templated C++ code. They used it to build
wrappers in some other language (SWIG-like), but you could use this to
get an XML representation of the code, and then your dot file generator
would only have to parse XML, for which there are many tools.


Hi, Simon.

Simon Vandemoortele deliriousNOSPAM@atchoo.be did say …

It seems to me the easiest way to attain my goal is to generate a dot
file and have www.graphviz.org do the rest.

I know nothing about Graphviz; I just looked at the site. It would
seem that there is a “companion” system, Doxygen

http://www.stack.nl/~dimitri/doxygen/index.html

that seems to parse C++ and produce output for Graphviz.
Have you tried it out?

… parsing C++ (yuck!). How do I go about doing that?

Yuck indeed.

You are only going to look for certain patterns in the
C++ code, right? That is not so bad with a suitable
parser/lexer combination. You will have to write a
grammar that can handle the concepts you want to extract.

I can suggest the LL(1) scanner/parser generator for Ruby that I
have written, called CocoRb.

http://raa.ruby-lang.org/list.rhtml?name=coco-rb

You won’t be able to parse the complete C++ tree, though
you should be able to extract what you want. And it is
pretty easy to use. There are some samples of scanning
C code and processing it that may be of interest.
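
To illustrate the general idea of scanning only for the constructs of interest, here is a hand-rolled sketch using Ruby's standard StringScanner. It is not CocoRb and does not use its grammar format or API; the names tokenize, connections and CALL_SHAPE are hypothetical, and the token classes are deliberately crude.

```ruby
require "strscan"

# Lexer sketch: splits C++-like text into [type, text] tokens while skipping
# whitespace and comments, so later matching does not care about line breaks.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/) || scanner.skip(%r{//[^\n]*}) || scanner.skip(%r{/\*.*?\*/}m)
      next
    elsif (text = scanner.scan(/"(?:\\.|[^"\\])*"/))
      tokens << [:string, text[1..-2]] # drop the surrounding quotes
    elsif (text = scanner.scan(/[A-Za-z_]\w*/))
      tokens << [:ident, text]
    elsif (text = scanner.scan(/->|::|./m))
      tokens << [:punct, text]
    end
  end
  tokens
end

# Token shape expected after singleDirection/doubleDirection.
# A nil text is a wildcard whose actual text is captured.
CALL_SHAPE = [
  [:punct, "("],
  [:ident, nil], [:punct, "->"], [:ident, "port"], [:punct, "("], [:string, nil], [:punct, ")"],
  [:punct, ","],
  [:ident, nil], [:punct, "->"], [:ident, "port"], [:punct, "("], [:string, nil], [:punct, ")"],
  [:punct, ")"]
].freeze

# Returns [kind, from, from_port, to, to_port] for every call that fits the shape.
def connections(tokens)
  found = []
  tokens.each_with_index do |(type, text), i|
    next unless type == :ident && %w[singleDirection doubleDirection].include?(text)
    window = tokens[i + 1, CALL_SHAPE.size]
    next unless window.size == CALL_SHAPE.size
    captures = []
    ok = CALL_SHAPE.zip(window).all? do |(want_type, want_text), (got_type, got_text)|
      captures << got_text if want_text.nil?
      want_type == got_type && (want_text.nil? || want_text == got_text)
    end
    found << [text, *captures] if ok
  end
  found
end

COMMENTED_SAMPLE = <<~CPP
  // wiring
  singleDirection (
      componentA->port("out"),   /* source */
      componentB->port("input")
  );
CPP

p connections(tokenize(COMMENTED_SAMPLE))
```

Compared with a single regular expression, the token stream tolerates comments and odd spacing between the pieces of a call, which is the main reason to reach for a lexer here.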

Regards,

-mark.


Mark Probert probertm@NOSPAM_nortelnetworks.com
Nortel Networks ph. (613) 768-1082

All opinions expressed are my own and do not
reflect in any way those of Nortel Networks.

I am working on a package installer for FreeBSD… I use DAGs (digraphs)
for dependency tracking… I use Graphviz for visual output :-)

http://sysinstall2.sourceforge.net/

in the CVS repository: /project_sysinstall2/core
see dag.rb + depends.rb

I cannot give you a direct URL to the source - SourceForge CVS is down
for the moment, so I cannot work :-(

···

On Fri, 25 Apr 2003 18:52:06 +0000, Simon Vandemoortele wrote:

My company uses a directed acyclic graph (DAG) implementation of the
Observer/Notifier pattern. I would love to see automatic generation of
diagrams representing those relationships.


Simon Strandgaard

Quoting deliriousNOSPAM@atchoo.be, on Sun, Apr 27, 2003 at 07:52:39PM +0900:

I had never heard about this, very interesting.

Am I to gather from your answer that you do not recommend trying to
match multi-line C++ expressions with Ruby?

I haven’t coded in C++ in years, but I tried to write a parser for it
once, and things like this are hard to parse:

template<class T = Hash<Obj > > Foo(T t, int max = 19);

Even if it weren’t hard, there’s no way it would be as easy to parse
as XML, for which there are lots of mature toolkits. Say it “only” takes
you 10 hours… that’s 10 hours you could have spent doing something
cool that’s never been done before!

From gccxml.org:

Development tools that work with programming languages benefit from
their ability to understand the code with which they work at a
level comparable to a compiler. C++ has become a popular and
powerful language, but parsing it is a very challenging problem.
This has discouraged the development of tools meant to work
directly with the language.

There is one open-source C++ parser, the C++ front-end to GCC, which is
currently able to deal with the language in its entirety. The purpose
of the GCC-XML extension is to generate an XML description of a C++
program from GCC’s internal representation. Since XML is easy to parse,
other development tools will be able to work with C++ programs without
the burden of a complicated C++ parser.

Quoting probertm@NOSPAM_acm.org, on Wed, May 07, 2003 at 04:51:12AM +0900:

Simon Vandemoortele deliriousNOSPAM@atchoo.be did say …

… parsing C++ (yuck!). How do I go about doing that?

Don’t bother; let gcc do it for you and dump the parse tree as XML,
then just use your favorite XML tool to read it:

www.gccxml.org
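
As a rough idea of what the XML side could look like in Ruby, here is a sketch using REXML. The XML below is hand-written for illustration and has not been checked against real GCC-XML output: the element and attribute names (Class, id, name, bases) and the helper inheritance_dot are assumptions. It is also worth checking whether the real dump includes function bodies, since the connection calls in the sample live inside them.

```ruby
require "rexml/document"

# Hand-written stand-in for a compiler XML dump. Element and attribute
# names are assumptions made for this sketch, not verified tool output.
SAMPLE_XML = <<~XML_DUMP
  <Dump>
    <Class id="_1" name="Notifier"/>
    <Class id="_2" name="Observer"/>
    <Class id="_3" name="Component" bases="_1 _2"/>
  </Dump>
XML_DUMP

# Emits one dot edge from each class to each of its base classes.
def inheritance_dot(xml_text)
  doc = REXML::Document.new(xml_text)

  # First pass: map ids to class names so base ids can be resolved.
  names = {}
  doc.elements.each("//Class") do |el|
    names[el.attributes["id"]] = el.attributes["name"]
  end

  # Second pass: one edge per entry in the space-separated bases list.
  lines = ["digraph classes {"]
  doc.elements.each("//Class") do |el|
    derived = el.attributes["name"]
    (el.attributes["bases"] || "").split.each do |base_id|
      lines << "  #{derived} -> #{names.fetch(base_id, base_id)};"
    end
  end
  lines << "}"
  lines.join("\n")
end

puts inheritance_dot(SAMPLE_XML)
```

The point is only that once the compiler has done the parsing, the Ruby side reduces to walking elements and printing edges.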

Cheers,
Sam