Ruby docstrings

Hi all!

I’m currently engaged in RDoc’ing the stdlib as part of the ruby-doc
project. This has made me very much aware of how the dynamic nature
of Ruby makes a documentation system not integrated into the language
itself very difficult to develop, what with aliases, class << self,
instance_eval and module_eval etc… While Dave Thomas has been very
helpful with extending RDoc to handle various different combinations
of the above, this ultimately amounts to writing a second Ruby parser
just to parse out documentation–and there are limits to how far any
sane person is prepared to pursue this approach.

Why doesn’t Ruby adopt the docstring convention used in the Scheme and
Lisp worlds, and also by Python? For those unfamiliar with this, a
“docstring” is a standalone string constituting the first
statement/sexp of a function or (in Python) class definition. The
runtime generally has hooks to extract and display this documentation.
For instance, in guile I can do:

    (define (foo-cons bar)
      "Cons the string 'foo' onto the argument"
      (cons "foo" bar))

and then in the interpreter:

    (procedure-documentation foo-cons)
    "Cons the string 'foo' onto the argument"

Then hooks could be added to the Ruby interpreter to parse, store and
extract docstrings. RDoc and other documentation systems would then
just have to load up any Ruby files they wanted to document inside the
Ruby interpreter, and use it to extract the documentation from classes
and methods; there would be no need to develop and maintain a separate
Ruby parser for documentation purposes. ri/rj and other command-line
documentation utilities could operate similarly, rather than having to
maintain a dump of documentation that is separate from the actual
libraries. IDEs and even irb would have easy built-in documentation
facilities. Best of all, the most exotic and inscrutable techniques
for defining new methods, currently inscrutable to RDoc and likely
always to remain so, would under this system still be capable of
producing documentation.
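
To make the idea concrete, here is a minimal sketch of how runtime docstrings can be approximated in plain Ruby today, without interpreter changes. The Docstrings module, the doc method, and documentation_for are all hypothetical names invented for this illustration; only the method_added hook is a real part of Ruby.

```ruby
# Hypothetical mixin: `doc "text"` records a docstring for the next method
# defined in the class. None of these names exist in Ruby itself.
module Docstrings
  # Remember a docstring until the next method definition arrives.
  def doc(text)
    @pending_doc = text
  end

  # Ruby calls this hook on the class whenever an instance method is defined,
  # including methods created through module_eval or define_method.
  def method_added(name)
    super
    return unless @pending_doc
    docstrings[name] = @pending_doc
    @pending_doc = nil
  end

  # Table of method name => docstring for this class.
  def docstrings
    @docstrings ||= {}
  end

  # Runtime lookup, loosely analogous to Guile's procedure-documentation.
  def documentation_for(name)
    docstrings[name.to_sym]
  end
end

class FooList
  extend Docstrings

  doc "Cons the string 'foo' onto the argument"
  def foo_cons(bar)
    ["foo", *bar]
  end
end

puts FooList.documentation_for(:foo_cons)
```

The documentation lives in the running process, so a tool would have to load the file before it could query it, which is exactly the trade-off the proposal raises.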

Is there a reason why Ruby hasn’t taken this approach? Would it be a
good approach to take?

William

William Webber wrote:
Best of all, the most exotic and inscrutable techniques

for defining new methods, currently inscrutable to RDoc and likely
always to remain so, would under this system still be capable of
producing documentation.

Is there a reason why Ruby hasn’t taken this approach? Would it be a
good approach to take?

But surely those method definitions would only be available if you
actually executed the code? Wouldn’t that be fairly dangerous?

Dave

A related issue is that if you use method_missing to dispatch method
calls, there’s almost no standard way for a parser to figure out what
sorts of methods you’re intending to define. I run into this a lot
because my pet project Lafcadio uses method_missing a lot to let
objects serve as facades for collections of subsystems, and I just
finished writing RDoc comments for everything. When it came down to
the methods handled through method_missing, I had to write class
comments because there’s no individual method definition for RDoc to
parse.
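
A small sketch of the situation described above, with invented class names (Cache and ObjectStoreSketch are placeholders, not Lafcadio's actual classes): the facade answers a call through method_missing, yet reflection finds no method definition that a documentation tool could attach a comment to.

```ruby
# Hypothetical subsystem; the name and behaviour are made up for illustration.
class Cache
  def fetch(key)
    "cached #{key}"
  end
end

# Hypothetical facade that forwards unknown calls to its subsystems.
class ObjectStoreSketch
  def initialize
    @subsystems = [Cache.new]
  end

  # Forward any unknown call to the first subsystem that understands it.
  def method_missing(name, *args, &block)
    target = @subsystems.find { |s| s.respond_to?(name) }
    return target.public_send(name, *args, &block) if target
    super
  end

  def respond_to_missing?(name, include_private = false)
    @subsystems.any? { |s| s.respond_to?(name) } || super
  end
end

store = ObjectStoreSketch.new
puts store.fetch(:user)   # the call succeeds through method_missing
# No `fetch` is defined on the facade, so there is nothing for a parser to find:
puts ObjectStoreSketch.instance_methods(false).include?(:fetch)
```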

All of this is not to complain about RDoc, which is a fantastic tool.
In some ways, it’s nice to have this sort of problem. For a language
like Java you don’t run into this problem because reflection is so
bloody painful in that language. Ruby lets you be more flexible and
clever with method names, but the downside of that is that it raises
the bar for programs (like RDoc) whose functionality depends on
understanding that code.

Francis

William Webber wew@williamwebber.com wrote in message news:20030811020227.GA2043@elysium.domus.prv

···


Well, the Ruby interpreter would have to load the files, so yes,
the top-level code in each file would get executed. Most of
this code will just be defining classes, methods, and constants; I’m
assuming that the Ruby interpreter does some sort of parsing of these
on load, and could extract docstrings?

For more dynamic method definitions like the following (from date.rb):

    [ %w(sg start),
      %w(newsg new_start),
      %w(of offset),
      %w(newof new_offset)
    ].each do |old, new|
      module_eval <<-"end;"
        def #{old}(*args, &block)
          "Warns that #{old} is obsolete, and executes #{new}"
          if $VERBOSE
            warn("\#{caller.shift.sub(/:in .*/, '')}: " \
                 "warning: \#{self.class}\##{old} is deprecated; " \
                 "use \#{self.class}\##{new}")
          end
          #{new}(*args, &block)
        end
      end;
    end

(note: I have inserted a sample docstring), yes, this code would have
to be executed (as it would be when the interpreter loaded it). But
I’m not sure that it’s dangerous to do so, if by dangerous you mean
liable to be caught by malicious code; even if it were, better that it
be discovered sabotaging your documentation generator than your
corporate webapp!

(BTW, I think the above is an example of why docstrings are attractive
in Ruby: we’re directly leveraging the language and its ability to
eval at load/run time to dynamically generate documentation.)
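
As a rough illustration of why execution helps here, the following sketch uses the same table-driven module_eval pattern on a made-up class (LegacyDate and its methods are hypothetical, not the real date.rb). Once the code has run, ordinary reflection lists the generated methods alongside the hand-written ones, which a static parser would have to reconstruct by interpreting the heredoc.

```ruby
# Hypothetical class standing in for the date.rb excerpt.
class LegacyDate
  def start
    :start_value
  end

  def offset
    :offset_value
  end

  # Generate the deprecated aliases from a table, as the excerpt does.
  [%w(sg start), %w(of offset)].each do |old, new|
    module_eval <<-RUBY
      def #{old}(*args, &block)
        #{new}(*args, &block)
      end
    RUBY
  end
end

# After loading, the generated methods are visible to reflection.
generated = LegacyDate.instance_methods(false).sort
puts generated.inspect
puts LegacyDate.new.sg.inspect
```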

It’s true that some files might expect to be loaded in only a
particular context, and might misbehave if they aren’t. I can
imagine a command-line-argument parsing utility or a cgi library being
written this way (I don’t know if the Ruby stdlib ones actually are).
But would any reasonable library actually cause damage if loaded at
the incorrect time? And couldn’t we just tell the interpreter to
ignore top-level errors as much as possible, and continue looking for
docstrings? Besides, isn’t it in general bad style to do much more
than definitions during a file load, particularly for a library?

I’m pretty new to Ruby, and perhaps docstrings don’t really fit in
with the Ruby model. But there was a discussion on this recently
about Smalltalk-like features that Smalltalkers would like to see in
Ruby, and of course many aspects of the Smalltalk development model
are drawn from its Lisp heritage. And docstrings seem to me to be the
kind of dynamic language-introspection support that they were
interested in, and which would fit with the already dynamic and
sinuous nature of Ruby.

William

···

On Mon, Aug 11, 2003 at 12:14:44PM +0900, Dave Thomas wrote:

William Webber wrote:
Best of all, the most exotic and inscrutable techniques

for defining new methods, currently inscrutable to RDoc and likely
always to remain so, would under this system still be capable of
producing documentation.

Is there a reason why Ruby hasn’t taken this approach? Would it be a
good approach to take?

But surely those method definitions would only be available if you
actually executed the code? Wouldn’t that be fairly dangerous?

Francis Hwang wrote:

A related issue is that if you use method_missing to dispatch method
calls, there’s almost no standard way for a parser to figure out what
sorts of methods you’re intending to define. I run into this a lot
because my pet project Lafcadio uses method_missing a lot to let
objects serve as facades for collections of subsystems, and I just
finished writing RDoc comments for everything. When it came down to
the methods handled through method_missing, I had to write class
comments because there’s no individual method definition for RDoc to
parse.

Would it help if I added a facility to RDoc to let you tell it about
methods that it can’t find for itself, perhaps something like:

    # Dynamically constructed method that handles
    # SOAP requests from the washing machine.
    #
    # :def: dynamic_method(arg1, arg2)
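
For what it is worth, a directive like this is easy to pick up without executing anything. The scanner below is only an illustrative sketch, not RDoc's parser: the ":def:" syntax is the one proposed above, while WashingMachineProxy, DeclaredMethod, and scan_def_directives are names invented for the example.

```ruby
# Sample source containing the proposed directive (class name is hypothetical).
SOURCE = <<~'RUBY'
  class WashingMachineProxy
    # Dynamically constructed method that handles
    # SOAP requests from the washing machine.
    #
    # :def: dynamic_method(arg1, arg2)

    def method_missing(name, *args)
      # dispatch elided
    end
  end
RUBY

DeclaredMethod = Struct.new(:signature, :comment)

# Collect every ":def:" directive together with the comment text above it.
def scan_def_directives(source)
  found = []
  comment = []                      # lines of the comment block being read
  source.each_line do |line|
    text = line.strip
    unless text.start_with?("#")
      comment = []                  # any non-comment line ends the block
      next
    end
    body = text.sub(/\A#\s?/, "")
    if (match = body.match(/\A:def:\s*(.+)\z/))
      description = comment.reject(&:empty?).join(" ")
      found << DeclaredMethod.new(match[1], description)
      comment = []
    else
      comment << body
    end
  end
  found
end

scan_def_directives(SOURCE).each do |m|
  puts "#{m.signature}: #{m.comment}"
end
```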

A related issue is that if you use method_missing to dispatch method
calls, there’s almost no standard way for a parser to figure out what
sorts of methods you’re intending to define. I run into this a lot
because my pet project Lafcadio uses method_missing a lot to let
objects serve as facades for collections of subsystems, and I just
finished writing RDoc comments for everything. When it came down to
the methods handled through method_missing, I had to write class
comments because there’s no individual method definition for RDoc to
parse.

I imagine that this is the only kind of documentation that would make
sense in that scenario.

Classes that are documented “Supports all the methods of Foo::Bar, but
does this in addition” are well documented, IMO.

All of this is not to complain about RDoc, which is a fantastic tool.
In some ways, it’s nice to have this sort of problem. For a language
like Java you don’t run into this problem because reflection is so
bloody painful in that language. Ruby lets you be more flexible and
clever with method names, but the downside of that is that it raises
the bar for programs (like RDoc) whose functionality depends on
understanding that code.

Gavin

···

On Monday, August 11, 2003, 11:16:53 PM, Francis wrote:

Hi –

William Webber wrote:
Best of all, the most exotic and inscrutable techniques

for defining new methods, currently inscrutable to RDoc and likely
always to remain so, would under this system still be capable of
producing documentation.

Is there a reason why Ruby hasn’t taken this approach? Would it be a
good approach to take?

But surely those method definitions would only be available if you
actually executed the code? Wouldn’t that be fairly dangerous?

Well, the Ruby interpreter would have to load the files, so yes,
the top-level code in each file would get executed. Most of
this code will just be defining classes, methods, and constants; I’m
assuming that the Ruby interpreter does some sort of parsing of these
on load, and could extract docstrings?

There’s a big difference, though, between this and the kind of thing
that, say, Emacs Lisp or Smalltalk do (based on my use of the former
and a few years of hearing people talk in my presence about the
latter). In those environments, you’ve got a running image, and the
docstrings help you query it while it’s running (i.e., certain methods
or functions exist in memory, and you’re querying the memory). In the
case of Ruby, we’re talking about creating documentation from a bunch
of files. Actually executing those files (as opposed to just parsing
them for documentation) has very different implications.
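
A small sketch of that difference, using a throwaway file written to a Tempfile (the file contents and the $side_effects counter are invented for the example): loading the file to reach its definitions also runs whatever else sits at the top level.

```ruby
require "tempfile"

# Hypothetical library file: one method definition plus a top-level side effect.
library = Tempfile.new(["sketch_lib", ".rb"])
library.write(<<~'RUBY')
  # Stands in for anything a file might do when loaded: touching files,
  # opening sockets, reading ARGV, and so on.
  $side_effects = ($side_effects || 0) + 1

  def documented_method
    :result
  end
RUBY
library.close

load library.path   # what an execute-to-document tool would have to do

puts $side_effects                                        # the top-level statement really ran
puts Object.private_method_defined?(:documented_method)   # and the method now exists
```

A parser that only reads the text would see the definition without triggering the side effect.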

For more dynamic method definitions like the following (from date.rb):

[… date example …]

How about this:

    case SOME_CONSTANT
    when x
      def quack … end
    when y
      def meow … end
    end

where SOME_CONSTANT might be configurable at run-time, and so on.
Again, this would be different if you were working inside a run-time
image: there would be an exact and natural correspondence between the
methods that got defined, and those for which you could retrieve the
docstrings. But if you’ve just got the above sitting in a file, and
you’re going to execute it and terminate, it’s unclear how the
documentation would be produced based on the execution alone.
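
A runnable variant of the same point, as a sketch: the Animal class and the :duck and :cat values are made up, and SOME_CONSTANT is hard-coded here where a real program might read it from configuration. After execution, reflection only knows about the branch that was taken.

```ruby
# Imagine this value coming from configuration at run time.
SOME_CONSTANT = :duck

class Animal
  case SOME_CONSTANT
  when :duck
    def quack
      "Quack"
    end
  when :cat
    def meow
      "Meow"
    end
  end
end

# Only the method from the executed branch exists; `meow` was never defined,
# so documentation extracted from this run would not mention it.
puts Animal.instance_methods(false).inspect
puts Animal.method_defined?(:meow)
```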

(BTW, I think the above is an example of why docstrings are attractive
in Ruby: we’re directly leveraging the language and its ability to
eval at load/run time to dynamically generate documentation.)

I’d quibble perhaps with ‘directly’; there’s nothing indirect about
writing a program in Ruby that performs operations on a file, even if
that file happens to contain Ruby code and you don’t happen to execute
it :-)

It’s true that some files might expect to be loaded in only a
particular context, and might misbehave if they aren’t. I can
imagine a command-line-argument parsing utility or a cgi library being
written this way (I don’t know if the Ruby stdlib ones actually are).
But would any reasonable library actually cause damage if loaded at
the incorrect time? And couldn’t we just tell the interpreter to
ignore top-level errors as much as possible, and continue looking for
docstrings? Besides, isn’t it in general bad style to do much more
than definitions during a file load, particularly for a library?

I think this kind of modification of the interpreter starts to sound a
lot like the dedicated documentation parser you wanted to avoid :-)
I’m hoping Dave will follow up with some juicy examples of dangerous
situations. But in general, I think the run-time generation of
documentation probably makes more sense in cases (like Smalltalk)
where you’re actually working inside a running image over a period of
time, not just executing a program.

David

···

On Mon, 11 Aug 2003, William Webber wrote:

On Mon, Aug 11, 2003 at 12:14:44PM +0900, Dave Thomas wrote:


David Alan Black
home: dblack@superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

Dave Thomas wrote:

Would it help if I added a facility to RDoc to let you tell it about
methods that it can’t find for itself, perhaps something like:

    # Dynamically constructed method that handles
    # SOAP requests from the washing machine.
    #
    # :def: dynamic_method(arg1, arg2)

You have my vote :-)

Gavin Sinclair gsinclair@soyabean.com.au wrote in message news:977617383.20030811232553@soyabean.com.au

A related issue is that if you use method_missing to dispatch method
calls, there’s almost no standard way for a parser to figure out what
sorts of methods you’re intending to define. I run into this a lot
because my pet project Lafcadio uses method_missing a lot to let
objects serve as facades for collections of subsystems, and I just
finished writing RDoc comments for everything. When it came down to
the methods handled through method_missing, I had to write class
comments because there’s no individual method definition for RDoc to
parse.

I imagine that this is the only kind of documentation that would make
sense in that scenario.

Classes that are documented “Supports all the methods of Foo::Bar, but
does this in addition” are well documented, IMO.

I suppose it depends on who you’re writing the documentation for. In
the example I’m thinking of there’s an object facade (called the
ObjectStore) which abstracts away a lot of its functionality to
subsystems – part of the reason there’s so much dispatching to
subsystems is so that clients don’t have to think about where the
functionality lives. For them I’d like to pretend that all the methods
belong to the facade.

For people who want a deeper look into the design, you want more
explicit mention of what subsystem handles what methods. Though now
that I think about it I suppose a format like RDoc lends itself more
to deep detail than overviews – I should probably just write more
how-to type documents for that stuff, hm?

Francis

···

On Monday, August 11, 2003, 11:16:53 PM, Francis wrote:

I think this would be a very useful last-resort mechanism.

William

···

On Mon, Aug 11, 2003 at 10:25:41PM +0900, Dave Thomas wrote:

Would it help if I added a facility to RDoc to let you tell it about
methods that it can’t find for itself, perhaps something like:

    # Dynamically constructed method that handles
    # SOAP requests from the washing machine.
    #
    # :def: dynamic_method(arg1, arg2)