Project suggestion: Ruby code indenter

From the thread “Extension Language for a Text Editor”:

So basically, whether you like a modal editor or not, go for the editor
of your choice. I’ve seriously considered switching to Vim simply
because of the Ruby support. (A few things have kept me from doing
this, but they were merely technical issues.)

You’re welcome to come aboard ;-). I’m the maintainer of the Vim indent
script, together with Gavin Sinclair. It is, in my opinion, better than
the one that comes with Emacs, even though I guess Matz wrote that one
;-).

I was meaning to mention this anyway, but now I can’t resist. I think
a great project for someone to work on - someone who really really
wants to work on a project but isn’t sure what :-) - is a Ruby code
indenter.

Input:
Ruby code

Output:
Properly indented Ruby code, perhaps accounting for user preferences

Motivation:
Ruby is a hard language to programmatically indent, for reasons that
will become obvious if this thread goes anywhere. Attempts to
provide support for this in Vim and Emacs are progressing, but are
hampered by languages which are not really suited to the task
(please prove me wrong).

If a general-purpose program were provided, it would offer a
solution to any editor and for standalone use, as well as inspiring
greater agility in the existing editor plugins. It would not render
such plugins obsolete, but rather provide a backup for the tasks they do
not easily do (indent entire file, accounting for prefs, comments,
here-docs, etc.).

Comments:
A Ruby implementation could take advantage of irb code, just like
RDoc does. Understanding Ruby code, as opposed to reading a text
stream, makes indentation much easier.

There’s no way I have time to work on this; just throwing it out
there in case it catches someone’s fancy.
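The point about understanding Ruby code rather than reading a text stream can be illustrated with a small sketch. It assumes Ripper, the lexer/parser interface bundled with Ruby since 1.9 (not the irb code mentioned above, and not something that existed when this thread was written); the method names are hypothetical. A regular expression sees every word `end`, while the lexer reports only genuine `end` keywords, so strings and comments cannot confuse it.

```ruby
require 'ripper'

# A line where the word "end" appears only inside a string and a comment.
LINE = 'puts "we are at the end" # end of story'

# Text-stream view: every standalone word "end" looks like a block closer.
def naive_end_count(line)
  line.scan(/\bend\b/).size
end

# Token view: Ripper.lex yields [[lineno, column], event, token, ...] entries,
# so only real `end` keywords (event :on_kw) are counted.
def keyword_end_count(line)
  Ripper.lex(line).count { |_pos, event, token, *| event == :on_kw && token == "end" }
end

puts naive_end_count(LINE)   # prints 2
puts keyword_end_count(LINE) # prints 0
```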

BTW…

Because [Emacs is] general, people have written lots of stuff, some
of which is quite silly (tetris, web browser, etc.),

…on the rare occasions I play Tetris, it’s as a Vim plugin :-)
Search www.vim.org if you’re interested.

Gavin

I was meaning to mention this anyway, but now I can’t resist. I think
a great project for someone to work on - someone who really really
wants to work on a project but isn’t sure what :-) - is a Ruby code
indenter.

Input:
Ruby code

Output:
Properly indented Ruby code, perhaps accounting for user preferences

Motivation:
Ruby is a hard language to programmatically indent, for reasons that
will become obvious if this thread goes anywhere. Attempts to
provide support for this in Vim and Emacs are progressing, but are
hampered by languages which are not really suited to the task
(please prove me wrong).
I’d love to prove you wrong. I have, however, like you, discovered that
it is a bitch to indent Ruby programmatically. Its syntax is simply too
general. So many tokens are overloaded that it’s hard to get every
case right while maintaining compatibility with other cases.
For every case you fix, you’ll have to check that it doesn’t affect any
of the other ones.
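A minimal sketch of the kind of line-based heuristic an editor script might use makes the overloading problem concrete. The regexes and `opens_block?` are hypothetical, not taken from the Vim indent script. Treating `if` as a block opener only at the start of a line fixes modifier forms such as `return if done`, but it immediately misses `x = if cond`, which really does open a block, and it still counts one-line definitions as openers.

```ruby
# Hypothetical heuristic: a line opens a block if it starts with one of these
# keywords, or ends with `do` (optionally followed by block parameters).
OPENERS  = /\A\s*(?:class|module|def|if|unless|while|until|case|begin|for)\b/
DO_BLOCK = /\bdo(?:\s*\|[^|]*\|)?\s*\z/

def opens_block?(line)
  line.match?(OPENERS) || line.match?(DO_BLOCK)
end

[
  "if ready",             # statement `if`: opens a block, heuristic agrees
  "return if done",       # modifier `if`: no block, heuristic agrees
  "items.each do |item|", # `do` block: opens a block, heuristic agrees
  "x = if cond",          # opens a block, but the heuristic says it does not
  "def name; @name; end", # nothing left open, but the heuristic says it opens
].each { |line| puts "#{line.ljust(22)} #{opens_block?(line)}" }
```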
Anyway, it would be an interesting project. If I’m any judge, Perl 6
would make this very much easier to do. However, it should be generally
possible in any language. I’d assume Ruby would fit the task quite well
actually. The hard part is, of course, keeping track of all the cases.
However, it is quite well specified what may exist where, and in many
ways it is also easier to manage than a language such as C. The coding
standards of Ruby are also quite well defined, and almost everyone
seems to stick to them rather passionately, which makes things easier.
I can’t promise that I’ll take a look at this personally, since I’ll
be rather busy with other things in the near future. I will, however,
try to improve the Vim indenter to the best of my ability.
By the way, if you read this and you use Vim, please check out the
Vim/Ruby project at
http://rubyforge.org/projects/vim-ruby/
and try out all the latest features. Much work has been done since the
6.2 release, and it needs a good test-run.

If a general-purpose program were provided, it would offer a
solution to any editor and for standalone use, as well as inspiring
greater agility in the existing editor plugins. It would not render
such plugins obsolete, but rather provide a backup for the tasks they do
not easily do (indent entire file, accounting for prefs, comments,
here-docs, etc.).
Like indent(1), you mean? I rarely run indent, but if I were ever to
alter other people’s code, I’d probably run it through indent(1) before
running it through Vim’s.

Comments:
A Ruby implementation could take advantage of irb code, just like
RDoc does. Understanding Ruby code, as opposed to reading a text
stream, makes indentation much easier.

There’s no way I have time to work on this; just throwing it out
there in case it catches someone’s fancy.

Because [Emacs is] general, people have written lots of stuff, some
of which is quite silly (tetris, web browser, etc.),

…on the rare occasions I play Tetris, it’s as a Vim plugin :-)
Search www.vim.org if you’re interested.
I like the one that comes with Zsh better :-D,
nikolai

::: name: Nikolai Weibull :: aliases: pcp / lone-star / aka :::
::: born: Chicago, IL USA :: loc atm: Gothenburg, Sweden :::
::: page: www.pcppopper.org :: fun atm: gf,lps,ruby,lisp,war3 :::
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}

By the way, if you read this and you use Vim, please check out the
Vim/Ruby project at
http://rubyforge.org/projects/vim-ruby/
and try out all the latest features. Much work has been done since the
6.2 release, and it needs a good test-run.

I’ll just note for those interested that “the latest features” are
only available via CVS at the moment. A “devel” release will be out
shortly.

If a general-purpose program were provided, it would offer a
solution to any editor and for standalone use, as well as inspiring
greater agility in the existing editor plugins. It would not render
such plugins obsolete, but rather provide a backup for the tasks they do
not easily do (indent entire file, accounting for prefs, comments,
here-docs, etc.).

Like indent(1), you mean? I rarely run indent, but if I were ever to
alter other people’s code, I’d probably run it through indent(1) before
running it through Vim’s.

Precisely like indent. Say it were called ‘rindent’, then from within
Vim (or any editor; that’s the point) you can run

:%!rindent

and have it done nicely. Obviously you’re still going to use your
editor’s indenting features as you type and want to correct small
blocks.
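A sketch of what such a `rindent` filter could look like, assuming a deliberately naive, line-based rule set: it ignores heredocs, multi-line strings, and the modifier cases discussed earlier. `reindent`, `rindent_filter`, and the `width:` preference are hypothetical names. Vim’s `:%!cmd` sends the whole buffer to the command’s standard input and replaces it with the command’s standard output, so the filter only has to turn one string into another.

```ruby
#!/usr/bin/env ruby
# rindent (hypothetical): a naive, line-based Ruby re-indenter.

OPENERS    = /\A(?:class|module|def|if|unless|while|until|case|begin|for)\b/
DO_BLOCK   = /\bdo(?:\s*\|[^|]*\|)?\z/
BRACE_OPEN = /\{(?:\s*\|[^|]*\|)?\z/
CLOSERS    = /\A(?:end\b|\})/
MIDDLES    = /\A(?:else|elsif|when|rescue|ensure)\b/

# Re-indent each line by nesting depth; `width` is the user's preference.
def reindent(source, width: 2)
  depth = 0
  source.each_line.map do |raw|
    line = raw.strip
    next "\n" if line.empty?
    depth -= 1 if line.match?(CLOSERS)
    # else/elsif/when/... sit one level out, without changing the depth.
    shift = line.match?(MIDDLES) ? depth - 1 : depth
    indented = (" " * (width * [shift, 0].max)) + line + "\n"
    depth += 1 if line.match?(OPENERS) || line.match?(DO_BLOCK) || line.match?(BRACE_OPEN)
    indented
  end.join
end

# Filter mode: read the whole buffer, write the re-indented text back.
def rindent_filter(input = $stdin, output = $stdout)
  output.write(reindent(input.read))
end
```

To try it as `:%!rindent`, one would save it as an executable named `rindent` on the PATH and append a line calling `rindent_filter`; the call is left out here so the sketch can be loaded without blocking on standard input.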

Also, Nikolai, I thought this would be perfect for you, as you have
already done it in VimL :-* and are gearing up to do it in pcpEdit in
Ruby ;-)

Gavin

On Saturday, October 11, 2003, 3:25:17 AM, Nikolai wrote:

[me asking if it would be like indent(1)]

Precisely like indent. Say it were called ‘rindent’, then from within
Vim (or any editor; that’s the point) you can run

:%!rindent

and have it done nicely. Obviously you’re still going to use your
editor’s indenting features as you type and want to correct small
blocks.

OK. The good thing with Ruby, over C, for this kind of thing is that
most people seem to keep to a rather similar way of ‘type-setting’ their
programs. We could perhaps use this to our advantage somehow.

Also, Nikolai, I thought this would be perfect for you, as you have
already done it in VimL :-* and are gearing up to do it in pcpEdit in
Ruby ;-)

Haha, OK. I’ll see what I can do. I’ve always wondered if it would be
possible to do this kind of thing with yacc/racc or something similar.
pcpEdit heh. That will not be the official name ;-). I’m thinking of
‘ned’, for Nikolai EDitor, or simply the name Ned (as in Flanders) in
tribute to editors such as Sam, Wily, and family. Other, more
silly/stupid names were scamacs (emacs spelled backwards prepended to
emacs, with e’s removed) and scam-e (emacs spelled backwards). And
also, I haven’t decided on Ruby yet, but yes, it will probably be Ruby
actually. I think it can work rather well.
nikolai

You mention yacc/racc, and I was curious as to your opinions on the
subject of an indent-like program and the best way to approach it.

Due to the limitations of editors and the like, real-time indentation
calculation is inherently error-prone because we’re working with a
subset of the file, and the more accuracy we want in the heuristics of
the indentation, the more complex the scripts responsible for that
indentation become.

One way to approach the problem when writing an external program which
is responsible for re-indenting a file would be to parse the file into a
kind of verbose abstract syntax tree and then write the tree back out
using straightforward rules regarding indentation and white space.
This has straightforward advantages and disadvantages, as well as
consequences which I’m probably overlooking. The major advantage that I
see is that the resulting file could be almost perfect, provided we had
a parser for the complete grammar. One of the disadvantages would be
that things like same-line comments would probably get converted to
full-line comments or vice-versa more often than desirable. I’m also
not sure about the relative performances of the two methods…on one
hand parsing the entire file into a syntax tree is processor intensive
and requires memory space for the tree (although the file could be
parsed incrementally I suppose, writing out the nodes that we’re
currently at as long as they’re “closed”, meaning they would no longer
affect the indentation of elements to come), whereas parsing based on a
large set of regular expressions requires running the buffer of text
through multiple regexes, etc.
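One point in favor of the syntax-tree route: once the code is parsed, the overloaded tokens are already resolved. A small sketch using `Ripper.sexp` (a modern stdlib parser interface, assumed here rather than anything from this thread) shows that a statement `if` and a modifier `if` produce different node types, `:if` and `:if_mod`, so a tree printer never has to guess. The `node_types` helper is hypothetical.

```ruby
require 'ripper'

# Collect the type tag (leading symbol) of every node in a Ripper s-expression.
def node_types(tree, acc = [])
  return acc unless tree.is_a?(Array)
  acc << tree.first if tree.first.is_a?(Symbol)
  tree.each { |child| node_types(child, acc) }
  acc
end

statement = Ripper.sexp("if ready\n  start\nend\n")
modifier  = Ripper.sexp("start if ready\n")

p node_types(statement).include?(:if)    # prints true
p node_types(modifier).include?(:if_mod) # prints true
p node_types(modifier).include?(:if)     # prints false
```

As far as I know, `Ripper.sexp` does not keep comments, which is exactly the same-line comment problem raised above; a real tree printer would have to carry them separately.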

Personally, I think grammars and parsers are pretty fun/neat, so writing
an indent-like program using them would probably be more interesting
than writing one using a sequence of regular expressions similar to
writing a syntax file. What’s the normal way of doing this (i.e. how
are indent and astyle implemented) and what do you think would be the
best? Any advantages or disadvantages of the methods that I’m not
seeing?

–Aaron

On 2003-10-11, Nikolai Weibull ruby-talk@pcppopper.org wrote:

[me asking if it would be like indent(1)]

Precisely like indent. Say it were called ‘rindent’, then from
within Vim (or any editor; that’s the point) you can run

:%!rindent

and have it done nicely. Obviously you’re still going to use your
editor’s indenting features as you type and want to correct small
blocks.

OK. The good thing with Ruby, over C, for this kind of thing is that
most people seem to keep to a rather similar way of ‘type-setting’
their programs. We could perhaps use this to our advantage somehow.

Also, Nikolai, I thought this would be perfect for you, as you have
already done it in VimL :-* and are gearing up to do it in pcpEdit in
Ruby ;-)

Haha, OK. I’ll see what I can do. I’ve always wondered if it would
be possible to do this kind of thing with yacc/racc or something similar.
pcpEdit heh. That will not be the official name ;-). I’m thinking of
‘ned’, for Nikolai EDitor, or simply the name Ned (as in Flanders) in
tribute to editors such as Sam, Wily, and family. Other, more
silly/stupid names were scamacs (emacs spelled backwards prepended to
emacs, with e’s removed) and scam-e (emacs spelled backwards). And
also, I haven’t decided on Ruby yet, but yes, it will probably be Ruby
actually. I think it can work rather well.

You mention yacc/racc, and I was curious as to your opinions on the
subject of an indent-like program and the best way to approach it.

Due to the limitations of editors and the like, real-time indentation
calculation is inherently error-prone because we’re working with a
subset of the file, and the more accuracy we want in the heuristics of
the indentation, the more complex the scripts responsible for that
indentation become.

Yes, this is the main problem we face. With limited context we can also
only get limited usefulness.

One way to approach the problem when writing an external program which
is responsible for re-indenting a file would be to parse the file into a
kind of verbose abstract syntax tree and then write the tree back out
using straightforward rules regarding indentation and white space.

Yes, this is precisely the idea I had for it. I don’t know if it’s
possible to get right though.

This has straightforward advantages and disadvantages, as well as
consequences which I’m probably overlooking. The major advantage that I
see is that the resulting file could be almost perfect, provided we had
a parser for the complete grammar. One of the disadvantages would be
that things like same-line comments would probably get converted to
full-line comments or vice-versa more often than desirable.

Yes, this may be a problem. The more information about the file you
store in the ‘verbose abstract syntax tree’ though, the more you can
keep the old structure as well.
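One way to keep the old structure, sketched here with Ripper’s token stream (an assumption; nothing in the thread prescribes it): the tokens include whitespace and comments, so joining them reproduces the source exactly, and each comment carries its line number. An indenter built on such a lossless stream could re-attach a same-line comment to the code it followed instead of moving it. `comments_by_line` is a hypothetical helper, and `to_h` with a block needs Ruby 2.6 or later.

```ruby
require 'ripper'

SOURCE = <<~RUBY
  def area(r)   # radius in metres
    3.14 * r * r
  end
RUBY

# Ripper.tokenize returns every token as a string, spaces and comments included,
# so the original text can be rebuilt exactly.
puts Ripper.tokenize(SOURCE).join == SOURCE # prints true

# Map each comment to the line it starts on, using Ripper.lex positions.
# Comment tokens include their trailing newline, hence the chomp.
def comments_by_line(source)
  Ripper.lex(source)
        .select { |_pos, event, *| event == :on_comment }
        .to_h { |(line, _col), _event, text, *| [line, text.chomp] }
end

p comments_by_line(SOURCE) # line 1 maps to "# radius in metres"
```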

I’m also not sure about the relative performances of the two
methods…on one hand parsing the entire file into a syntax tree is
processor intensive and requires memory space for the tree (although
the file could be parsed incrementally I suppose, writing out the
nodes that we’re currently at as long as they’re “closed”, meaning
they would no longer affect the indentation of elements to come),
whereas parsing based on a large set of regular expressions requires
running the buffer of text through multiple regexes, etc.

This is probably not a problem. Source files are generally not very
large.

Personally, I think grammars and parsers are pretty fun/neat, so writing
an indent-like program using them would probably be more interesting
than writing one using a sequence of regular expressions similar to
writing a syntax file. What’s the normal way of doing this (i.e. how
are indent and astyle implemented) and what do you think would be the
best? Any advantages or disadvantages of the methods that I’m not
seeing?

indent(1) works by lexing the C file and basically applying heuristic
rules to it. astyle I don’t know. The main advantage is that it works
rather well ;-). The main disadvantage is that it is only heuristic.
It’s not necessarily correct (or, as the indent(1) manual states, “it is
not guaranteed that running indent on the same file will generate the
same output every time”),
nikolai
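For comparison, a sketch of the indent(1)-style approach applied to Ruby, assuming Ripper’s lexer: lex the file, then apply heuristic rules to keyword and bracket tokens to compute each line’s depth. Lexing means an `end` inside a string or comment is ignored and a one-line `def f; end` nets out to zero, but the rules remain heuristics: a modifier `if` is recognized only by not being the first token on its line, `while x do` counts its `do` twice, and heredoc or multi-line string bodies are re-indented too. All names are hypothetical. The last line checks the property the indent(1) manual warns may not hold: running the tool twice gives the same result as running it once.

```ruby
require 'ripper'

OPEN_KEYWORDS = %w[class module def case begin do for].freeze
STMT_KEYWORDS = %w[if unless while until].freeze # openers only as a line's first token
MID_KEYWORDS  = %w[else elsif when rescue ensure].freeze
OPEN_EVENTS   = %i[on_lbrace on_tlambeg on_lbracket on_lparen].freeze
CLOSE_EVENTS  = %i[on_rbrace on_rbracket on_rparen].freeze
SKIP_EVENTS   = %i[on_sp on_ignored_sp on_nl on_ignored_nl].freeze

# Lex first, then apply per-line heuristics, in the spirit of indent(1).
def reindent(source, width: 2)
  net   = Hash.new(0) # line number => openers minus closers on that line
  first = {}          # line number => [event, token] of its first real token
  Ripper.lex(source).each do |(line, _col), event, tok, *|
    next if SKIP_EVENTS.include?(event)
    leading = !first.key?(line)
    first[line] = [event, tok] if leading
    if event == :on_kw
      if OPEN_KEYWORDS.include?(tok) || (leading && STMT_KEYWORDS.include?(tok))
        net[line] += 1
      elsif tok == "end"
        net[line] -= 1
      end
    elsif OPEN_EVENTS.include?(event)
      net[line] += 1
    elsif CLOSE_EVENTS.include?(event)
      net[line] -= 1
    end
  end

  depth = 0
  source.lines.each_with_index.map do |raw, index|
    lineno = index + 1
    text = raw.strip
    next "\n" if text.empty?
    event, tok = first[lineno]
    closes = (event == :on_kw && tok == "end") || CLOSE_EVENTS.include?(event)
    middle = event == :on_kw && MID_KEYWORDS.include?(tok)
    shift = closes || middle ? depth - 1 : depth
    depth += net[lineno]
    (" " * (width * [shift, 0].max)) + text + "\n"
  end.join
end

SAMPLE = <<~RUBY
  module Shapes
  def self.describe(n)
  case n
  when 0 then "none"
  else
  "the end"
  end
  end
  SIDES = {
  square: 4,
  triangle: 3
  }
  end
RUBY

formatted = reindent(SAMPLE)
puts formatted
puts reindent(formatted) == formatted # idempotence check: prints true
```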
