This was originally a follow-up to my question about YAML documentation, but
grew into a separate topic.
This is not an announcement, but I think it’s about time to get some
feedback on the design. It’s hardly specific to Ruby, but I guess it would
work well in a Ruby context.
“Mauricio Fernández” batsman.geo@yahoo.com wrote in message
news:20030223073420.GA13356@student.ei.uni-stuttgart.de…
BTW: what tool(s) did you use to produce the Yaml documentation?
Yaml. Take a look at doc/yamlrb.yod: it is a “Yaml document”, to be
processed by Yod (Yaml Ok Documentation). See src/yod.rb:
I kind of figured. But it must be post-processed to XSL-FO or something?
I have on and off for a long time been hacking on a simple XML format, like
some of the people behind YAML; then along came YAML. Meanwhile I changed
focus towards a format that is supposed to be especially suited for
documentation purposes. YAML is also friendly to type as text, but still not
the best possible for text entry. I hacked something up in Ruby but need to
get back to it. The primary motivation is that TeX is too complex and xml-doc
is too cumbersome, and finally there is the need for a plain text format: you
can’t trust word processors to be around in the long term, and they are bad
for formatting and source control.
Ruby doc format is a similar approach, but not sufficiently advanced in
formatting.
Perhaps I should ask for some help here in getting the format completed?
The design goals are
- absolutely minimum of escape symbols
- arbitrarily complex nesting
- automatic tag-close based on context
- support for meta-tagging (comments, other languages, notes)
- headers etc. should not be escaped by = for level 1, == for level 2 etc.,
because it makes it difficult to move a section.
I see it as a possibility to use a Wiki-like interface for advanced text
formatting purposes, and also for non-text purposes, but here YAML or even
XML might be better.
Currently I haven’t looked much into how to represent lists, a case where
YAML clearly excels.
I’ve written a prelim. spec., but I’m considering changing it a bit. Here
are the main points (it’s simple because that’s the whole point).
I’ve currently got some problems handling paragraph breaks - I don’t want to
type them everywhere, but deducing them can be tricky.
I called it STEP: Structured Text Entry Processor.
Text is text. A blank line is a paragraph break (whatever that means in the
given context). The only escape symbols are curly braces. These form a
command.
example:
{chapter The first chapter}
Here is text. The next sentence is bolded. {b This is bolded text}. This is
not bold.
{chapter The next chapter}
Here is text in chapter two.
{section a subsection} Text in section. {note needs cleanup}
{chapter Also a chapter}
Clearly tags (called commands) follow ‘{’. These are not predefined in STEP.
STEP provides means to define tag hierarchies, which enable one tag to
automatically close another. STEP also has two kinds of commands: those that
have a header and a body, and those that only have a header:
{b header only}, {chapter header} body {chapter header} body
I am actually considering having two different symbols for the two command
styles: {b header only}, [chapter header] body [chapter header] body
But then I would have more symbols to escape.
I am also considering moving the command name outside of ‘{’:
This is b{bolded text} this is not bolded.
chapter{The chapter title} The chapter body section{Text in section}
However, currently the name follows ‘{’ as in: This is {b bolded text}.
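To make the brace syntax concrete, here is a minimal Ruby sketch of a reader for the current `{name text}` form. It is not the prototype mentioned in this post: the `Command` struct and `parse_step` method are hypothetical names, and the sketch assumes balanced braces, no escapes, and that a single whitespace character after the name is the delimiter.

```ruby
# Hypothetical node type: a command name plus its children, which are
# plain strings or nested Command nodes.
Command = Struct.new(:name, :children)

# Parse STEP-like text into an array of strings and Command nodes.
# Assumption: braces are balanced and there are no backslash escapes.
def parse_step(source)
  root = Command.new(nil, [])
  stack = [root]
  # Tokens: "{name" plus one optional delimiter space, "}", or plain text.
  source.scan(/\{[^\s{}]*\s?|\}|[^{}]+/) do |token|
    if token.start_with?("{")
      node = Command.new(token[1..-1].strip, [])
      stack.last.children << node
      stack.push(node)
    elsif token == "}"
      raise ArgumentError, "unbalanced '}'" if stack.size == 1
      stack.pop
    else
      stack.last.children << token
    end
  end
  raise ArgumentError, "missing '}'" unless stack.size == 1
  root.children
end
```

Calling `parse_step("This is {b bolded text}.")` yields the text before the command, a `Command` named `b` holding `"bolded text"`, and the trailing period. Header-only versus header-and-body commands are not distinguished here; that needs the hierarchy information discussed further down.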
Semantics are the most important, but here is the basic syntax:
::= ( | )( | |)*
::= ‘{’ [+ ] ‘}’
::= (SYMBOL except , ‘{’, ‘}’, ‘(’, or ‘)’)*
::= ( | ‘{’ | ‘}’ | ‘(’ | ‘)’ )*
::= [ ‘(’ ‘)’]
::= – reserved for future
, ::= – see below
The only escaped symbols are ‘{’ and ‘}’. ‘\’ is not escaped: If you want to
write ‘{’ you must write ‘\{’, but if you want to write ‘\{’ you write
‘\\{’. ‘\’ only has a special meaning before ‘{’ or ‘}’.
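The escaping rule can be sketched in a few lines of Ruby. This assumes the escape character is a backslash that is special only directly before a brace; `escape_braces` and `unescape_braces` are hypothetical helper names, and the sketch does not try to settle how a literal backslash in front of a brace should be written.

```ruby
# Turn "\{" and "\}" into literal braces. A backslash anywhere else is
# left untouched, since it has no special meaning there.
def unescape_braces(text)
  text.gsub(/\\([{}])/) { Regexp.last_match(1) }
end

# Prefix every literal brace with a backslash so it is not read as the
# start or end of a command.
def escape_braces(text)
  text.gsub(/[{}]/) { |brace| "\\" + brace }
end
```

With these, a path like `C:\dir` passes through `unescape_braces` unchanged, because the backslash is not followed by a brace.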
Runs of spaces are usually merged into a single space. To have
explicit spaces in front of text, or just spaces, use the command with no
name:
The following are multiple spaces {    }and the following are multiple
{    spaces followed by text}.
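A rough Ruby sketch of this space handling follows. It assumes that a nameless command is recognised by whitespace directly after ‘{’, that its contents are kept verbatim, and that it contains no nested commands; `merge_spaces` is a hypothetical name.

```ruby
# Collapse ordinary runs of whitespace to one space, but keep the contents
# of nameless commands such as "{   }" exactly as typed.
def merge_spaces(text)
  # Splitting on a captured pattern keeps the separators, so the result
  # alternates: ordinary text, nameless command, ordinary text, ...
  parts = text.split(/(\{[ \t]+[^{}]*\})/)
  parts.each_with_index.map do |part, index|
    if index.odd?
      part[1...-1]            # nameless command: drop the braces only
    else
      part.gsub(/\s+/, " ")   # ordinary text: merge whitespace runs
    end
  end.join
end
```

Named commands such as `{b  bold   text}` are treated as ordinary text here, so their inner whitespace is merged as well.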
Not shown: There are special commands for handling source code text completely
unescaped using something similar to <<EOInput, and another simpler option
where { } are only required to be balanced.
Arguments are reserved for future use. They would allow a syntax like
{font(courier, 10) some text in courier}.
The following is an attempt to clearly define the space syntax. The
and are significant. is stripped and
is only used to separate the command name from the following text. There are
problems: how to deal with space before and after a field if the field
evaluates to nothing, and there are several issues with explicit
paragraph breaks and implicit breaks (like after a chapter title). Therefore,
a higher-level syntax must also be used to handle document output and clean
up repeated breaks.
::= |
::= ( | )+
::= SPACE | TAB
::= (CR LF | LF | CR not followed by LF )
::= * [ *]
::= [] ([] )+
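Two parts of the space rules are clear enough to sketch in Ruby: a line break is CR LF, LF, or a lone CR, and a blank line is a paragraph break. The snippet below covers only that much; it makes no attempt to handle implicit breaks or the delimiter space, and `LINEBREAK` and `paragraphs` are hypothetical names.

```ruby
# CR LF, LF, or a CR that is not followed by LF. Trying "\r\n" first keeps
# a CR LF pair from being counted as two line breaks.
LINEBREAK = /\r\n|\n|\r/

# Split text into paragraphs at blank lines (lines that are empty or hold
# only spaces and tabs). Repeated blank lines count as a single break.
def paragraphs(text)
  normalized = text.gsub(LINEBREAK, "\n")
  normalized.split(/\n(?:[ \t]*\n)+/).map(&:strip).reject(&:empty?)
end
```

A single line break inside a paragraph is preserved as `"\n"`; whether it should later be merged into a space is left to the space rules above.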
UTF-8 symbols are handled directly by the syntax. In fact the format is
perfectly suited for binary encodings as long as ‘{’, ‘}’ are escaped and
space sequences are contained in { }.
Something that I haven’t covered here is how you can define commands as
macros of other commands, and how you can define commands to be subordinate
to other commands for automatic tag closing. While there is a syntax for
doing so, this is something that can be defined outside of the scripting
syntax such that commands like {chapter} and {section} are predefined. The
processor will also accept undefined commands, but in that case they will be
treated as having no body - that is, they stop exactly at ‘}’.
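The automatic closing could work roughly like the Ruby sketch below. It is only an illustration under assumptions: the `LEVELS` table, the `Block` struct and the flat event list are all hypothetical, standing in for whatever the real command definitions and parser output look like.

```ruby
# Hypothetical hierarchy table: a lower number sits higher in the hierarchy.
LEVELS = { "chapter" => 1, "section" => 2 }.freeze

# A command with its header text and, if it has one, its body.
Block = Struct.new(:name, :title, :body)

# events is a flat list: [name, title] pairs for commands, strings for text.
def nest(events)
  root = Block.new(nil, nil, [])
  open_blocks = [root]
  events.each do |event|
    if event.is_a?(String)
      open_blocks.last.body << event
      next
    end
    name, title = event
    level = LEVELS[name]
    if level.nil?
      # Undefined command: it has no body, so it never stays open.
      open_blocks.last.body << Block.new(name, title, nil)
      next
    end
    # Opening a command closes every open command at the same or a
    # deeper level.
    open_blocks.pop while open_blocks.size > 1 &&
                          LEVELS[open_blocks.last.name] >= level
    block = Block.new(name, title, [])
    open_blocks.last.body << block
    open_blocks.push(block)
  end
  root.body
end
```

Fed the chapter example from earlier in this post, a new {chapter} closes the previous chapter together with any open {section}, while {note} is attached inside the section without opening a body of its own.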
Another issue not covered is that commands inside the header or body of
other commands may be treated specially within that context. Thus a command
can act as modifier to the active parent command: e.g. {chapter {1}
Introduction}, here {1} acts as an enumeration command. This is partly why I
haven’t settled on arguments to commands. In fact the entire header text of
a command could be viewed as arguments to certain commands. E.g.
{font courier, 10}
{font {name courier}{size 10}}
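Both styles are easy to read back as arguments, as the following Ruby sketch suggests. The method names are hypothetical, and the sketch assumes one level of nesting and values that contain no braces.

```ruby
# Read "{font {name courier}{size 10}}" as named arguments by collecting
# the innermost {key value} commands into a hash.
def named_arguments(command)
  command.scan(/\{(\w+)\s+([^{}]*)\}/).to_h
end

# Read "{font courier, 10}" as positional arguments split on commas.
# Returns nil when the command contains nested commands.
def positional_arguments(command)
  match = command.match(/\A\{(\w+)\s+([^{}]*)\}\z/)
  match && match[2].split(",").map(&:strip)
end
```

The named form costs more typing but does not depend on argument order, which is the trade-off behind leaving the argument syntax undecided.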
STEP is only a syntax and a processor, so a separate layer on top of STEP
would be needed for a particular purpose. One such layer could be a generic
handler for generating XSL-FO, a subset of LaTeX, HTML and DocBook.
{Early-brainstorming}
As I mentioned, I am considering moving the command name outside of the
curly braces, but I haven’t investigated this further yet. I personally tend
to think “bold” and then realize I need to add some delimiters, typically
going back to add the curly brace. Compare this to the function syntax of
LISP versus other languages: (print “foo”) and print(“foo”)
Also, I am considering having a special short command notation for commands
covering a single word:
Only the word b,,only is bolded.
Only the word {b only} is bolded.
Two commas happen infrequently in natural text but are quick to enter and
easy to read. Two commas (or more) would be escaped by {,}, analogous to
escaping spaces.
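If the short form is written as a command name, two commas, and a single word (an assumption based on the description above), it can be expanded into the brace form before normal processing. A minimal Ruby sketch, with `expand_short_commands` as a hypothetical name:

```ruby
# Rewrite name,,word as {name word}. Only ASCII word characters are
# recognised on either side of the two commas, and the {,} escape for
# literal commas is not handled.
def expand_short_commands(text)
  text.gsub(/(\w+),,(\w+)/) do
    "{#{Regexp.last_match(1)} #{Regexp.last_match(2)}}"
  end
end
```

Ordinary punctuation such as `one, two` is left alone, since a single comma followed by a space never matches.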
It could also be used with line breaks when preceded by a colon:
chapter:,This is chapter 1
This is the content of chapter 1.
However, I don’t really like too many special cases just to make things
marginally easier. It’s much easier if commands are exactly { } and nothing
else. I might buy into the , notion because it’s so much easier.
As mentioned, I do have some prototype code around; mail me if interested. I
also learned that Ruby really needs a lexer tool: this was not as easy to
implement in Ruby as I had expected.
Mikkel
···
On Sun, Feb 23, 2003 at 08:06:16AM +0900, MikkelFJ wrote: