Hi all.
I'm pleased to announce version 0.0.1 (the "early adopters only" release) of my
Uniforma library.
It's here: http://rubyforge.org/projects/uniforma/
== What is it?
A library for parsing "simple text" formats (RD, Textile, Markdown, etc.) and
generating output in various formats (including simple text, HTML/XML and
more complex ones).
The heart of the library is a pair of DSLs: one for defining parsers and one
for defining generators.
== Why?
1. While preparing documentation for "one more serious library", I ran into a
dilemma: write it in RD (to auto-generate everything with RDoc), in Trac's wiki
format (to upload it to a Trac site), or in Textile (to eventually upload it to
a stand-alone site)? So I decided to write a conversion library/tool.
2. I use RedCloth (Textile) for all my work, and while trying to patch it for
my needs I found it to be a mess. I need separate, clear descriptions of the
"how is it parsed" and "how is it generated" aspects.
3. For my journalism I need MS Word output (editing text in MS Word is no fun
for me, but being able to generate it is a must). Right now I use a
"Textile=>(RedCloth)=>HTML=>`winword mytext.html`" scheme, which has
several flaws. I want to be able to define an MS Word generator easily (using
win32ole, of course; no hand-made heroism).
== Show. Me. The. Code.
Usage:
puts Uniforma::textile('*some text* "with links":http://google.com.').to_html_string
output:
<html><body>
<p><b>some text</b> <a href='http://google.com'>with links</a>.</p>
</body></html>
Defining parsers:
---
module Uniforma::Parsers
  class Textile < LineParser
    definition do
      ....
      # how to parse some line
      ....
      line /^h(\d+)\.\s+/ do para(:heading, :level => @_1.to_i) end
      ....
      # how to parse inline formatting:
      inline /__(.+?)__/, :italic
    end
  end
end
---
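To give an idea of how a `definition` block like the one above can be made to work, here is a minimal, self-contained sketch. It is NOT Uniforma's actual implementation: `ToyLineParser`, `ToyTextile`, the hash-based nodes and the block argument `m` are all hypothetical names invented for this illustration (the real DSL uses `para` and `@_1` instead). The only idea it shows is that `line` and `inline` can be plain class-level methods that register rules, and that `definition` just evaluates its block in the context of the class.

```ruby
# Toy sketch only; not Uniforma's code. All names here are hypothetical.
class ToyLineParser
  Rule = Struct.new(:pattern, :handler)

  class << self
    # Rules are stored per class, so every subclass keeps its own set.
    def line_rules
      @line_rules ||= []
    end

    def inline_rules
      @inline_rules ||= []
    end

    # Runs the block with the class as self, so `line` and `inline`
    # read like keywords inside `definition do ... end`.
    def definition(&block)
      instance_eval(&block)
    end

    # Registers a rule for whole lines; the handler receives the MatchData.
    def line(pattern, &handler)
      line_rules << Rule.new(pattern, handler)
    end

    # Registers an inline markup pattern and the node type it maps to.
    def inline(pattern, type)
      inline_rules << [pattern, type]
    end
  end

  # Turns each input line into a hash describing it.
  def parse(text)
    text.each_line.map do |raw|
      raw = raw.chomp
      node = { type: :paragraph, text: raw }
      self.class.line_rules.each do |rule|
        match = rule.pattern.match(raw)
        next unless match
        # The first matching rule wins; the matched prefix is stripped.
        node = rule.handler.call(match).merge(text: match.post_match)
        break
      end
      node.merge(inline: inline_types(node[:text]))
    end
  end

  private

  # Lists the inline markup types whose pattern occurs in the text.
  def inline_types(text)
    self.class.inline_rules.select { |pattern, _type| pattern =~ text }
                           .map { |_pattern, type| type }
  end
end

class ToyTextile < ToyLineParser
  definition do
    line(/^h(\d+)\.\s+/) { |m| { type: :heading, level: m[1].to_i } }
    inline(/__(.+?)__/, :italic)
  end
end

nodes = ToyTextile.new.parse("h2. Hello __world__\nplain line")
```

With this sketch, `nodes` holds one heading node (level 2, text `Hello __world__`, inline types `[:italic]`) and one plain paragraph node.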
Defining generators:
---
module Uniforma::Generators
  class HtmlString < TextGenerator
    definition do
      ...
      # what to place around some "paragraph type"
      around(:heading) {|p| i = p.level; ["<h#{i}>", "</h#{i}>\n"]}
      ...
      # what to place around some "inline markup type"
      around(:italic) {["<i>", "</i>"]}
    end
  end
end
---
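The generator side can be sketched in the same spirit. Again, this is only a toy under stated assumptions, not Uniforma's real code: `ToyTextGenerator`, `ToyHtmlString`, `ToyNode` and the flat node list are hypothetical. It shows one plausible way an `around` declaration could be stored as a block per node type and later called to obtain the opening and closing strings.

```ruby
# Toy sketch only; not Uniforma's code. All names here are hypothetical.
ToyNode = Struct.new(:type, :text, :level)

class ToyTextGenerator
  class << self
    # Maps a node type to the block that yields [before, after] strings.
    def wrappers
      @wrappers ||= {}
    end

    # Runs the block with the class as self, so `around` reads like a keyword.
    def definition(&block)
      instance_eval(&block)
    end

    def around(type, &block)
      wrappers[type] = block
    end
  end

  # Renders a flat list of nodes into a single string.
  def generate(nodes)
    nodes.map { |node| wrap(node) }.join
  end

  private

  # Looks up the wrapper for the node's type; unknown types pass through.
  def wrap(node)
    wrapper = self.class.wrappers[node.type]
    before, after = wrapper ? wrapper.call(node) : ["", ""]
    "#{before}#{node.text}#{after}"
  end
end

class ToyHtmlString < ToyTextGenerator
  definition do
    around(:heading) { |p| i = p.level; ["<h#{i}>", "</h#{i}>\n"] }
    around(:italic) { ["<i>", "</i>"] }
  end
end

html = ToyHtmlString.new.generate([
  ToyNode.new(:heading, "Title", 2),
  ToyNode.new(:italic, "note")
])
```

Because the wrappers are ordinary blocks, a block that does not care about the node (like the `:italic` one) can simply take no arguments.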
Uniforma is flexible enough to support:
* parsers for non-line-based formats (in fact, it already has a "toy" parser
for HTML, which even works on not-very-complex HTML documents!)
* generators for non-text formats (I'm working on PDF and MS Word generators;
they are not very hard to define with Uniforma)
== Important notes about current release
* This release shamelessly includes the htmlentities library by Paul Battley[1]
without even mentioning it in the license files. This will be fixed ASAP.
* It really is an "early adopters" release: almost no docs and very, very poor
tests. But it demonstrates the idea and is a base for further work.
* This release includes parsers for Textile, RD and HTML, and generators for
BBcode, RD and HTML. All of them are incomplete but tend to work.
* I'd like to hear opinions on whether the parser/generator DSLs look
"right" from the point of view of a) native English speakers and b) real Ruby
ninjas. You can examine my parsers in lib/uniforma/parsers/ and my generators
in lib/uniforma/generators/
Again, the library is here: http://rubyforge.org/projects/uniforma/
Thanx.
Zverok.