DocBook to PDF

I'm wanting to do some docbook to pdf conversion.

I assume the "right" way would be some combination
of REXML and PDF::Writer.

Anyone done anything like this?

Ideally I'd like to be able to tweak a few things
such as margins and page size without changing
anything else...

Ideas welcome.

Thanks,
Hal

Hal Fulton wrote:

I'm wanting to do some docbook to pdf conversion.

I assume the "right" way would be some combination
of REXML and PDF::Writer.

Anyone done anything like this?

Please, nobody shoot me, but I have to believe this can easily be done with Java tools.

Now, if the tweak-factor is such that actually handling Java-matter is too painful, then perhaps a Ruby solution is better. But from casual following of the xml-dev mailing list, it seems that this sort of thing is a well-solved matter in Java.

Just a thought.

James


--

http://www.ruby-doc.org - Ruby Help & Documentation
Ruby Code & Style: Writers wanted
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
http://www.30secondrule.com - Building Better Tools

Hal Fulton wrote:

I'm wanting to do some docbook to pdf conversion.

I assume the "right" way would be some combination
of REXML and PDF::Writer.

Anyone done anything like this?

I have Ruby code that converts a usable subset of DocBook to LaTeX. It's a prototype, but it works. It'd be pretty easy to extend it for elements it doesn't handle, modulo difficult things like bibliographic references and citations.

Ideally I'd like to be able to tweak a few things
such as margins and page size without changing
anything else...

All that's in a separate style file.
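For the LaTeX route, here is a sketch of what such a separate style file might hold, generated from Ruby. Assumptions: the file name pagesetup.sty and the helper page_setup_sty are made up for illustration. The geometry package and its paperwidth/paperheight/margin options are standard LaTeX. The converted document would load the file with \usepackage{pagesetup}, so changing the page size or margins touches nothing else.

```ruby
# Hypothetical generator for a tiny LaTeX style file (pagesetup.sty) that
# holds only the page geometry. The document body never mentions sizes,
# so re-flowing to a new page size means regenerating just this file.
def page_setup_sty(paperwidth:, paperheight:, margin:)
  <<~TEX
    \\NeedsTeXFormat{LaTeX2e}
    \\ProvidesPackage{pagesetup}
    \\RequirePackage[paperwidth=#{paperwidth},paperheight=#{paperheight},margin=#{margin}]{geometry}
  TEX
end

# Example: a 6x9 inch page with 0.75in margins.
puts page_setup_sty(paperwidth: "6in", paperheight: "9in", margin: "0.75in")
```

To actually use it, write the string to pagesetup.sty next to the .tex file instead of printing it.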

I developed this for a specific application: producing formal specification documents by extracting functional models and requirements from a database. I turn the XML into HTML with XSLT and into PDF using LaTeX and dvipdf.

I'll find the latest version tonight and email it to you. If it's useful, maybe I'll put it on Rubyforge.

Steve

Hi,

Here's a suggestion from a coworker of mine:

Does this need to be a pure Ruby solution? If not, I'd suggest using (or
adapting) the standard DocBook stylesheets
(http://wiki.docbook.org/topic/DocBookXslStylesheets), and using
(perhaps via the Ruby libxslt bindings) those to generate XSL-FO, which you can
then feed into FOP (http://xmlgraphics.apache.org/fop/) or the FO formatter
of your choice.
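Here is a rough sketch of that two-step pipeline driven from Ruby. It assumes xsltproc (the libxslt command-line tool) and the fop launcher script are on the PATH. It also assumes the stock DocBook XSL FO stylesheet lives at the path below, which is a guess you will need to adjust for your install. paper.type, page.margin.inner and page.margin.outer are parameters of the DocBook XSL FO stylesheets. That is where page size and margins can be changed without touching anything else. The helper names are hypothetical.

```ruby
# Assumed install location of the stock DocBook XSL FO stylesheet.
FO_STYLESHEET = "/usr/share/xml/docbook/stylesheet/docbook-xsl/fo/docbook.xsl"

# Page-layout parameters of the DocBook XSL FO stylesheets; values are examples.
LAYOUT = {
  "paper.type"        => "A4",
  "page.margin.inner" => "1in",
  "page.margin.outer" => "0.75in",
}.freeze

# argv for: xsltproc --output FO [--stringparam NAME VALUE ...] STYLESHEET SRC
def xsltproc_cmd(src, fo, params = LAYOUT, stylesheet = FO_STYLESHEET)
  cmd = ["xsltproc", "--output", fo]
  params.each { |name, value| cmd.push("--stringparam", name, value) }
  cmd.push(stylesheet, src)
end

# argv for: fop -fo FO -pdf PDF
def fop_cmd(fo, pdf)
  ["fop", "-fo", fo, "-pdf", pdf]
end
```

Running it would look like system(*xsltproc_cmd("book.xml", "book.fo")) && system(*fop_cmd("book.fo", "book.pdf")). Building argv arrays rather than shell strings avoids quoting trouble with file names.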

Alternately, OpenOffice.org can open (and reasonably format) DocBook
files, and has a built-in PDF printer, and is scriptable via Ruby (though I
wouldn't necessarily recommend it).

HTH,
Keith


On Sat, 12 Nov 2005, Hal Fulton wrote:

I'm wanting to do some docbook to pdf conversion.

I assume the "right" way would be some combination
of REXML and PDF::Writer.

Anyone done anything like this?

Ideally I'd like to be able to tweak a few things
such as margins and page size without changing
anything else...

Ideas welcome.

My first thought is Holy Crap! Really, I've got a kid about that age :-)

The typical XML way to do this is DocBook -> XSL:FO -> pdf because you'll be able to use Norm Walsh's stuff. If you do this and you have input documents bigger than, say, two pages, you'll be wanting the *fastest* XML processors you can get. I did this a couple of years ago and found libxml2 and libxslt the way to go. Getting a good XSL:FO processor was quite a trick then, and I have not been keeping up to date, so keep that in mind. The only free one that I could find was the Apache FOP processor, and it wasn't all that good (incomplete). There are a couple of commercial processors that are apparently very good, but I wasn't willing to spend a couple hundred bucks on them.

An alternative that worked quite well was to use the SGML processors for DocBook.

As I said, I've not been keeping up, but apparently both the Apache XML processors and Saxon have become much faster in the last couple of years. Both of those are Java. Unfortunately I don't think Ruby has a remote chance of being useful as an XML processor for this kind of application until Ruby gets much faster.

Putting Ruby into that pipeline for processing the stream sounds more reasonable, but still, that stream is going to get *very* long once you've got XSL:FO.
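If Ruby does sit in that stream, a SAX-style listener avoids building a tree for a very long XSL-FO file. Here is a minimal sketch using REXML's stream API (REXML::StreamListener and REXML::Document.parse_stream). The ElementCounter class is hypothetical and only tallies element names. A real post-processor would filter or rewrite the events instead.

```ruby
require "rexml/document"
require "rexml/parsers/streamparser"
require "rexml/streamlistener"

# Hypothetical listener: counts start tags as they stream past, without
# ever holding the whole document in memory as a tree.
class ElementCounter
  include REXML::StreamListener # supplies no-op defaults for other events
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Called once per start tag; +name+ includes any prefix, e.g. "fo:block".
  def tag_start(name, _attrs)
    @counts[name] += 1
  end
end

# +source+ may be a String or an IO such as File.open("book.fo").
def count_elements(source)
  listener = ElementCounter.new
  REXML::Document.parse_stream(source, listener)
  listener.counts
end
```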

If you really want to do this yourself, brace yourself for a lot of work. Maybe you should choose a subset of DocBook (isn't there a small(ish) subset already defined?)
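One way to size up a subset before committing to it is to list the elements a document uses that your converter doesn't handle yet. Here is a sketch with REXML. The SUPPORTED set is an illustrative assumption, not a defined DocBook subset, and unsupported_elements is a hypothetical helper.

```ruby
require "rexml/document"
require "set"

# Illustrative subset only; list whatever elements your converter handles.
SUPPORTED = Set.new(%w[article section title para emphasis itemizedlist listitem]).freeze

# Sorted names of elements used in +xml+ that are not in +supported+.
def unsupported_elements(xml, supported = SUPPORTED)
  doc = REXML::Document.new(xml)
  used = Set.new
  REXML::XPath.each(doc, "//*") { |el| used << el.name } # every element, root included
  (used - supported).to_a.sort
end
```

For example, p unsupported_elements(File.read("book.xml")) prints the elements you would still have to write handlers for.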

Are you committed to DocBook? If not, you should have a look at DITA (from IBM), and consider LaTeX/ConTeXt or one of the groff macro packages (like om (mom)). I'm having some fun with publicon from Wolfram these days (it reminds me of FrameMaker but runs on OS/X (and Windows, linux coming, maybe) and can generate HTML, XML, and latex output). If publicon works out (I'm using it to document a ruby project I'll be open sourcing soon, and a couple of things for work) I'll be sticking with that.

Cheers,
Bob


On Nov 11, 2005, at 8:01 PM, Hal Fulton wrote:

Thanks,
Hal

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>

Hal Fulton wrote:

I'm wanting to do some docbook to pdf conversion.

[…]

Ideas welcome.

Yet another suggestion: http://pragma-ade.com/. ConTeXt has good
support for DocBook and outputs PDFs.

        nikolai


--
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}

It is fairly straightforward to convert DocBook/XML to PDF using a
combination of Java tools. I am using the Saxon XSLT processor
(http://saxon.sourceforge.net/), the standard DocBook XSL stylesheets
(The DocBook Project), and FOP
(http://xmlgraphics.apache.org/fop/) to accomplish this.

I think that, at some point in the past, I considered using Ruby tools
to do this but they either didn't exist or weren't quite up to snuff
(especially with regards to an XSLT processor).


On 11/11/05, James Britt <james_b@neurogami.com> wrote:

Please, nobody shoot me, but I have to believe this can easily be done
with Java tools.

Now, if the tweak-factor is such that actually handling Java-matter is
too painful, then perhaps a Ruby solution is better. But from casual
following of the xml-dev mailing list, it seems that this sort of thing
is a well-solved matter in Java.

Bob Hutchison <hutch@recursive.ca> writes:

I'm wanting to do some docbook to pdf conversion.

I assume the "right" way would be some combination
of REXML and PDF::Writer.

Anyone done anything like this?

Ideally I'd like to be able to tweak a few things
such as margins and page size without changing
anything else...

Ideas welcome.

My first thought is Holy Crap! Really, I've got a kid about that age :-)

The typical XML way to do this is DocBook -> XSL:FO -> pdf because
you'll be able to use Norm Walsh's stuff. If you do this and you have
input documents bigger than, say, two pages, you'll be wanting the
*fastest* XML processors you can get. I did this a couple of years
ago and found libxml2 and libxslt the way to go. Getting a good
XSL:FO processor was quite a trick then, and I have not been keeping
up to date, so keep that in mind. The only free one that I could find
was the Apache FOP processor, and it wasn't all that good
(incomplete). There are a couple of commercial processors that are
apparently very good, but I wasn't willing to spend a couple hundred
bucks on them.

Have you tried PassiveTeX? (TEI-C: PassiveTeX)
(And even if you don't need it, acknowledge that major hack. :-P)

An alternative that worked quite well was to use the SGML
processors for DocBook.

DSSSL, anyone?

As I said, I've not been keeping up, but apparently both the Apache
XML processors and Saxon have become much faster in the last couple
of years. Both of those are Java. Unfortunately I don't think Ruby
has a remote chance of being useful as an XML processor for this kind
of application until Ruby gets much faster.

It all depends on the libs you use. If you go REXML and start
sleeping on your keyboard, no wonder; but there are bindings to
libxml2 and libxslt too...

Are you committed to DocBook? If not you should have a look at DITA
(from IBM), and consider LaTeX/ConTeXt or one of the groff macro
packages (like om (mom)).

Texinfo may be a choice for multi-format output too.

I'm having some fun with publicon from
Wolfram these days (it reminds me of FrameMaker but runs on OS/X (and
Windows, linux coming, maybe) and can generate HTML, XML, and latex
output). If publicon works out (I'm using it to document a ruby
project I'll be open sourcing soon, and a couple of things for work)
I'll be sticking with that.

Publicon looks very promising, thanks for mentioning that.


On Nov 11, 2005, at 8:01 PM, Hal Fulton wrote:

Cheers,
Bob

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Bob Hutchison wrote:

The typical XML way to do this is DocBook -> XSL:FO -> pdf because you'll be able to use Norm Walsh's stuff. If you do this and you have input documents bigger than, say, two pages, you'll be wanting the *fastest* XML processors you can get. I did this a couple of years ago and found libxml2 and libxslt the way to go. Getting a good XSL:FO processor was quite a trick then, and I have not been keeping up to date, so keep that in mind. The only free one that I could find was the Apache FOP processor, and it wasn't all that good (incomplete). There are a couple of commercial processors that are apparently very good, but I wasn't willing to spend a couple hundred bucks on them.

I eventually went with LaTeX because I'm not just trying to make marks on paper, but trying to make beautiful documents that measure up to high standards of typesetting. All the out-of-the-box DocBook/XSL stuff I tried produced ugly output. Maybe things have gotten better.

The real appeal of LaTeX for me is that it operates on document objects at approximately the same level of abstraction as DocBook itself. It's fairly straightforward to translate between the two, and then use styles and macros to control the output formatting.

An alternative that worked quite well was to use the SGML processors for DocBook.

As I said, I've not been keeping up, but apparently both the Apache XML processors and Saxon have become much faster in the last couple of years. Both of those are Java. Unfortunately I don't think Ruby has a remote chance of being useful as an XML processor for this kind of application until Ruby gets much faster.

It depends on the application. I'm using a brute force REXML parser and it's plenty fast enough for what I need to do. My test data is about 120k (37 pages typeset), and is fairly complex structurally: nested sections, variablelists, EPS figures, etc. I can convert it on an old P3 in about 10 seconds. LaTeX is blazingly fast, so I can afford a little slowness upstream.

Putting Ruby into that pipeline for processing the stream sounds more reasonable, but still, that stream is going to get *very* long once you've got XSL:FO.

If you really want to do this yourself, brace yourself for a lot of work. Maybe you should choose a subset of DocBook (isn't there a small(ish) subset already defined?)

You don't need a very big subset for many documents. Mine handles

appendix
article
articleinfo
biblioid
blockquote
caption
colspec
emphasis
entry
figure
formalpara
imagedata
itemizedlist
listitem
mediaobject
orderedlist
para
pubdate
row
section
simpara
table
term
tgroup
thead
title
variablelist
xref

in about 350 lines of Ruby. It also handles profiles.
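Here is a sketch to give a feel for what such a converter looks like. It is not Steve's code, just a hypothetical illustration of the same idea: a recursive REXML walker that maps a handful of the elements above to LaTeX. Anything it doesn't recognize passes its text through. Tables, figures, xrefs and profiling are left out.

```ruby
require "rexml/document"

# Hypothetical DocBook -> LaTeX walker covering a handful of elements.
# Unrecognised elements contribute only their text; tables, figures,
# xrefs and profiling are deliberately left out of this sketch.
module DocBookToLatex
  module_function

  HEADINGS = %w[section subsection subsubsection paragraph].freeze

  def convert(xml)
    render(REXML::Document.new(xml).root, 0)
  end

  # Concatenate the rendering of an element's text and child elements.
  def children(el, depth)
    el.children.map do |node|
      case node
      when REXML::Text    then escape(node.value)
      when REXML::Element then render(node, depth)
      else ""             # comments, processing instructions, ...
      end
    end.join
  end

  def render(el, depth)
    case el.name
    when "article" then children(el, depth)
    when "section" then children(el, depth + 1)
    when "title"
      if depth.zero? # article title
        "\\title{#{children(el, depth)}}\n\\maketitle\n"
      else           # nested sections map to deeper LaTeX headings
        "\\#{HEADINGS[[depth - 1, HEADINGS.size - 1].min]}{#{children(el, depth)}}\n"
      end
    when "para", "simpara" then "#{children(el, depth).strip}\n\n"
    when "emphasis"        then "\\emph{#{children(el, depth)}}"
    when "itemizedlist"    then "\\begin{itemize}\n#{children(el, depth)}\\end{itemize}\n"
    when "orderedlist"     then "\\begin{enumerate}\n#{children(el, depth)}\\end{enumerate}\n"
    when "listitem"        then "\\item #{children(el, depth).strip}\n"
    else children(el, depth)
    end
  end

  # Escape characters that are special in LaTeX running text.
  def escape(text)
    text.gsub(/[\\&%$#_\{\}~^]/) do |c|
      case c
      when "\\" then "\\textbackslash{}"
      when "~"  then "\\textasciitilde{}"
      when "^"  then "\\textasciicircum{}"
      else "\\#{c}"
      end
    end
  end
end
```

The output is only a document body. Wrap it in a \documentclass preamble with \begin{document} ... \end{document} before running LaTeX.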

Steve

Keith Fahlgren wrote:

Does this need to be a pure Ruby solution? If not, I'd suggest using (or adapting) the standard DocBook stylesheets (http://wiki.docbook.org/topic/DocBookXslStylesheets), and using
(perhaps via the Ruby libxslt bindings) those to generate XSL-FO, which you can
then feed into FOP (http://xmlgraphics.apache.org/fop/) or the FO formatter
of your choice.

That sounds tedious and error-prone to me -- I don't normally use any
of this stuff. And I don't even know what FO is.

Alternately, OpenOffice.org can open (and reasonably format) DocBook
files, and has a built-in PDF printer, and is scriptable via Ruby (though I
wouldn't necessarily recommend it).

There doesn't really *need* to be any scripting necessarily. I just want
to be able to reformat the doc in a couple of different ways.

My OOo, however, doesn't seem to know DocBook. It's probably old (1.1.0) --
what is the newest one? Or is this some separate plugin?

Thanks,
Hal


On Sat, 12 Nov 2005, Hal Fulton wrote:

Bob Hutchison wrote:

My first thought is Holy Crap! Really, I've got a kid about that age :-)

Huh? Smiley or not, I don't get this remark. And I so hate to be
humor-impaired. ;-)

The typical XML way to do this is DocBook -> XSL:FO -> pdf because you'll be able to use Norm Walsh's stuff. If you do this and you have input documents bigger than, say, two pages, you'll be wanting the *fastest* XML processors you can get.

This sounds similar to others' advice, so I will be looking into it.

As for speed, I do have a large doc -- 200 pages or so -- but as I
will only reformat it once or twice, I'm not sure I care much
about speed. Unless it's a "cyclic" thing where I have to tweak it
and look at the results and tweak again.

Are you committed to DocBook? If not, you should have a look at DITA (from IBM), and consider LaTeX/ConTeXt or one of the groff macro packages (like om (mom)). I'm having some fun with publicon from Wolfram these days (it reminds me of FrameMaker but runs on OS/X (and Windows, linux coming, maybe) and can generate HTML, XML, and latex output). If publicon works out (I'm using it to document a ruby project I'll be open sourcing soon, and a couple of things for work) I'll be sticking with that.

For this particular project, the source is in DocBook. It's been
transformed into other forms -- RTF, PDF, TeX, HTML.

There might be other ways to do this -- all I really want is to
change the page size (and possibly margins) and re-flow. But the
original source is DocBook.
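With the DocBook XSL stylesheets, that kind of change usually goes in a "customization layer": a small stylesheet that imports the stock FO stylesheet and overrides a few parameters. Here is a sketch that generates one from Ruby. It assumes the import path below, which you will need to adjust for your install, and plain parameter values that need no XML escaping. paper.type and the page.margin.* names are DocBook XSL parameters. The helper itself is hypothetical.

```ruby
# Assumed install location of the stock DocBook XSL FO stylesheet.
DOCBOOK_FO_XSL = "/usr/share/xml/docbook/stylesheet/docbook-xsl/fo/docbook.xsl"

# Hypothetical helper: an XSLT stylesheet that imports the stock one
# (xsl:import must come first) and then overrides the given parameters.
def customization_layer(params, import: DOCBOOK_FO_XSL)
  param_lines = params.map { |name, value| %(  <xsl:param name="#{name}">#{value}</xsl:param>) }
  <<~XSL
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:import href="#{import}"/>
    #{param_lines.join("\n")}
    </xsl:stylesheet>
  XSL
end
```

For example, write customization_layer({"paper.type" => "A4", "page.margin.outer" => "0.5in"}) to mylayout.xsl, run xsltproc --output book.fo mylayout.xsl book.xml, and feed book.fo to FOP. Only mylayout.xsl changes between layouts.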

Thanks,
Hal

Lyle Johnson wrote:

Please, nobody shoot me, but I have to believe this can easily be done
with Java tools.

Now, if the tweak-factor is such that actually handling Java-matter is
too painful, then perhaps a Ruby solution is better. But from casual
following of the xml-dev mailing list, it seems that this sort of thing
is a well-solved matter in Java.

It is fairly straightforward to convert DocBook/XML to PDF using a
combination of Java tools. I am using the Saxon XSLT processor
(http://saxon.sourceforge.net/), the standard DocBook XSL stylesheets
(The DocBook Project), and FOP
(http://xmlgraphics.apache.org/fop/) to accomplish this.

I think that, at some point in the past, I considered using Ruby tools
to do this but they either didn't exist or weren't quite up to snuff
(especially with regards to an XSLT processor).

Wow, Lyle. You continue to amaze me. :-)

What might be straightforward for you might not be for me.

I don't have any of these tools installed, and I've never heard of
FOP or FO.

Still my best shot?

Thanks,
Hal


On 11/11/05, James Britt <james_b@neurogami.com> wrote:

Hal Fulton wrote:

Keith Fahlgren wrote:

Does this need to be a pure Ruby solution? If not, I'd suggest using (or adapting) the standard DocBook stylesheets (http://wiki.docbook.org/topic/DocBookXslStylesheets), and using
(perhaps via the Ruby libxslt bindings) those to generate XSL-FO, which you can
then feed into FOP (http://xmlgraphics.apache.org/fop/) or the FO formatter
of your choice.

That sounds tedious and error-prone to me -- I don't normally use any
of this stuff. And I don't even know what FO is.

XSL-FO is a W3C standard, a product of the XSL working group (the same people who put out XSLT, the language for converting XML documents). XSL-FO is an intermediary XML language for representing "formatting objects" (rich text documents). Many tools exist to convert XSL-FO files (documents) into PDF, PNG, etc. Apache FOP is the popular Java one, but you don't need to know Java to use it. It has a command line interface whereby you feed it the filename of an XSL-FO file and the filename of your desired PDF file.
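To make "formatting objects" concrete, here is a minimal hand-written XSL-FO document, built as a Ruby string. Page size and margins sit on fo:simple-page-master, and the text goes into a fo:block inside a fo:flow. The DocBook XSL stylesheets generate this kind of markup, in much greater quantity, for you. The helpers minimal_fo, xml_escape and well_formed? are hypothetical.

```ruby
require "rexml/document"

FO_NS = "http://www.w3.org/1999/XSL/Format"

# Escape the characters that may not appear raw in XML character data.
def xml_escape(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# One page master (size + margins), one page sequence, one block of text.
def minimal_fo(text, width: "8.5in", height: "11in", margin: "1in")
  <<~FO
    <?xml version="1.0" encoding="UTF-8"?>
    <fo:root xmlns:fo="#{FO_NS}">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page" page-width="#{width}"
            page-height="#{height}" margin="#{margin}">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <fo:block>#{xml_escape(text)}</fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  FO
end

# Cheap sanity check before handing the file to FOP.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end
```

Usage: File.write("hello.fo", minimal_fo("Hello, FOP")), then fop -fo hello.fo -pdf hello.pdf.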

Capisce?

Devin
(Tedious? Error-prone? It's a W3C XML standard, so, probably. But, luckily, you don't have to write any of that crap -- just use it.)


On Sat, 12 Nov 2005, Hal Fulton wrote:

Hi --


On Sun, 13 Nov 2005, Hal Fulton wrote:

Keith Fahlgren wrote:

On Sat, 12 Nov 2005, Hal Fulton wrote:

Does this need to be a pure Ruby solution? If not, I'd suggest using (or adapting) the standard DocBook stylesheets (http://wiki.docbook.org/topic/DocBookXslStylesheets), and using
(perhaps via the Ruby libxslt bindings) those to generate XSL-FO, which you can
then feed into FOP (http://xmlgraphics.apache.org/fop/) or the FO formatter
of your choice.

That sounds tedious and error-prone to me -- I don't normally use any
of this stuff. And I don't even know what FO is.

There's a bit of setup involved, but I've got scripts for saxon (XSLT
parser) and FOP, and it's all pretty streamlined. I've always tried
to bundle as much of this stuff as possible into a single directory;
and though I haven't upgraded it lately, I can share it as a bundle if
needed.
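In the same spirit, here is a hypothetical Ruby wrapper that builds the Saxon command line. It assumes Saxon 6.5, an XSLT 1.0 processor, which is what the DocBook XSL 1.x stylesheets target. Saxon 6.5's entry class is com.icl.saxon.StyleSheet, and it takes stylesheet parameters as name=value pairs after the stylesheet. The jar path is an assumption.

```ruby
# Assumed location of the Saxon 6.5 jar.
SAXON_JAR = "lib/saxon.jar"

# argv for: java -cp JAR com.icl.saxon.StyleSheet -o OUT SRC STYLESHEET name=value ...
def saxon_cmd(src, stylesheet, out, params = {})
  base = ["java", "-cp", SAXON_JAR, "com.icl.saxon.StyleSheet", "-o", out, src, stylesheet]
  base + params.map { |name, value| "#{name}=#{value}" }
end
```

For example, system(*saxon_cmd("book.xml", "fo/docbook.xsl", "book.fo", {"paper.type" => "USletter"})) would produce book.fo for FOP.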

David

--
David A. Black
dblack@wobblini.net

It's a bit tedious to install, but the toolchain is mostly automatic.

DocBook -> DocBook XSL -> XSL-FO -> PDF

All you really do is execute the toolchain against the docbook. So, you
don't really need to know xslt, xsl-fo, or pdf.

Compared to REXML and PDF::Writer, this is a whole lot simpler. Spend
a couple of hours configuring tools, create a batch file or ant
script, and you are rolling.
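The "batch file" could just as well be Ruby. In this sketch, each step is described by its output, its inputs and a command. The Step struct and run_pipeline are hypothetical. A step is skipped when FileUtils.uptodate? reports that its output is newer than all of its inputs, which helps in a tweak-and-look cycle. Pass dry_run: true to see what would run without running it.

```ruby
require "fileutils"

# One build step: the file it produces, the files it depends on, and the
# argv that produces it (hypothetical structure for this sketch).
Step = Struct.new(:output, :inputs, :command)

# Run each step whose output is missing or older than any of its inputs.
# Returns the command lines that ran (or would run, with dry_run: true).
def run_pipeline(steps, dry_run: false)
  steps.each_with_object([]) do |step, ran|
    next if FileUtils.uptodate?(step.output, step.inputs)
    ran << step.command.join(" ")
    next if dry_run
    system(*step.command) or abort("step failed: #{step.command.join(' ')}")
  end
end

# Example pipeline; the stylesheet and file names are placeholders.
BOOK_STEPS = [
  Step.new("book.fo", ["book.xml"],
           ["xsltproc", "--output", "book.fo", "docbook.xsl", "book.xml"]),
  Step.new("book.pdf", ["book.fo"], ["fop", "-fo", "book.fo", "-pdf", "book.pdf"]),
].freeze
```

run_pipeline(BOOK_STEPS) then rebuilds only what is stale.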

Sun had created an Ant tool called pipeline for handling processes
much like this. When I looked at it was mostly a concept. Not sure
where it went.

Interestingly, XSL-FO was the first concept behind XSLT
(T = transformations), which is used to mangle XML docs from one format
to another. During the production of "XSL", the need for general
transformations became apparent, so they split XSL into XSLT
(transformations) and XSL-FO (formatting objects). Now everyone uses
XSLT, and XSL-FO is languishing in near obscurity, even though it was
the initial impetus behind the whole XSL thing.

Regards,
Nick


On 11/12/05, Hal Fulton <hal9000@hypermetrics.com> wrote:

Keith Fahlgren wrote:
> On Sat, 12 Nov 2005, Hal Fulton wrote:
>
>
> Does this need to be a pure Ruby solution? If not, I'd suggest using (or
> adapting) the standard DocBook stylesheets
> (http://wiki.docbook.org/topic/DocBookXslStylesheets), and using
> (perhaps via the Ruby libxslt bindings) those to generate XSL-FO, which you can
> then feed into FOP (http://xmlgraphics.apache.org/fop/) or the FO formatter
> of your choice.

That sounds tedious and error-prone to me -- I don't normally use any
of this stuff. And I don't even know what FO is.

> Alternately, OpenOffice.org can open (and reasonably format) DocBook
> files, and has a built-in PDF printer, and is scriptable via Ruby (though I
> wouldn't necessarily recommend it).

There doesn't really *need* to be any scripting necessarily. I just want
to be able to reformat the doc in a couple of different ways.

My OOo, however, doesn't seem to know DocBook. It's probably old (1.1.0) --
what is the newest one? Or is this some separate plugin?

Thanks,
Hal

--
Nicholas Van Weerdenburg

Then you might want to do something like this: "Chapter 8. Printed output options".


On Nov 12, 2005, at 1:56 PM, Hal Fulton wrote:

There might be other ways to do this -- all I really want is to
change the page size (and possibly margins) and re-flow. But the
original source is DocBook.


Bob Hutchison wrote:

My first thought is Holy Crap! Really, I've got a kid about that age :-)

Huh? Smiley or not, I don't get this remark. And I so hate to be
humor-impaired. ;-)

I meant to answer this, but today's news does a better job:

FOP 0.90 alpha 1 was released today <http://xmlgraphics.apache.org/fop/0.90/>

From that page:

The Apache FOP team is proud to present to you the largely rewritten codebase which is finally in a state where you can start to use it. It has taken over three years to get this far and over two years without a new release from the FOP project. We would like to encourage you to download the code and to play with it. We're still in the process of adding new major features and stabilizing the code. We welcome any feedback you might have and even more, any other form of help to get the project forward.

*Three* years!

That's why I had the reaction.

As for the phrasing, it is just an annoying phrase rather common among kids these days.

Cheers,
Bob


On Nov 12, 2005, at 1:56 PM, Hal Fulton wrote:


I don't remember exactly how I put all the pieces together back when I
was starting to look at this, but it's certainly not because I
actually understand how it all works. ;-)

FO stands for "formatting object", and that is literally everything
that I know about it. Seriously. I know that when I process my DocBook
XML documents with Saxon (which is a standalone executable-type
program), it uses some XSL instructions in the DocBook XSL stylesheets
to produce an FO file. Oh, I don't know anything about XSL either, by
the way. I just know that it's a piece of the puzzle. Anyways, I then
can use Apache's FOP program (also a command-line program) to spit out
a PDF.

We can talk about it more offline if you do decide to go this route.
It's mostly a job of downloading the stuff and installing it, though.


On 11/12/05, Hal Fulton <hal9000@hypermetrics.com> wrote:

What might be straightforward for you might not be for me.

I don't have any of these tools installed, and I've never heard of
FOP or FO.

Still my best shot?

Devin Mullins wrote:

Hal Fulton wrote:

Keith Fahlgren wrote:

Does this need to be a pure Ruby solution? If not, I'd suggest using (or adapting) the standard DocBook stylesheets (http://wiki.docbook.org/topic/DocBookXslStylesheets), and using
(perhaps via the Ruby libxslt bindings) those to generate XSL-FO, which you can
then feed into FOP (http://xmlgraphics.apache.org/fop/) or the FO formatter
of your choice.

That sounds tedious and error-prone to me -- I don't normally use any
of this stuff. And I don't even know what FO is.

XSL-FO is a W3C standard, a product of the XSL working group (the same people who put out XSLT, the language for converting XML documents). XSL-FO is an intermediary XML language for representing "formatting objects" (rich text documents). Many tools exist to convert XSL-FO files (documents) into PDF, PNG, etc. Apache FOP is the popular Java one, but you don't need to know Java to use it. It has a command line interface whereby you feed it the filename of an XSL-FO file and the filename of your desired PDF file.

To drill it in:

DocBook --(XSLT)--> XSL-FO --(FOP)--> PDF

The XSLT pictured is a specific XSLT stylesheet for converting DocBook to XSL-FO. XSLT is a general XML language specification for XML file conversion, as mentioned before, and according to Keith, there exists a specific XSLT "script" for DocBook --> XSL-FO. You will need an XSLT "interpreter" to run it. Xalan is one. (Google it.)

FOP is FOP. It needs nothing DocBook-specific: XSL-FO and PDF are both fairly standard.

Devin


On Sat, 12 Nov 2005, Hal Fulton wrote:

Hal Fulton wrote:

Lyle Johnson wrote:

Please, nobody shoot me, but I have to believe this can easily be done
with Java tools.

Now, if the tweak-factor is such that actually handling Java-matter is
too painful, then perhaps a Ruby solution is better. But from casual
following of the xml-dev mailing list, it seems that this sort of thing
is a well-solved matter in Java.

It is fairly straightforward to convert DocBook/XML to PDF using a
combination of Java tools. I am using the Saxon XSLT processor
(http://saxon.sourceforge.net/), the standard DocBook XSL stylesheets
(The DocBook Project), and FOP
(http://xmlgraphics.apache.org/fop/) to accomplish this.

I think that, at some point in the past, I considered using Ruby tools
to do this but they either didn't exist or weren't quite up to snuff
(especially with regards to an XSLT processor).

Wow, Lyle. You continue to amaze me. :-)

What might be straightforward for you might not be for me.

I don't have any of these tools installed, and I've never heard of
FOP or FO.

Still my best shot?

Even ignoring everything about XSL-FO and such stuff, I used
xsltproc to get PDF out of DocBook; it was really easy. See
http://www.sagehill.net/docbookxsl/Makefiles.html


On 11/11/05, James Britt <james_b@neurogami.com> wrote: