TexTile to Word Document

Hi!

I have fallen quite in love with TexTile! It helps me write reasonably comprehensible documents "on the go" - free from worrying about formatting, just nice basic clean documents. Combine that with a script that adds on some formatting style information and I get pretty nice HTML documents.

To extend this to my next level of use, I would like to use it for creating Word documents. Does anyone know of any scripts or ideas for creating Word documents using TexTile? My documents are not overly complicated. My usage is:
* Mostly h1 - h3
* A lot of stuff that should be treated as monospaced (essentially using @text@)
* A bunch of lists
* Basic formatting using underline, italics and bold
* Sometimes a div that has a special ID

If this works, it would make it really easy for me to take the content and create a document, then add a Word template and get nicely styled documents.

I don't mind if the solution is Windows only (i.e., it relies on win32api).

Any nudges in the correct direction?

Cheers,
Mohit.
4/16/2009 | 12:15 AM.

Don't! That's the best advice I can give.

If you must, probably the simplest place to start is creating an ODF document,
or something else Word can read -- in the case of ODF, you may have to have
OpenOffice read it, then save as Word.

But I'm really curious to know why you need this. If it's just for them to be
nicely styled, it would be much faster to learn CSS -- or, if that's not an
option, to perhaps convert the Word template to an HTML template.

···

On Wednesday 15 April 2009 11:16:48 Mohit Sindhwani wrote:

To extend this to my next level of use, I would like to use it for
creating Word documents. Does anyone know any scripts/ ideas on
creating Word documents using TexTile?

You might look at Asciidoc. I don't know if it supports Word generation,
however.

--
Dean Wampler
twitter: @deanwampler, @chicagoscala
Chicago-Area Scala Enthusiasts (CASE):
- http://groups.google.com/group/chicagoscala
- http://www.meetup.com/chicagoscala/ (Meetings)
http://www.objectmentor.com
http://www.polyglotprogramming.com
http://www.aspectprogramming.com
http://aquarium.rubyforge.org
http://www.contract4j.org

David Masover wrote:

···

On Wednesday 15 April 2009 11:16:48 Mohit Sindhwani wrote:

To extend this to my next level of use, I would like to use it for
creating Word documents. Does anyone know any scripts/ ideas on
creating Word documents using TexTile?

Don't! That's the best advice I can give.

If you must, probably the simplest place to start is creating an ODF document, or something else Word can read -- in the case of ODF, you may have to have OpenOffice read it, then save as Word.

What about RTF? There are some projects (though I cannot vouch for them):

http://www.google.com/search?hl=en&q=ruby+rtf&btnG=Google+Search
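
For a feel of what going the RTF route involves, here is a minimal hand-rolled sketch that needs no library at all. It assumes the content has already been reduced to a list of [kind, text] pairs; the kinds (:h1, :h2, :h3, :code, :p), the font choices and the sizes are illustrative assumptions, not anything from an existing project. It handles ASCII text only and ignores inline formatting such as bold or italics.

```ruby
# Escape the three characters that are special in RTF: backslash and braces.
def rtf_escape(text)
  text.gsub(/[\\{}]/) { |ch| "\\#{ch}" }
end

# blocks: array of [kind, text]; kind is :h1, :h2, :h3, :code or :p (assumed names).
def blocks_to_rtf(blocks)
  sizes = { h1: 36, h2: 30, h3: 26 } # \fs takes half-points, so 36 means 18pt
  body = blocks.map do |kind, text|
    t = rtf_escape(text)
    case kind
    when :h1, :h2, :h3 then "{\\pard\\b\\fs#{sizes[kind]} #{t}\\par}"  # bold heading
    when :code         then "{\\pard\\f1\\fs20 #{t}\\par}"             # monospaced font
    else                    "{\\pard\\f0\\fs24 #{t}\\par}"             # body text
    end
  end
  # Font table: f0 is the default proportional font, f1 the monospaced one.
  header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}{\\f1 Courier New;}}"
  header + "\n" + body.join("\n") + "\n}"
end
```

Writing the result of blocks_to_rtf to a file with an .rtf extension should give something Word can open. Note that this uses direct formatting rather than named styles, so it does not carry the heading semantics a Word template would want.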

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

David Masover wrote:

  

To extend this to my next level of use, I would like to use it for
creating Word documents. Does anyone know any scripts/ ideas on
creating Word documents using TexTile?
    
Don't! That's the best advice I can give.

If you must, probably the simplest place to start is creating an ODF document, or something else Word can read -- in the case of ODF, you may have to have OpenOffice read it, then save as Word.

But I'm really curious to know why you need this? If it's just for them to be nicely styled, it would be much faster to learn CSS -- or, if that's not an option, to perhaps convert the word template to an HTML template.
  
Hi David and Joel,

Thanks for replying.

I haven't tried creating an ODF document yet, but I did find that importing the HTML into OpenOffice works partly. Not all my styles are retained, but at least the headings are. The process is cumbersome, though: Writer doesn't let you open an HTML file and save it directly as DOC. You first need to export it to ODT, then open the ODT in Writer and save it as a Word document. When you open the resulting DOC in Word, quite a bit of the formatting and some of the styling is gone.

Opening the HTML directly in Word also works. It retains the styling but completely loses the semantics (headings, code blocks, etc.): everything becomes directly formatted paragraphs.

The reason for wanting to get to Word... well, I do know CSS and use that for the casual first print of what I write. But I think Word (or some other publishing software, not sure which one) would give me better control over the print output: for example, page-wise headers and footers, and so on. It's partly curiosity and partly a need :)

Joel: I'll try RTF to see if that helps.

Cheers,
Mohit.
4/16/2009 | 10:09 AM.

···

On Wednesday 15 April 2009 11:16:48 Mohit Sindhwani wrote:

Dean Wampler wrote:

You might look at Asciidoc. I don't know if it supports Word generation,
however.
  

Will do! It's shaping up to be a busy weekend on this :)

Cheers,
Mohit.
4/17/2009 | 1:19 AM.

I haven't tried creating an ODF document yet, but I did find that
importing the HTML into OpenOffice works partly.

That isn't what I was suggesting, though it is one way.

I was suggesting that you rip open Textile, or write your own Textile parser,
or even work with the Textile-generated HTML, and write a script that
generates an ODT.

I was mostly suggesting this to discourage you from trying that approach.
There isn't a Textile-specific way of doing this, to my knowledge, and there
wouldn't likely be a good, generic way of doing it with HTML -- at best, you
could have something take a reference to a CSS class and replace it with a
reference to a given ODF (or Word) style, but you'd probably have to recreate
those styles in the word processor -- I don't know of anything that can take
CSS and generate corresponding ODF (or Word) styles.
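
As a rough illustration of the "take a tag or CSS class and replace it with a style reference" idea, here is a sketch that walks Textile-generated HTML and produces [style name, text] pairs. Everything in STYLE_MAP is an assumption: the style names are hypothetical and would have to exist in whatever ODF or Word template is used. It relies on a regular expression rather than a real HTML parser, so it only suits the simple, non-nested block markup Textile tends to emit, and it drops inline tags instead of converting them.

```ruby
# Hypothetical mapping from an HTML tag (or "tag.class") to a style name.
STYLE_MAP = {
  "h1"     => "Heading 1",
  "h2"     => "Heading 2",
  "h3"     => "Heading 3",
  "p"      => "Body Text",
  "li"     => "List Bullet",
  "pre"    => "Code",
  "p.note" => "Note"
}.freeze

# Matches one block element: tag name, raw attributes, inner HTML.
BLOCK_RE = %r{<(h[1-3]|p|li|pre)\b([^>]*)>(.*?)</\1>}m

def styled_blocks(html)
  html.scan(BLOCK_RE).map do |tag, attrs, inner|
    css_class = attrs[/class="([^"]*)"/, 1]
    # Prefer a class-specific style, fall back to the plain tag.
    style = STYLE_MAP["#{tag}.#{css_class}"] || STYLE_MAP.fetch(tag)
    text = inner.gsub(/<[^>]+>/, "").strip # drop inline tags such as <strong>
    [style, text]
  end
end
```

The output is deliberately format-neutral: the same list could feed an ODF writer, an RTF writer, or a script that drives a word processor. HTML entities are left as they are.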

The reason for wanting to get to Word... well, I do know CSS and use
that for the casual first print of what I write. But I think Word (or
some other publishing software, not sure which one) would give me better
control over the print output.

CSS gives a fair amount of control. It's possible Word and OpenOffice provide
more, but not much.

For example, including
page-wise headers and footers, and so on.

A quick Google finds this:

http://css-discuss.incutio.com/?page=PrintStylesheets
http://css-discuss.incutio.com/?page=PrintingHeaders
http://www.xefteri.com/articles/show.cfm?id=26
http://www.alistapart.com/articles/boom

...and so on.

A quote from that last article: "It is now possible, even feasible, to use
HTML as the document format for books."

···

On Wednesday 15 April 2009 21:09:20 Mohit Sindhwani wrote:

* Mohit Sindhwani <mo_mail@onghu.com> [2009-04-16 11:09:20 +0900]:

The reason for wanting to get to Word... well, I do know CSS and use
that for the casual first print of what I write. But I think Word (or
some other publishing software, not sure which one) would give me better
control over the print output.

Hmm... what about some form of TeX? Some time ago, I tried some collaborative
document preparation in the office with the help of a Wiki to create our
annual report (PDF file). I used a TeX macro package called ConTeXt.
My experience can be found at:

http://wiki.contextgarden.net/HTML_and_ConTeXt

However, one needs to learn a bit of ConTeXt, mostly how to customize
the style part. It is relatively easy to translate HTML tags to ConTeXt
commands (say h1 -> section, h2 -> subsection, etc.).
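
A minimal sketch of that tag translation, assuming simple non-nested HTML of the kind Textile produces. The h3 mapping and the handling of code, bold and italics are my own additions for illustration. TeX special characters (%, &, #, _ and so on) are not escaped here, and \type{...} will break on unbalanced braces, so real input would need more care.

```ruby
# Heading tags mapped to ConTeXt sectioning commands (h3 is an assumed extension).
CONTEXT_HEADINGS = {
  "h1" => "section",
  "h2" => "subsection",
  "h3" => "subsubsection"
}.freeze

def html_to_context(html)
  # Headings: <h1>Title</h1> becomes \section{Title}
  out = html.gsub(%r{<(h[1-3])[^>]*>(.*?)</\1>}m) { "\\#{CONTEXT_HEADINGS[$1]}{#{$2}}" }
  # Inline code becomes verbatim text
  out = out.gsub(%r{<code>(.*?)</code>}m) { "\\type{#{$1}}" }
  # Bold and emphasis
  out = out.gsub(%r{<(strong|b)>(.*?)</\1>}m) { "{\\bf #{$2}}" }
  out = out.gsub(%r{<(em|i)>(.*?)</\1>}m) { "{\\em #{$2}}" }
  # Paragraphs: keep the text, end with a newline
  out.gsub(%r{<p\b[^>]*>(.*?)</p>}m) { "#{$1}\n" }
end
```

The result still has to be wrapped in \starttext ... \stoptext along with whatever style setup the document uses.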

Another alternative is maruku (http://maruku.rubyforge.org) which can
create LaTeX output and from that pdf.

cheers!
saji

···

--
Saji N. Hameed

APEC Climate Center
1463 U-dong, Haeundae-gu, +82 51 745 3951
BUSAN 612-020, KOREA saji@apcc21.net
Fax: +82-51-745-3999

David Masover wrote:

I was suggesting that you rip open Textile, or write your own Textile parser, or even work with the Textile-generated HTML, and write a script that generates an ODT.
  
Ya! I think I may need to look at ripping open the parser and adding Word (or ODT) to the parser (if I remain stubborn enough to do this). I am less keen to do this... yet!

I was mostly suggesting this to discourage you from trying that approach. There isn't a Textile-specific way of doing this, to my knowledge, and there wouldn't likely be a good, generic way of doing it with HTML -- at best, you could have something take a reference to a CSS class and replace it with a reference to a given ODF (or Word) style, but you'd probably have to recreate those styles in the word processor -- I don't know of anything that can take CSS and generate corresponding ODF (or Word) styles.
  
I don't care about styles being generated from my CSS. I'm happy enough if the document retains the semantics of being different types of sections. I don't mind creating the styles again in the Word/ ODF software. Even in my TexTile -> HTML journey, I have a set of styles that I include into the HTML so that it works together. I expect that creating/ customizing the style would be a one-off effort - after that, it's in a template and when I create a new document, I would just apply the template to it!

The reason for wanting to get to Word... well, I do know CSS and use
that for the casual first print of what I write. But I think Word (or
some other publishing software, not sure which one) would give me better
control over the print output.
    
CSS gives a fair amount of control. It's possible Word and OpenOffice provide more, but not much.
  
I need to look at the links below! I am aware of print style sheets and use them quite a bit for my websites (mostly hosted on Radiant). But I'm keen to generate some "office" documents from the musings on my Palm phone when I'm out (I find the Documents To Go software not so great... and not so clean).

For example, including
page-wise headers and footers, and so on.
    
A quick Google finds this:

http://css-discuss.incutio.com/?page=PrintStylesheets
http://css-discuss.incutio.com/?page=PrintingHeaders
http://www.xefteri.com/articles/show.cfm?id=26
http://www.alistapart.com/articles/boom

...and so on.

A quote from that last article: "It is now possible, even feasible, to use HTML as the document format for books."
  
That last quote is fantastic! I'm actually kind of writing a book. But I'm not sure if I will complete it. If I don't complete it, I hope to release the material on one of my websites. That's why working in TexTile is so attractive. It's already ready for the (Radiant) website if need be. If I could get to Word, it would open up other applications for me.

Cheers,
Mohit.
4/16/2009 | 11:25 AM.

Hi Saji!

Hmm... what about some form of TeX? Some time ago, I tried some collaborative
document preparation in the office with the help of a Wiki to create our
annual report (PDF file). I used a TeX macro package called ConTeXt. My experience can be found at:

http://wiki.contextgarden.net/HTML_and_ConTeXt
  
I shall take a look and see if that helps. I do love the fact that TexTile is so easy to use :)

However, one needs to learn a bit of ConTeXt, mostly how to customize
the style part. It is relatively easy to translate HTML tags to ConTeXt
commands (say h1 -> section, h2 -> subsection, etc.).

Another alternative is maruku (http://maruku.rubyforge.org) which can
create LaTeX output and from that pdf.

Again, I'll take a look. I have written a script that can convert a basic Word document to TexTile (for a Radiant site) which I insert into my website using your methods! I'll try to see what works from here.

Thanks for replying.
Cheers,
Mohit.
4/16/2009 | 11:26 AM.

David Masover wrote:
> I was suggesting that you rip open Textile, or write your own Textile
> parser, or even work with the Textile-generated HTML, and write a script
> that generates an ODT.

Ya! I think I may need to look at ripping open the parser and adding
Word (or ODT) to the parser (if I remain stubborn enough to do this). I
am less keen to do this... yet!

Now that I think of it, it's probably simpler to read the Textile-generated
HTML. But either way, you'll have to deal with an office format, which isn't
going to be fun.

I don't care about styles being generated from my CSS. I'm happy enough
if the document retains the semantics of being different types of
sections. I don't mind creating the styles again in the Word/ ODF
software.

In that case, it's probably not too difficult. Still harder than adding a print
mode to CSS, but feasible.

I'll strongly suggest ODF if you go that route, even if you're targeting Word,
unless you have a _very_ good Word library. The reason is simple: Last I
checked, the ODF spec is 600 pages. The Microsoft OpenXML spec is 6000 pages,
and is incomplete. On a more subjective level, ODF XML is actually reasonably
readable, while OpenXML is not. I'd much rather let a tool like OpenOffice, or
the OpenDocument plugin for Word, handle that for me, rather than trying to
deal with OpenXML.
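
To give an idea of how readable the ODF side is, here is a sketch that emits a flat (single-file, unzipped) OpenDocument text document from a list of [kind, text] pairs. The input shape and the kind names (:h1 to :h3, anything else treated as a paragraph) are assumptions for illustration. The heading style names such as Heading_20_1 follow the naming I believe OpenOffice uses in its own files, so treat them as an assumption and check them against a document saved from your own template.

```ruby
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS   = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Escape the characters that are special in XML text and attribute values.
def xml_escape(text)
  text.gsub(/[&<>"]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;")
end

# blocks: array of [kind, text]; :h1..:h3 become headings, the rest paragraphs.
def blocks_to_fodt(blocks)
  body = blocks.map do |kind, text|
    level = kind.to_s[/\Ah([1-3])\z/, 1]
    t = xml_escape(text)
    if level
      # Assumed style name; the outline level carries the heading semantics.
      %(<text:h text:style-name="Heading_20_#{level}" text:outline-level="#{level}">#{t}</text:h>)
    else
      %(<text:p>#{t}</text:p>)
    end
  end
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <office:document xmlns:office="#{OFFICE_NS}" xmlns:text="#{TEXT_NS}"
        office:version="1.2" office:mimetype="application/vnd.oasis.opendocument.text">
      <office:body>
        <office:text>
          #{body.join("\n      ")}
        </office:text>
      </office:body>
    </office:document>
  XML
end
```

Saved with an .fodt extension, the output is meant to be opened in OpenOffice and then saved as a Word document from there, which keeps the OpenXML handling inside the office suite.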

That last quote is fantastic! I'm actually kind of writing a book.

[snip]

If I could get to Word, it would open up other applications
for me.

Maybe. It's possible Word does something CSS doesn't, here.

What I'm suggesting is that plain old HTML/CSS will probably give you what you
need for styling, even for print media, without having to use a word
processor. If you can do it with CSS, it will be easier, more portable, and
likely more future-proof than trying to do it with a word processor.

···

On Wednesday 15 April 2009 22:25:08 Mohit Sindhwani wrote:

Hi David,

Thanks for your replies.

I'm not averse to working in HTML :) I do know the full benefits of a future-ready text-based format. In fact, that's one of the reasons I like TexTile. I was probing to see if there was a nice enough way to get to Word. I don't think I'm going to consider generating a Word document directly from TexTile; working through the Word spec would probably be difficult enough! If I go that way at all, I may consider using win32ole to get Word to generate the document for me based on parsing Textile.
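
A rough sketch of what the win32ole route might look like. The first method is plain Ruby and turns simple Textile-generated HTML into [style, text] steps; the second drives Word through OLE Automation and is Windows-only and untested here. The style names are Word's English built-in names (they are localized in other language versions), and the method names word_steps and write_with_word are made up for this example.

```ruby
# Assumed mapping from HTML tags to Word's English built-in style names.
WORD_STYLES = {
  "h1" => "Heading 1",
  "h2" => "Heading 2",
  "h3" => "Heading 3",
  "p"  => "Normal"
}.freeze

# Plain Ruby: extract [style, text] pairs from simple, non-nested HTML.
def word_steps(html)
  html.scan(%r{<(h[1-3]|p)\b[^>]*>(.*?)</\1>}m).map do |tag, inner|
    [WORD_STYLES.fetch(tag), inner.gsub(/<[^>]+>/, "").strip]
  end
end

# Windows-only: let Word itself build and save the document.
def write_with_word(steps, path)
  require "win32ole" # loaded here so the rest of the file works elsewhere
  word = WIN32OLE.new("Word.Application")
  begin
    doc = word.Documents.Add
    selection = word.Selection
    steps.each do |style, text|
      selection.Style = style   # apply the paragraph style by name
      selection.TypeText(text)  # insert the text at the cursor
      selection.TypeParagraph   # start a new paragraph
    end
    doc.SaveAs(path)            # an absolute path is the safer choice
    doc.Close
  ensure
    word.Quit
  end
end
```

On a Windows machine with Word installed, something like write_with_word(word_steps(html), 'C:\docs\out.doc') would be the intended use. Because the paragraphs carry real Word styles, applying a template afterwards should restyle them as a group.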

David Masover wrote:

Ya! I think I may need to look at ripping open the parser and adding
Word (or ODT) to the parser (if I remain stubborn enough to do this). I
am less keen to do this... yet!
    
Now that I think of it, it's probably simpler to read the Textile-generated HTML. But either way, you'll have to deal with an office format, which isn't going to be fun.
  
Yes! That is absolutely correct.

I don't care about styles being generated from my CSS. I'm happy enough
if the document retains the semantics of being different types of
sections. I don't mind creating the styles again in the Word/ ODF
software.
    
In that case, it's probably not too difficult. Still harder than adding a print mode to CSS, but feasible.
  
I already have a print mode for the CSS. With CSS3, it seems I can add even more. That's what I'm using right now (CSS2).

I'll strongly suggest ODF if you go that route, even if you're targeting word, unless you have a _very_ good Word library. The reason is simple: Last I checked, the ODF spec is 600 pages. The Microsoft OpenXML spec is 6000 pages, and is incomplete. On a more subjective level, ODF XML is actually reasonably readable, while OpenXML is not. I'd much rather let a tool like OpenOffice, or the OpenDocument plugin for Word, handle that for me, rather than trying to deal with OpenXML.
  
You make a good case here!

What I'm suggesting is that plain old HTML/CSS will probably give you what you need for styling, even for print media, without having to use a word processor. If you can do it with CSS, it will be easier, more portable, and likely more future-proof than trying to do it with a word processor.
  
Understood... and agreed!

Cheers,
Mohit.
4/17/2009 | 1:18 AM.