I have a Word 2002 document with a table in it. Is there a way of using Ruby
to extract the table (5 fields), parse its content, and
write it out to an XML file?
Thanks
Useko Netsumi wrote:
I have a Word 2002 document with a table in it. Is there a way of using Ruby
to extract the table (5 fields), parse its content, and
write it out to an XML file?
Well, the short answer is “Yes.” You can use WIN32OLE to create an
instance of Word, then script it as you might using VBA.
require 'win32ole'
application = WIN32OLE.new('Word.Application')
See http://homepage1.nifty.com/markey/ruby/win32ole/index_e.html
(WIN32OLE is also part of Ruby 1.8)
You’ll need a decent reference to the Word DOM; I believe you should be
able to get that on msdn.microsoft.com.
Back in another life I wrote a bunch of VB/VBA that took a Word 2000
doc, walked the object model, and spit out XML.
I have no idea how much of the Word 2000 DOM has carried over to Word
2002. I seem to recall that there was a collection of tables, and then
some API for walking the rows and fields.
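For what it's worth, here is a rough, untested-against-Word-2002 sketch of that idea in Ruby. It assumes Windows with Word installed and a plain rectangular table with no merged cells. The helper names (clean_cell_text, rows_to_xml, read_word_table), the XML element names, and the file paths are all made up for illustration; Tables, Rows.Count, Columns.Count, Cell and Range.Text are the Word object model members as I remember them, so check them against the MSDN reference.

```ruby
# Sketch only: assumes Windows with Microsoft Word installed.
# The XML element names (table/row/field) are invented for this example.

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Word ends each cell's text with an end-of-cell marker (CR + BEL); drop it.
def clean_cell_text(text)
  text.sub(/\r?\a\z/, '').strip
end

# Turn an array of rows (each an array of strings) into an XML string.
def rows_to_xml(rows)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>', '<table>']
  rows.each do |cells|
    lines << '  <row>'
    cells.each do |cell|
      lines << "    <field>#{cell.gsub(/[&<>]/, XML_ESCAPES)}</field>"
    end
    lines << '  </row>'
  end
  lines << '</table>'
  lines.join("\n") + "\n"
end

# Read every cell of one table in a Word document via OLE automation.
# Assumes the table has no merged cells (Cell(r, c) fails on those).
def read_word_table(path, table_index = 1)
  require 'win32ole' # loaded lazily: only available on Windows
  WIN32OLE.codepage = WIN32OLE::CP_UTF8 # get strings back as UTF-8
  word = WIN32OLE.new('Word.Application')
  word.Visible = false
  doc = word.Documents.Open(path)
  table = doc.Tables(table_index)
  (1..table.Rows.Count).map do |r|
    (1..table.Columns.Count).map do |c|
      clean_cell_text(table.Cell(r, c).Range.Text)
    end
  end
ensure
  doc.Close(0) if doc # 0 = wdDoNotSaveChanges
  word.Quit if word
end

# Usage (hypothetical paths):
#   rows = read_word_table('C:\\docs\\report.doc')
#   File.write('table.xml', rows_to_xml(rows))
```

Keeping the Word automation separate from the XML writing means the XML part can be tried out on any machine, even one without Word.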
If you think it might help, the source code can be found at
http://www.jamesbritt.com/code/ProVb6XmlBookCode.zip
That zip holds a bunch of other zip files. Look at Word2Xml.zip, and
ConvertToXml.dot for the macros that call into VB code.
(Never thought that would be of any relevance here … )
James
Thanks.
Hi!
I have a Word 2002 document with a table in it. Is there a way of
using Ruby to extract the table (5 fields), parse its
content, and write it out to an XML file?
I don’t use Word 2002 (perhaps it is not available for Linux?) but I
assume it is some kind of text processor that does not use XML as its
internal format. The main problem is the format used, so the best
idea seems to be converting data to something useful:
- copy the table, paste it into a document of its own, export to CSV
- same, but export to some kind of SGML or XML (HTML for example)
- if ‘Word 2002’ happens to stand for ‘Microsoft Word 2002’ you may
  try ‘antiword’
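
To illustrate the CSV route: once the table has been exported, a few lines of Ruby can turn it into XML. This is only a sketch. It assumes the first CSV row holds the column names, and the method name csv_to_xml, the element names, and the file names are invented for the example.

```ruby
require 'csv'

# Convert CSV text (first row = column names) into a simple XML string.
# Column names are lowercased and sanitized so they can serve as tag names.
def csv_to_xml(csv_text)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>', '<table>']
  CSV.parse(csv_text, headers: true).each do |row|
    lines << '  <row>'
    row.each do |header, value|
      tag = header.to_s.strip.downcase.gsub(/[^a-z0-9_]+/, '_')
      tag = "field_#{tag}" unless tag.match?(/\A[a-z_]/)
      # encode(xml: :text) escapes &, < and > in the cell value
      lines << "    <#{tag}>#{value.to_s.encode(xml: :text)}</#{tag}>"
    end
    lines << '  </row>'
  end
  lines << '</table>'
  lines.join("\n") + "\n"
end

# Usage (hypothetical file names):
#   File.write('table.xml', csv_to_xml(File.read('table.csv')))
```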
Josef ‘Jupp’ SCHUGT
--
http://oss.erdfunkstelle.de/ruby/ - German comp.lang.ruby-FAQ
http://rubyforge.org/users/jupp/ - Ruby projects at Rubyforge