Scraping off a Word document?

Here's a conceptual question. I have a Word mail merge, with a few
dozen documents. There's a certain field, let's call it the Employee
field, on each page. These documents are sorted in order of this
field. What I'd like to do is save off each group of pages into its
own Word document under that field. So if it's Employee: Joe Schmoe on
the first five pages I'd want to save off just those pages and name
the file "Joe Schmoe.Doc" and so on.

The mail merge itself is pretty much hard-coded into a big group of
documents, so that's my basis. Any suggestions about what Ruby modules
and methods I'd start out delving into? I'm thinking win32ole of
course, but have a good-sized task ahead of me I have to deliver in
relatively short order :-/
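
The grouping step in the question does not depend on how the pages are read from Word. Here is a minimal Ruby sketch of it, assuming the Employee value has already been extracted for every page (for example through win32ole, which is not shown) into an array in page order. The name `group_pages` and the sample data are hypothetical and only there for illustration.

```ruby
# Group consecutive pages that share the same Employee value.
# `employees` holds one entry per page, in page order (hypothetical input;
# pulling these values out of Word is not shown here).
def group_pages(employees)
  employees
    .each_with_index
    .to_a
    .chunk_while { |(a, _), (b, _)| a == b } # runs of pages with the same employee
    .map do |run|
      name = run.first[0]
      first_page = run.first[1] + 1          # 1-based page numbers
      last_page  = run.last[1] + 1
      { name: name, pages: first_page..last_page, file: "#{name}.doc" }
    end
end

# Made-up sample: five pages for one employee, two for another.
pages = ["Joe Schmoe"] * 5 + ["Jane Doe"] * 2
group_pages(pages).each do |group|
  puts "#{group[:file]}: pages #{group[:pages].first}-#{group[:pages].last}"
end
```

Because the documents are sorted by the field, grouping only adjacent pages is enough. An unsorted document would produce several groups for the same name.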

"gregarican" <greg.kujawa@gmail.com> wrote in message news:a7d21393-9d0b-4a90-9b0b-e58349e911b5@e10g2000vbe.googlegroups.com...

> What I'd like to do is save off each group of pages into its
> own Word document under that field. So if it's Employee: Joe Schmoe on
> the first five pages I'd want to save off just those pages and name
> the file "Joe Schmoe.Doc" and so on.

I'm not sure you can do what you want to do.

You open a Word doc, then want to save a portion of it, based on the Employee Name, to its own file? Then, after that save is finished, keep the file open, advance the database to the next Employee Name and repeat the save, and so on until you get through all of the Employee Names?

It occurs to me that Word can't do that task, because the Employee Name field in the document is unknown until the actual time of the merge. There is only one DOC file for any given letter, and when I do this kind of merge all I get to see is <fieldname> in the places where the variables get filled in during the merge. You have to fill the variable from the database, save the result, advance the database to the next record, then fill the variable again and save that result.

You are going to create a file for each employee for each letter, and this seems to me to defeat the whole reason to merge data into a document. The reason I merge is because I want one file for everybody, I specifically do not want a separate file for each person.

gregarican wrote:

> Any suggestions about what Ruby modules
> and methods I'd start out delving into? I'm thinking win32ole of
> course, but have a good-sized task ahead of me I have to deliver in
> relatively short order :-/

Write what you need using the VBA built into Word. IntelliSense will make that rather easy.

Then either replicate your VBA calls using Ruby's win32ole...

...or just shell directly from Ruby to your VBA!

I'd do it with VB from inside Word. An alternative might be to use OpenOffice: read the Word file, write it out in OO's format (XML in a ZIP) and then manipulate the XML. But this sounds pretty awkward.

Can't you force the mail merge to produce multiple documents?

Cheers

  robert

···

On 16.01.2009 16:51, gregarican wrote:

> What I'd like to do is save off each group of pages into its
> own Word document under that field.

--
remember.guy do |as, often| as.you_can - without end

Robert Klemme wrote:

> I'd do it with VB from inside Word. An alternative might be to use
> OpenOffice, read the Word file, write OO's format (XML in ZIP) and then
> manipulate the XML. But this sounds pretty awkward.

I suspect Word can also barf out an XML representation.

It may be awkward (get ready for the horror when you open that file!), but it's probably the best way. All word processing is heading towards XML for its interoperability.

I wound up writing a C# console program to do the work. I just
referred to the ugly underbelly of all of the Word COM stuff and was able
to grab what I needed. It took a while though, since my text was
contained within text frames. So I had to work with the
Document.Shapes property and whatnot.

In searching for a solution I did run across a VBA code snippet that
would save off each document separately after the mail merge
completed. At least now I have a totally automated solution, although
it's cobbled together from various sources. First I pull my data from
a SQL DB using Ruby, dumping that to an Excel data source. Then I have
a C# program that takes that data source, uses a Word mail merge
template and delivers the final document set. Finally, I have a Ruby
program that looks in that save directory and e-mails the documents to
the individual employees. Eventually it'd be a lot cleaner and easier
to maintain if I had all of the work done in a single program written
in a single language. But that's another fight for another day :-)
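
The last stage of that pipeline, matching the saved files to recipients, could look roughly like the following in Ruby. This is only a sketch. It assumes each file is named after the employee (as in "Joe Schmoe.doc"); the `addresses` hash, the `documents_to_send` name, and the sample data are hypothetical. Actual mail delivery is left out.

```ruby
require "tmpdir"
require "fileutils"

# Pair every saved document with the e-mail address of the employee it is
# named after. `addresses` (name => address) is a hypothetical lookup table.
# Returns [matched, unmatched]: matched is a list of [address, path] pairs,
# unmatched is a list of paths with no known recipient.
def documents_to_send(dir, addresses)
  matched = []
  unmatched = []
  Dir.glob(File.join(dir, "*.doc")).sort.each do |path|
    name = File.basename(path, ".doc")
    address = addresses[name]
    if address
      matched << [address, path]
    else
      unmatched << path
    end
  end
  [matched, unmatched]
end

# Demo with a throwaway directory and made-up data.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "Joe Schmoe.doc"))
  FileUtils.touch(File.join(dir, "Unknown Person.doc"))
  matched, unmatched = documents_to_send(dir, { "Joe Schmoe" => "joe@example.com" })
  matched.each { |address, path| puts "would send #{File.basename(path)} to #{address}" }
  unmatched.each { |path| puts "no address for #{File.basename(path)}" }
end
```

Returning the unmatched files separately makes it easy to spot a document whose name does not correspond to any known employee before anything is sent.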

···

On Jan 17, 11:18 am, Phlip <phlip2...@gmail.com> wrote:

> I suspect Word can also barf out an XML representation.

"gregarican" <greg.kujawa@gmail.com> wrote in message news:c03df6e7-e4f6-4f12-9aa7-bf03921455b4@o4g2000pra.googlegroups.com...

> At least now I have a totally automated solution, although it's
> cobbled together from various sources.

<JS>
Well, if anybody can figure this out, it's you.

</JS>

gregarican wrote:

> I wound up writing a C# console program to do the work. I just
> referred to the ugly underbelly of all of the Word COM stuff and was able
> to grab what I needed. It took a while though, since my text was
> contained within text frames. So I had to work with the
> Document.Shapes property and whatnot.

I'd say that's pretty fast. Good job!

> First I pull my data from a SQL DB using Ruby, dumping that to an
> Excel data source. Then I have a C# program that takes that data
> source, uses a Word mail merge template and delivers the final
> document set.

May I suggest a different approach? Since your first step is pulling data from a relational DB using Ruby, you could just as well do this: open the mail merge Word template and replace the mail merge fields with specially formatted placeholder text (for example "<<<field name>>>" or whatever doesn't collide with RTF meta sequences). Then save this as an RTF file (which is readable as ASCII text). Now you only need to read in the template file from Ruby, do all the replacements and write it out from Ruby again, once for each record. Sounds pretty simple IMHO.

Kind regards

  robert
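
A minimal Ruby sketch of the RTF placeholder idea described above. It assumes the template was saved as RTF with placeholders written as "<<<field name>>>" and that each record is a hash keyed by field name. The helper names, the "Employee" and "Balance" fields, and the sample values are hypothetical.

```ruby
require "tmpdir"

# Escape the characters that have special meaning in RTF. Only backslash and
# braces are handled; non-ASCII text would need \u escapes (not shown).
def rtf_escape(text)
  text.gsub(/[\\{}]/) { |ch| "\\#{ch}" }
end

# Replace every <<<field name>>> placeholder in `template` with the matching
# value from `record`. An unknown placeholder raises KeyError.
def fill_template(template, record)
  template.gsub(/<<<(.+?)>>>/) { rtf_escape(record.fetch($1).to_s) }
end

# Write one RTF file per record, named after the (hypothetical) "Employee"
# field, and return the paths written.
def write_letters(template, records, out_dir)
  records.map do |record|
    path = File.join(out_dir, "#{record.fetch('Employee')}.rtf")
    File.write(path, fill_template(template, record))
    path
  end
end

# Made-up template and records; a real template would come from File.read.
template = '{\rtf1\ansi Dear <<<Employee>>>, your balance is <<<Balance>>>.}'
records = [
  { "Employee" => "Joe Schmoe", "Balance" => "12 days" },
  { "Employee" => "Jane Doe",   "Balance" => "7 days" }
]
Dir.mktmpdir do |dir|
  write_letters(template, records, dir).each { |path| puts File.basename(path) }
end
```

One caveat: Word may split a placeholder across several formatting runs when it saves the RTF, in which case a plain text search will not find it. It is worth checking the saved file in a text editor before relying on this.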

···

On 18.01.2009 03:18, gregarican wrote:


Robert Klemme wrote:

> Now you only need to read in the mail template file from
> Ruby, do all the replacements and then write it out in Ruby again once
> for each record. Sounds pretty simple IMHO.

FYI, a similar (though not necessarily better) solution using Find &
Replace in Word is demonstrated here:

  Ruby on Windows: Find & Replace with MS Word

Greg: If you're willing to share your C# code for automating Word, I,
for one, would like to see it. Feel free to email me, if you like.

David

--
Posted via http://www.ruby-forum.com/.