I notice that Ruby has lots of tools for creating PDF files. Are there any
that let you extract text from a PDF file?
_Kevin
Not yet. PDF::Writer will be refactored a little bit for version 2.0
(coming out later this year) so that it will be three separate
components: PDF::Core (the core objects representing a PDF object in
memory, as well as rendering), PDF::Writer (the writer/layout code),
and PDF::Reader (read a PDF object into an in-memory representation).
Much of the code to do PDF::Core is already in place (it's currently
called PDF::Writer::Object or PDF::Writer::Objects), but there's
nothing explicitly present to represent this.
PDF::Reader will probably be released in early 2006, depending on how
long it takes to refactor the code that already exists, properly
extend it, and get the necessary PDF::Writer code finished.
-austin
On 8/13/05, Kevin Olbrich <kevin.olbrich@duke.edu> wrote:
I notice that Ruby has lots of tools for creating PDF files, are there any
that let you extract text from a PDF file?
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
I once wrote a Ruby PDF text extractor while working at ywesee.
I thought they released it on RubyForge, but I can't find it anymore.
Perhaps if you contact them they can help you.
www.ywesee.com
Greetings
Andy
Kevin Olbrich wrote:
I notice that Ruby has lots of tools for creating PDF files, are there any
that let you extract text from a PDF file?
_Kevin
Thanks, I'll keep my eyes open for it.
_Kevin
-----Original Message-----
From: Austin Ziegler [mailto:halostatue@gmail.com]
Sent: Saturday, August 13, 2005 01:45 PM
To: ruby-talk ML
Subject: Re: Ruby PDF text extractor
On 8/13/05, Kevin Olbrich <kevin.olbrich@duke.edu> wrote:
I notice that Ruby has lots of tools for creating PDF files, are there
any that let you extract text from a PDF file?
Not yet. PDF::Writer will be refactored a little bit for version 2.0 (coming
out later this year) so that it will be three separate
components: PDF::Core (the core objects representing a PDF object in memory,
as well as rendering), PDF::Writer (the writer/layout code), and PDF::Reader
(read a PDF object into an in-memory representation). Much of the code to do
PDF::Core is already in place (it's currently called PDF::Writer::Object or
PDF::Writer::Objects), but there's nothing explicitly present to represent
this.
PDF::Reader will probably be released in early 2006, depending on how long
it takes to refactor the code that already exists, properly extend it, and
get the necessary PDF::Writer code finished.
-austin
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
I'd be interested in helping with this.
martin
Austin Ziegler <halostatue@gmail.com> wrote:
PDF::Reader will probably be released in early 2006, depending on how
long it takes to refactor the code that already exists, properly
extend it, and get the necessary PDF::Writer code finished.