Hello all,
Does anyone know a good PDF parser that retains formatting
after it has extracted the text? I used PDF::Reader, but when you extract text you
just get a stream of characters that is not at all intelligible. When I
copy a PDF's contents from a PDF reader into the Gedit text editor on Linux, it
retains its format. I'm looking for something like that.
Thanks for any help.
regards,
Arun Kumar M S
--
श्री जानकीरघुनाथो विजयते ||
This doesn't exist in Ruby, unfortunately.
-greg
On Sat, Aug 22, 2009 at 12:33 PM, Arun Kumar <arun.einstein@gmail.com> wrote:
> Hello all,
> Does anyone know a good PDF parser that retains formatting
> after it has extracted the text? I used PDF::Reader, but when you extract text you
> just get a stream of characters that is not at all intelligible. When I
> copy a PDF's contents from a PDF reader into the Gedit text editor on Linux, it
> retains its format. I'm looking for something like that.
Very interesting, thanks for posting this.
-greg
On Mon, Aug 24, 2009 at 6:25 AM, Erik Terpstra <erik@ruby-lang.nl> wrote:
> You can use http://pdftohtml.sourceforge.net or use my Ruby wrapper for
> this tool:
> eterps/pdf-struct on GitHub (PDF::Extractor, a library that provides high-level access to the text objects of a PDF document)
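For anyone trying the pdftohtml route suggested above, here is a rough Ruby sketch of how its XML output could be turned back into laid-out text. It assumes a pdftohtml binary on the PATH that accepts the -xml, -i and -stdout options and emits <text> elements with top and left attributes. The helper names (pdf_to_xml, pages_from_xml, layout_lines), the char_width and line_tolerance defaults, and the file name report.pdf are made up for illustration; none of this is the API of pdf-struct or PDF::Reader.

```ruby
require 'open3'
require 'rexml/document'

# One positioned piece of text, as described by a <text> element.
Fragment = Struct.new(:top, :left, :text)

# Run pdftohtml and return its XML output as a string.
# Assumption: the installed pdftohtml supports -xml, -i and -stdout.
def pdf_to_xml(pdf_path)
  xml, status = Open3.capture2('pdftohtml', '-xml', '-i', '-stdout', pdf_path)
  raise "pdftohtml failed for #{pdf_path}" unless status.success?
  xml
end

# Collect the text of a node, descending into children such as <b> or <i>.
def inner_text(node)
  node.children.map do |child|
    case child
    when REXML::Text then child.value
    when REXML::Element then inner_text(child)
    else ''
    end
  end.join
end

# Return one array of Fragments per <page> element.
def pages_from_xml(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//page').map do |page|
    page.get_elements('text').map do |el|
      Fragment.new(el.attributes['top'].to_i,
                   el.attributes['left'].to_i,
                   inner_text(el))
    end
  end
end

# Rebuild text lines from fragments: fragments whose top values are within
# line_tolerance of the row's first fragment share a line, and each fragment
# is padded out to the column left / char_width (a crude guess at glyph width).
def layout_lines(fragments, char_width: 6, line_tolerance: 3)
  rows = []
  fragments.sort_by { |f| [f.top, f.left] }.each do |f|
    if rows.any? && (f.top - rows.last.first.top).abs <= line_tolerance
      rows.last << f
    else
      rows << [f]
    end
  end

  rows.map do |row|
    line = +''
    row.sort_by(&:left).each do |f|
      column = f.left / char_width
      min_gap = line.empty? ? 0 : 1 # keep neighbouring fragments apart
      line << ' ' * [column - line.length, min_gap].max
      line << f.text
    end
    line.rstrip
  end
end

# Hypothetical usage (report.pdf is a placeholder):
#   pages_from_xml(pdf_to_xml('report.pdf')).each do |fragments|
#     puts layout_lines(fragments)
#   end
```

The fixed char_width is the weakest part of this sketch: proportional fonts will not line up exactly, so it probably needs tuning per document, but columns and indentation should at least survive.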
Looks like you better roll up your sleeves
On Sat, Aug 22, 2009 at 4:10 PM, Arun Kumar <arun.einstein@gmail.com> wrote:
> That's really very sad
Yeah, seeing what can be done.
On Sun, Aug 23, 2009 at 2:41 AM, Gregory Brown <gregory.t.brown@gmail.com> wrote:
> On Sat, Aug 22, 2009 at 4:10 PM, Arun Kumar <arun.einstein@gmail.com> wrote:
> > That's really very sad
> Looks like you better roll up your sleeves