Parsing PDF files

Hello all,
Does anyone know a good PDF parser that retains formatting
after extracting text? I used PDF::Reader, but when you extract text you
just get a stream of characters that is not at all intelligible. When I
copy a PDF's contents from a PDF reader into the gedit text editor on Linux,
the formatting is retained. I'm looking for something like that.

Thanks for any help.

regards,
Arun Kumar M S

···

--

श्री जानकीरघुनाथो विजयते ||

This doesn't exist in Ruby, unfortunately.

-greg

···

On Sat, Aug 22, 2009 at 12:33 PM, Arun Kumar<arun.einstein@gmail.com> wrote:

Hello all,
Does anyone know a good PDF parser that retains formatting
after extracting text? I used PDF::Reader, but when you extract text you
just get a stream of characters that is not at all intelligible. When I
copy a PDF's contents from a PDF reader into the gedit text editor on Linux,
the formatting is retained. I'm looking for something like that.

Very interesting, thanks for posting this.

-greg

···

On Mon, Aug 24, 2009 at 6:25 AM, Erik Terpstra<erik@ruby-lang.nl> wrote:

You can use http://pdftohtml.sourceforge.net or use my Ruby wrapper for
this tool:

eterps/pdf-struct on GitHub: PDF::Extractor is a library that provides high-level access to the text objects of a PDF document
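
For anyone who wants to try the pdftohtml route by hand, here is a minimal sketch of the idea. It is not the code of the wrapper mentioned above. It assumes that pdftohtml is on the PATH, that its -xml, -i and -stdout flags behave as in its help text, and that the XML output contains <page> elements holding <text top="..." left="..."> fragments. The method names pdf_to_xml, fragment_text and layout_text and the char_width parameter are made up for this example, and the 6.0 units-per-character figure is only a guess that needs tuning per document.

```ruby
require "open3"
require "rexml/document"

# Run pdftohtml in XML mode and return its output as a String.
# Assumption: the pdftohtml binary is installed and on PATH.
def pdf_to_xml(pdf_path)
  xml, status = Open3.capture2("pdftohtml", "-xml", "-i", "-stdout", pdf_path)
  raise "pdftohtml failed for #{pdf_path}" unless status.success?
  xml
end

# Collect the text inside a <text> element, including text nested in
# inline children such as <b> or <i>.
def fragment_text(node)
  node.children.map do |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then fragment_text(child)
    else ""
    end
  end.join
end

# Rebuild plain-text lines from the XML. Fragments that share a `top`
# coordinate become one line, ordered left to right, and each fragment's
# `left` offset is turned into a column by dividing by char_width
# (hypothetical tuning parameter: page units per character).
def layout_text(xml, char_width: 6.0)
  pages = []
  REXML::Document.new(xml).elements.each("//page") do |page|
    rows = Hash.new { |hash, top| hash[top] = [] }
    page.elements.each("text") do |text|
      top  = text.attributes["top"].to_i
      left = text.attributes["left"].to_i
      rows[top] << [left, fragment_text(text)]
    end
    lines = rows.sort_by { |top, _| top }.map do |_top, fragments|
      fragments.sort_by { |left, _| left }.reduce(+"") do |line, (left, str)|
        column = (left / char_width).round
        # Pad out to the target column, keeping at least one space
        # between neighbouring fragments on the same line.
        gap = [column - line.length, line.empty? ? 0 : 1].max
        line << (" " * gap) << str
      end
    end
    pages << lines.join("\n")
  end
  pages.join("\f") # form feed between pages
end
```

With a real file (report.pdf is just a placeholder name) this would be used as `puts layout_text(pdf_to_xml("report.pdf"))`. Grouping on the exact `top` value is the simplest possible choice; text whose baselines differ by a pixel or two would need some tolerance added.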

That's really very sad :(

···

On Sat, Aug 22, 2009 at 10:33 PM, Gregory Brown <gregory.t.brown@gmail.com> wrote:

On Sat, Aug 22, 2009 at 12:33 PM, Arun Kumar<arun.einstein@gmail.com> wrote:
> Hello all,
> Does anyone know a good PDF parser that retains formatting
> after extracting text? I used PDF::Reader, but when you extract text you
> just get a stream of characters that is not at all intelligible. When I
> copy a PDF's contents from a PDF reader into the gedit text editor on Linux,
> the formatting is retained. I'm looking for something like that.

This doesn't exist in Ruby, unfortunately.

-greg

Looks like you better roll up your sleeves :)

···

On Sat, Aug 22, 2009 at 4:10 PM, Arun Kumar<arun.einstein@gmail.com> wrote:

That's really very sad :(

Yeah, seeing what can be done :)

···

On Sun, Aug 23, 2009 at 2:41 AM, Gregory Brown <gregory.t.brown@gmail.com> wrote:

On Sat, Aug 22, 2009 at 4:10 PM, Arun Kumar<arun.einstein@gmail.com> wrote:
> That's really very sad :(

Looks like you better roll up your sleeves :)
