PDF to text converter?

Dear all,

Could anyone explain how to convert a PDF to text format?

Thanks in advance

Regards,
Jose Martin


--
Posted via http://www.ruby-forum.com/.

-------- Original Message --------

Date: Mon, 11 Aug 2008 18:41:51 +0900
From: dare ruby <martin@angleritech.com>
To: ruby-talk@ruby-lang.org
Subject: PDF to text covertor?

> Dear all,
>
> Could anyone explain how to convert a PDF to text format?
>
> Thanks in advance
>
> Regards,
> Jose Martin

Dear Jose,

it depends on whether your PDF actually contains text or just images that a human can recognize as text.
In the first case, you can try tools like pdftotext (http://en.wikipedia.org/wiki/Pdftotext), at least on Linux and Mac.
On Windows, some PDF viewers also offer a "Save as text" option.
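Calling pdftotext from Ruby could look like the sketch below. It assumes pdftotext (from Poppler, or the Windows port of xpdf) is installed and on the PATH; `pdf_to_text` and `PdfTextError` are hypothetical names, not part of any library. The flags used (`-enc UTF-8`, `-layout`, and `-` as the output file to write to standard output) are standard pdftotext options.

```ruby
require "open3"

# Hypothetical error class: raised when pdftotext is missing or fails.
class PdfTextError < StandardError; end

# Hypothetical helper: returns the text of +pdf_path+ by running the
# external pdftotext tool, which is assumed to be on the PATH.
def pdf_to_text(pdf_path, layout: false)
  raise ArgumentError, "no such file: #{pdf_path}" unless File.file?(pdf_path)

  args = ["pdftotext", "-enc", "UTF-8"]
  args << "-layout" if layout   # approximate the original physical layout
  args += [pdf_path, "-"]       # "-" as the output file means stdout

  begin
    # Passing an argument list (not one string) avoids shell quoting issues.
    out, err, status = Open3.capture3(*args)
  rescue Errno::ENOENT
    raise PdfTextError, "pdftotext not found; install Poppler or xpdf"
  end
  raise PdfTextError, "pdftotext failed: #{err.strip}" unless status.success?

  out.force_encoding(Encoding::UTF_8)
end
```

For example, `puts pdf_to_text("notes.pdf", layout: true)`, where `notes.pdf` is a placeholder file name.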

In the second case, you'll have to use OCR (optical character recognition) software. There are some
good commercial packages available; I've liked ABBYY FineReader (on Windows).
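To tell the two cases apart automatically, one rough heuristic is to check how much text pdftotext actually produced. The sketch assumes pdftotext's default behavior of ending each page with a form feed character; the method name and the threshold of 20 visible characters per page are arbitrary assumptions to tune for your own documents.

```ruby
# Hypothetical heuristic: pdftotext ends every page with a form feed ("\f").
# If its output has almost no visible characters per page, the PDF most
# likely holds scanned page images, so OCR is needed instead.
def probably_scanned?(text, min_chars_per_page: 20)
  pages = [text.count("\f"), 1].max        # at least one page
  visible = text.gsub(/\s+/, "").length    # \s also matches "\f"
  visible < min_chars_per_page * pages
end
```

For example, `probably_scanned?(File.read("notes.txt"))`, where `notes.txt` is a placeholder for pdftotext output.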

Best regards,

Axel



Hi,

In <59a3f50dc89e69c5250b753986657c78@ruby-forum.com>
  "PDF to text covertor?" on Mon, 11 Aug 2008 18:41:51 +0900,
  dare ruby <martin@angleritech.com> wrote:

> Could anyone explain how to convert a PDF to text format?

It seems that Ruby/Poppler (*1), the Ruby bindings for
Poppler (*2), is what you're looking for. It is available from
the Ruby-GNOME 2 download page on SourceForge.net.

(*1) http://ruby-gnome2.sourceforge.jp/hiki.cgi?Ruby%2FPoppler
(*2) http://poppler.freedesktop.org/

pdftotext is one of the command-line tools bundled with Poppler.

Thanks,
--
kou
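A minimal sketch of the Ruby/Poppler route, assuming the `poppler` gem from Ruby-GNOME is installed. The helper name is hypothetical, and the way a document is opened has varied between releases (older ones expect a `file://` URI, which this sketch builds with `GLib.filename_to_uri`), so check it against the version you install.

```ruby
# Hypothetical helper built on Ruby/Poppler (the "poppler" gem).
# Returns an Array with one String of text per page.
def poppler_page_texts(pdf_path)
  raise ArgumentError, "no such file: #{pdf_path}" unless File.file?(pdf_path)

  # Required lazily so the rest of a program loads without the gem.
  require "poppler"

  # Assumption: Poppler::Document.new accepts a file:// URI.
  uri = GLib.filename_to_uri(File.expand_path(pdf_path))
  document = Poppler::Document.new(uri)

  # Poppler::Document is Enumerable over its pages.
  document.map { |page| page.get_text }
end
```

For example, `poppler_page_texts("notes.pdf").each_with_index { |text, i| puts "--- page #{i + 1} ---", text }`, with `notes.pdf` as a placeholder.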

I have some study materials as PDF documents. I need to convert these PDFs
into a text format I can open in Microsoft Word or TextPad on Windows, and
I need to do this with a Ruby program. Could anyone suggest how?

Thanks in advance

Regards,
Jose Martin



Your best bet is a Ruby script that calls out to xpdf to do the actual
PDF-to-text conversion and then parses the text. There's a Windows port of
the xpdf command-line utilities.

http://www.perlmonks.org/?node_id=298041
http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/
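What "then parses the text" means depends on what you need from the documents. As one hypothetical example, the sketch below splits pdftotext output into pages of trimmed, non-blank lines. It assumes the text came from something like `pdftotext -enc UTF-8 notes.pdf notes.txt` (file names are placeholders) with pdftotext's default form-feed page breaks left on; `parse_pages` is an illustrative name.

```ruby
# Hypothetical parsing step for pdftotext output: pdftotext terminates each
# page with a form feed ("\f"), so split on it, then keep each page's
# non-blank lines with surrounding whitespace removed.
def parse_pages(text)
  chunks = text.split("\f", -1)      # -1 keeps empty (blank) pages
  chunks.pop if chunks.last == ""    # drop the empty tail after the last "\f"
  chunks.map { |page| page.lines.map(&:strip).reject(&:empty?) }
end
```

For example, `pages = parse_pages(File.read("notes.txt", encoding: "UTF-8"))` gives an array per page that you can search or write back out however you like.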

martin


On Mon, Aug 18, 2008 at 11:10 PM, dare ruby <martin@angleritech.com> wrote:

> I have some study materials as PDF documents. I need to convert these PDFs
> into a text format I can open in Microsoft Word or TextPad on Windows, and
> I need to do this with a Ruby program. Could anyone suggest how?