Dear all,
Could anyone explain how to convert a PDF to text format?
Thanks in advance
Regards,
Jose Martin
--
Posted via http://www.ruby-forum.com/.
-------- Original Message --------
Date: Mon, 11 Aug 2008 18:41:51 +0900
From: dare ruby <martin@angleritech.com>
To: ruby-talk@ruby-lang.org
Subject: PDF to text covertor?
Dear all,
Could anyone explain how to convert a PDF to text format?
Thanks in advance
Regards,
Jose Martin
Dear Jose,
it depends on whether your PDF actually contains text or just images that a human can recognize as
text.
In the first case, you can try a tool like pdftotext (http://en.wikipedia.org/wiki/Pdftotext), at least on Linux and
Mac. On Windows, there are also some PDF viewers that offer a "Save as text" option.
In the second case, you'll have to use OCR (optical character recognition) software. There are some
good commercial packages available. I've liked ABBYY's FineReader (on Windows).
Best regards,
Axel
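The distinction above (real text versus scanned images) can be checked from Ruby before deciding whether OCR is needed. The following is only a rough sketch: it assumes pdftotext is installed and on the PATH, and the helper names extract_text and text_layer? are made up for illustration. The idea is simply that an image-only PDF yields nothing but whitespace and page breaks.

```ruby
require "open3"

# Runs pdftotext and returns the extracted text.
# "-" as the output file sends the text to standard output, and
# "-enc UTF-8" asks for UTF-8 regardless of the tool's default.
def extract_text(pdf_path)
  output, error, status =
    Open3.capture3("pdftotext", "-enc", "UTF-8", pdf_path, "-")
  raise "pdftotext failed: #{error.strip}" unless status.success?
  output.force_encoding("UTF-8").scrub
end

# Hypothetical helper: true if the extracted text contains any
# non-whitespace character. Form feeds (page breaks) count as whitespace.
def text_layer?(extracted)
  extracted.match?(/\S/)
end
```

For example, text_layer?(extract_text("notes.pdf")) returning false would suggest that notes.pdf (a placeholder file name) is a scan and needs OCR.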
Hi,
In <59a3f50dc89e69c5250b753986657c78@ruby-forum.com>
"PDF to text covertor?" on Mon, 11 Aug 2008 18:41:51 +0900,
dare ruby <martin@angleritech.com> wrote:
Could anyone explain how to convert a PDF to text format?
It seems that Ruby/Poppler(*1), the Ruby bindings for
Poppler(*2), is what you're looking for.
(*1) http://ruby-gnome2.sourceforge.jp/hiki.cgi?Ruby%2FPoppler
(*2) http://poppler.freedesktop.org/
pdftotext is a command-line tool bundled with Poppler.
Thanks,
--
kou
I have some study materials as PDF documents. I need to parse the
PDFs into a text format, such as Microsoft Word or TextPad, on Windows. I
need to do the parsing with a Ruby program. Could anyone suggest how to do this?
Thanks in advance
Regards,
Jose Martin
Your best bet is a Ruby script that calls out to xpdf to do the actual
PDF-to-text conversion and then parses the text. There's a Windows port of
the xpdf command-line utilities.
http://www.perlmonks.org/?node_id=298041
http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/
martin
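A minimal sketch of that approach, for illustration only: it assumes the pdftotext binary from xpdf (or Poppler) is on the PATH, and the method names pdftotext_command, pdf_to_text and split_pages are invented here, not part of any library. The "parsing" shown is deliberately simple: it splits the output into pages and non-blank lines.

```ruby
require "open3"

# Builds the argument list for pdftotext. "-layout" keeps the physical
# layout of the page; "-" sends the text to standard output.
def pdftotext_command(pdf_path, layout: false)
  command = ["pdftotext", "-enc", "UTF-8"]
  command << "-layout" if layout
  command + [pdf_path, "-"]
end

# Runs the converter and returns the raw text of the whole document.
def pdf_to_text(pdf_path, layout: false)
  output, error, status =
    Open3.capture3(*pdftotext_command(pdf_path, layout: layout))
  raise "pdftotext failed: #{error.strip}" unless status.success?
  output.force_encoding("UTF-8").scrub
end

# pdftotext ends each page with a form feed ("\f"), so splitting on it
# gives one string per page. Each page becomes an array of its
# non-blank lines with surrounding whitespace removed.
def split_pages(text)
  text.split("\f").map do |page|
    page.lines.map(&:strip).reject(&:empty?)
  end
end
```

With these in place, split_pages(pdf_to_text("notes.pdf")) would return one array of lines per page, which can then be written to a .txt file or processed further. The file name is a placeholder.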
On Mon, Aug 18, 2008 at 11:10 PM, dare ruby <martin@angleritech.com> wrote:
I have some study materials as PDF documents. I need to parse the
PDFs into a text format, such as Microsoft Word or TextPad, on Windows. I
need to do the parsing with a Ruby program. Could anyone suggest how to do this?