PDF to text converter?

Martin_Durai · 11 August 2008 09:41

Dear all,

Could anyone explain how to convert a PDF to text format?

Thanks in advance

Regards,
Jose Martin

--
Posted via http://www.ruby-forum.com/.

Axel_Etzold · 11 August 2008 10:03

-------- Original Message --------

Date: Mon, 11 Aug 2008 18:41:51 +0900
From: dare ruby <martin@angleritech.com>
To: ruby-talk@ruby-lang.org
Subject: PDF to text converter?

Dear all,

Could anyone explain how to convert a PDF to text format?

Thanks in advance

Regards,
Jose Martin
--
Posted via http://www.ruby-forum.com/.

Dear Jose,

It depends on whether your PDF actually contains text or just images that a human can recognize as
text.
In the first case, you can try a tool like pdftotext (http://en.wikipedia.org/wiki/Pdftotext), at least on
Linux and Mac. On Windows, there are also some PDF viewers that offer a "Save as text" option.

In the second case, you'll have to use OCR (optical character recognition) software. There are some
good commercial packages available. I've liked ABBYY's FineReader (on Windows).
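
One way to tell the two cases apart from a script is to run pdftotext and see whether any text comes back. The Ruby sketch below assumes pdftotext is installed and on the PATH. The helper names extract_text and probably_scanned? and the 20-character threshold are made up for illustration, and the check is only a rough heuristic, not a reliable test.

```ruby
require "open3"

# Run pdftotext and return the extracted text.
# Passing "-" as the output file makes pdftotext write to stdout.
def extract_text(pdf_path)
  stdout, status = Open3.capture2("pdftotext", pdf_path, "-")
  raise "pdftotext failed for #{pdf_path}" unless status.success?
  stdout
end

# Heuristic (an assumption, not a rule): if almost no non-whitespace
# characters come back, the pages are probably scanned images and
# would need OCR instead.
def probably_scanned?(text, min_chars: 20)
  text.gsub(/\s+/, "").length < min_chars
end

# Example use (needs a real PDF and pdftotext):
#   text = extract_text("notes.pdf")
#   puts probably_scanned?(text) ? "Looks scanned, try OCR" : text
```

A PDF that mixes scanned pages with real text would pass this check while still missing content, so a per-page check may be a better fit for such files.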

Best regards,

Axel


Kouhei_Sutou1 · 11 August 2008 11:08

Hi,

In <59a3f50dc89e69c5250b753986657c78@ruby-forum.com>
"PDF to text converter?" on Mon, 11 Aug 2008 18:41:51 +0900,
dare ruby <martin@angleritech.com> wrote:

Could anyone explain how to convert a PDF to text format?

It seems that Ruby/Poppler(*1), the Ruby bindings for
Poppler(*2), is what you're looking for.
Ruby-GNOME 2 download | SourceForge.net

(*1) http://ruby-gnome2.sourceforge.jp/hiki.cgi?Ruby%2FPoppler
(*2) http://poppler.freedesktop.org/

pdftotext is an application bundled with Poppler.

Thanks,
--
kou

Martin_Durai · 19 August 2008 06:10

I have some study materials as PDF documents. I need to convert the
PDFs to a text format, such as Microsoft Word or TextPad, on Windows. I
need to do this with a Ruby program. Could anyone suggest an approach?

Thanks in advance

Regards,
Jose Martin

Martin_DeMello · 19 August 2008 17:39

Your best bet is a Ruby script that calls out to xpdf to do the actual
PDF-to-text conversion and then parses the text. There's a Windows port of
the xpdf command-line utilities.

http://www.perlmonks.org/?node_id=298041
http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/
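
As a rough illustration of that approach, here is a minimal Ruby sketch. It assumes the pdftotext binary (from xpdf or Poppler) is on the PATH. The helper names pdftotext_command, pdf_to_text, and split_pages are hypothetical, and what "parsing the text" means will depend on your documents; here it just splits the output into pages on the form feed character that pdftotext writes at page breaks.

```ruby
require "open3"

# Build the argument list for pdftotext.
# "-enc UTF-8" selects the output encoding, "-layout" tries to keep the
# physical layout, and "-" as the output file sends the text to stdout.
def pdftotext_command(pdf_path, layout: false)
  cmd = ["pdftotext", "-enc", "UTF-8"]
  cmd << "-layout" if layout
  cmd + [pdf_path, "-"]
end

# Call out to pdftotext and return the extracted text as a UTF-8 string.
def pdf_to_text(pdf_path, layout: false)
  stdout, stderr, status = Open3.capture3(*pdftotext_command(pdf_path, layout: layout))
  raise "pdftotext failed: #{stderr.strip}" unless status.success?
  stdout.force_encoding("UTF-8")
end

# pdftotext separates pages with a form feed ("\f"); split on it and
# drop pages that contain only whitespace.
def split_pages(text)
  text.split("\f").map(&:strip).reject(&:empty?)
end

# Example use (needs a real PDF and pdftotext):
#   pages = split_pages(pdf_to_text("notes.pdf"))
#   File.write("notes.txt", pages.join("\n\n"))
```

Passing the command as an argument array rather than a single shell string avoids quoting problems with file names that contain spaces, which are common on Windows.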

martin

On Mon, Aug 18, 2008 at 11:10 PM, dare ruby <martin@angleritech.com> wrote:

I have some study materials as PDF documents. I need to convert the
PDFs to a text format, such as Microsoft Word or TextPad, on Windows. I
need to do this with a Ruby program. Could anyone suggest an approach?
