I'm looking for an example of parsing PDFs. I tried to implement this
with Ruby and the docsplit gem, but it uses an external tool to extract
the text, there are problems with number references, and you still have
to parse the resulting text file with regular expressions.
I want to parse some papers in PDF format to extract their title,
keywords, authors, the authors' emails, institutions, etc.
I'm hoping an experienced Ruby developer knows a better way to do
this without parsing a text file with regular expressions.
Regular Expressions are pretty much the standard way of parsing text files,
aren't they? Certainly they're what I've been using for years now.
What's the problem you're having with them?
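For reference, the regular-expression route discussed here might look roughly like this on text that has already been extracted (for example by docsplit). It's a minimal sketch, not anyone's actual code: `parse_header` is a hypothetical helper name, and treating the first non-blank line as the title and a "Keywords:" or "Index Terms:" line as the keyword list are illustrative assumptions.

```ruby
# Matches a line such as "Keywords: PDF, parsing; Ruby" (assumed label format).
KEYWORDS_RE = /\A(?:Keywords|Index Terms)\s*:\s*(.+)\z/i
# Deliberately loose email pattern; it will miss forms like {a,b}@uni.edu.
EMAIL_RE = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/

# Pulls a few header fields out of plain text already extracted from a paper.
def parse_header(text)
  lines = text.lines.map(&:strip).reject(&:empty?)
  keywords_line = lines.find { |line| line.match?(KEYWORDS_RE) }

  {
    # Assumption: the title is the first non-blank line of the text.
    title: lines.first,
    emails: text.scan(EMAIL_RE).uniq,
    # Split the captured keyword list on commas or semicolons.
    keywords: keywords_line ? keywords_line[KEYWORDS_RE, 1].split(/\s*[,;]\s*/) : []
  }
end
```

Calling `parse_header(File.read('paper.txt'))` (the file name is a placeholder) returns a hash with `:title`, `:emails` and `:keywords`. Layouts vary enough between venues (multi-line titles, affiliation markers glued to names) that heuristics like these usually need per-venue tuning.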
···
On Mon, May 9, 2011 at 11:32 AM, Felipe Espinoza <fespinozacast@gmail.com> wrote:
2011/5/9 Felipe Espinoza <fespinozacast@gmail.com>:
Whenever I've done this in the past, I've used pdftohtml to produce an
HTML file which Nokogiri can then handle. Yes, it's an external tool,
but it's been reliable for me.
I could have excerpted parts of the binary blob this PDF includes at
the start, but I'd rather not break anyone's email client without
intending to.
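A minimal sketch of that kind of pipeline, with two swaps to keep it dependency-free: it asks pdftohtml for its `-xml` output rather than HTML, and parses it with REXML (shipped with Ruby) instead of Nokogiri. It assumes poppler's pdftohtml is on the PATH and emits `fontspec` and `text` elements inside each `page`; the flags are worth checking against `pdftohtml -h` for your version. The helper names `first_page_xml`, `guess_title` and `find_emails`, the "largest font is the title" heuristic, and the email regex are all illustrative assumptions.

```ruby
require 'open3'
require 'rexml/document'

# Converts only page 1 of a PDF to pdftohtml's XML format (images ignored,
# messages suppressed) and returns the XML. Assumes pdftohtml is installed.
def first_page_xml(pdf_path)
  out, err, status = Open3.capture3('pdftohtml', '-xml', '-i', '-q', '-stdout',
                                    '-f', '1', '-l', '1', pdf_path)
  raise "pdftohtml failed: #{err}" unless status.success?
  out
end

# Concatenates the text inside an element, including nested <b>/<i>/<a> tags.
def element_text(el)
  el.children.map do |child|
    case child
    when REXML::Text then child.value
    when REXML::Element then element_text(child)
    else ''
    end
  end.join
end

# Returns [font_size, text] pairs for every non-empty <text> line.
def sized_lines(xml)
  doc = REXML::Document.new(xml)
  sizes = {}
  REXML::XPath.each(doc, '//fontspec') do |spec|
    sizes[spec.attributes['id']] = spec.attributes['size'].to_f
  end
  lines = []
  doc.root.each_element('page') do |page|
    page.each_element do |el|
      next unless el.name == 'text'
      text = element_text(el).strip
      lines << [sizes.fetch(el.attributes['font'], 0.0), text] unless text.empty?
    end
  end
  lines
end

# Heuristic: the title is whatever is set in the largest font.
def guess_title(xml)
  lines = sized_lines(xml)
  return nil if lines.empty?
  max = lines.map(&:first).max
  lines.select { |size, _| size == max }.map(&:last).join(' ')
end

EMAIL_PATTERN = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/

# Collects every distinct email-looking string in the converted text.
def find_emails(xml)
  sized_lines(xml).flat_map { |_, text| text.scan(EMAIL_PATTERN) }.uniq
end
```

Typical use would be `xml = first_page_xml('paper.pdf')` followed by `guess_title(xml)` and `find_emails(xml)`. The advantage over plain text is that font sizes and positions survive the conversion, so structure can be inferred from layout rather than from regular expressions alone.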
···
On Mon, May 9, 2011 at 8:01 PM, James <oscartheduck@gmail.com> wrote:
--
Phillip Gawlowski
Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.