I use pdf-reader for several of my scripts. If you're after just reading the PDF, then that's the one I'd stick with. Is there something in particular you don't understand?
Wayne
Thanks, Wayne. I have a requirement that I need to script. I am new to
this gem and am looking through its examples. I had never used it before;
I found it online today. I want to collect all the data from the PDF into
a CSV file.
@Wayne - Does a PDF file internally hold any XML objects for the data it
displays? If so, I could use Nokogiri to parse them.
I'm not sure I fully understand... You want to read all the data out of a PDF (or just selected data?) and put all that data into a CSV file?
Here's a sample that may be easy for you to understand. This simply goes through a PDF and searches for a preset word or phrase and lists the page on which it was found in the PDF.
    require 'pdf/reader'

    File.open(@thepdffile, "rb") do |io|   # open the file
      reader = PDF::Reader.new(io)         # reader now gives access to the contents of the PDF
      @counter = 0
      # A PDF is defined in pages, so you have to go through each page to get the content.
      reader.pages.each do |page|
        @counter += 1
        pageText = page.text               # pageText contains all the text on a single page (only text!)
        @wordlist.each do |singleword|     # (bunch of stuff specific to my script)
          singleword.strip!
          if pageText.include? singleword
            @indv_word << singleword       # record the word that was found
            @indv_page << @counter         # record the page it was found on
          end
        end
      end
    end

But hopefully this example helps.
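Since the goal in this thread is a CSV file, here is a minimal sketch of how the word search above could feed one. It is not Wayne's script: the method names `find_words` and `hits_to_csv`, the column names, and the file names in the usage comment are all made up for illustration. The search works on plain strings (one per page), so it can be tried without a PDF at hand.

```ruby
require 'csv'

# Return [word, page_number] pairs for every word found on every page.
# page_texts is an array of strings, one per page (for example from page.text).
# Both method names here are hypothetical, not part of pdf-reader.
def find_words(page_texts, words)
  hits = []
  page_texts.each_with_index do |text, index|
    words.each do |word|
      term = word.strip            # non-mutating, so the caller's list is untouched
      next if term.empty?
      hits << [term, index + 1] if text.include?(term)
    end
  end
  hits
end

# Render the hits as CSV text with a header row.
def hits_to_csv(hits)
  CSV.generate do |csv|
    csv << %w[word page]
    hits.each { |row| csv << row }
  end
end

# Possible usage with pdf-reader (needs the gem and a real file; the
# file names and search words are placeholders):
#   require 'pdf/reader'
#   texts = PDF::Reader.new("a.pdf").pages.map(&:text)
#   File.write("hits.csv", hits_to_csv(find_words(texts, ["invoice", "total"])))
```

Keeping the search separate from the PDF reading means the same code can be checked against a few hand-written strings before it is pointed at a real document.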
________________________________
From: Arup Rakshit <lists@ruby-forum.com>
To: ruby-talk@ruby-lang.org
Sent: Wednesday, March 5, 2014 11:59 AM
Subject: Re: PDF reader gems
That would be really helpful. I will start on my script tonight. If I have
any trouble understanding it, I will ask here on the list.
I wrote the code below:

    require 'pdf/reader'

    File.open("#{__dir__}/a.pdf", 'rb') do |io|
      reader = PDF::Reader.new(io)
      reader.pages.each do |page|
        puts page.text
      end
    end
It works, but `text` returns the whole content of a page at once. Can I
read the page line by line?
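One way to approach this, offered as a sketch rather than as the gem's recommended method: `page.text` returns an ordinary Ruby String, so it can be split on newlines. The method names `page_lines` and `each_pdf_line` below are invented for this example. Bear in mind that pdf-reader rebuilds the text layout from positions in the PDF, so the lines it produces may not always match the lines you see on screen.

```ruby
# Split one page's text into non-blank lines, trimming surrounding whitespace.
# Works on any String, so it can be tried without a PDF.
def page_lines(text)
  text.each_line.map(&:strip).reject(&:empty?)
end

# Yield page number, line number and line text for each line of a PDF.
# The require sits inside the method so page_lines can be used without the gem.
def each_pdf_line(path)
  require 'pdf/reader'
  PDF::Reader.new(path).pages.each_with_index do |page, page_index|
    page_lines(page.text).each_with_index do |line, line_index|
      yield page_index + 1, line_index + 1, line
    end
  end
end

# Possible usage (the path is a placeholder):
#   each_pdf_line("#{__dir__}/a.pdf") do |page_no, line_no, line|
#     puts "#{page_no}:#{line_no} #{line}"
#   end
```

Each line could then be split further into fields (for example on runs of two or more spaces) before being written out as a CSV row, depending on how the PDF lays out its data.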
________________________________
From: Arup Rakshit <lists@ruby-forum.com>
To: ruby-talk@ruby-lang.org
Sent: Wednesday, March 5, 2014 2:01 PM
Subject: Re: PDF reader gems