Hi there,
I'm working on a project to automate retrieval of content and the
download of a PDF bill from an ISP's website. I have 30 connections with
this ISP and each has its own username and password. So far I have been
able to get the content I need that is actually stored within the page.
You have to log in and go to a specific page to be able to click on the
link.
The link itself isn't the PDF, as it's generated on the fly:
source code of the link:
<a href="/be-portal/downloadPdf"
onclick="cmCreatePageElementTag('Download PDF', 'Member Centre');"
class="btn_Link">Download PDF</a>
I'm selecting it with:
link = invoice_page.links_with(:href => "/be-portal/downloadPdf")
How can I click this link to download the PDF and store it to the
filesystem?
Btw, I only started with Ruby and Mechanize less than 24 hours ago.
TIA
Regards,
Dan
File.open('myfile', 'w+') do |file|
  file << agent.get_file(link['href'])
end
where agent is the Mechanize agent you used to log in and get the link.
--
Andrea Dallera
On 12/09/2010 22:53, Dan Mansfield wrote:
> how can I click this link to download the pdf and store it to the
> filesystem?
Thanks, so my script is now:
....
invoice_page = agent.get(URL_BILLING)
link = invoice_page.links_with(:href => "/be-portal/downloadPdf")
File.open('myfile', 'w+') do |file|
  file << agent.get_file(link['href'])
end
results in:
C:/ruby/test.rb:27:in `[]': can't convert String into Integer (TypeError)
        from C:/ruby/test.rb:27:in `block in <main>'
        from C:/ruby/test.rb:26:in `open'
        from C:/ruby/test.rb:26:in `<main>'
What is the proper method to see the response back from clicking the
link?
links_with returns an array. Try using .first to pick out the first result,
so:
link = invoice_page.links_with(:href => "/be-portal/downloadPdf").first
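To illustrate why the TypeError shows up, here is a minimal sketch that needs no network access. FakeLink is a hypothetical stand-in for a Mechanize link object (it is not part of Mechanize). The only point is that indexing an Array with a String raises TypeError, while .first hands back a single object whose href can be read.

```ruby
# FakeLink is a hypothetical stand-in for a Mechanize link; it only has an href.
FakeLink = Struct.new(:href)

# links_with returns an Array of matches, simulated here with one element.
links = [FakeLink.new("/be-portal/downloadPdf")]

begin
  links['href']          # Array#[] expects an Integer index, not a String
rescue TypeError => e
  error_class = e.class  # remember what was raised
end

link = links.first       # pick the single matching link out of the Array
href = link.href         # now the href can be read
```

With Mechanize itself, the singular form invoice_page.link_with(:href => "/be-portal/downloadPdf") should give the first match directly, so .first is not needed.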
On Sun, Sep 12, 2010 at 5:54 PM, Dan Mansfield <dan@bleckfield.com> wrote:
> what is the proper method to see the response back from clicking the
> link?
Ok, so some progress:
invoice_page = agent.get(URL_BILLING)
link = invoice_page.links_with(:href => "/be-portal/downloadPdf").first
page = agent.click(link) #pp page
File.open('myfile.pdf', 'w+') do |file|
  file << page
end
If I look at the content of page now it contains the stream of PDF data
as well as:
@code="200",
@filename="Invoice_14051844_08/09/2010.pdf",
@response=
{"date"=>"Wed, 15 Sep 2010 19:13:04 GMT",
"server"=>"Apache",
"expires"=>"Wed, 15 Sep 2010 19:14:04 GMT",
"cache-control"=>"max-age=60",
"content-disposition"=>
"attachment;filename=Invoice_14051844_08/09/2010.pdf",
"vary"=>"Accept-Encoding",
"content-encoding"=>"gzip",
"content-length"=>"7950",
"keep-alive"=>"timeout=5, max=92",
"connection"=>"Keep-Alive",
"content-type"=>"application/octet-stream"},
@uri=
#<URI::HTTPS:0x335bc38
URL:https://www.bethere.co.uk/be-portal/downloadPdf>>
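One thing the dump shows is that the filename suggested by the server contains "/" characters, which cannot be used inside a file name on Windows or Unix. Below is a small sketch of cleaning it up before saving. safe_filename is a hypothetical helper, not part of Mechanize, and replacing the offending characters with "-" is just one possible choice.

```ruby
# Hypothetical helper: replace characters that are not allowed in Windows
# file names (which also covers '/' on Unix) with a dash.
def safe_filename(name)
  name.gsub(%r{[\\/:*?"<>|]}, '-')
end

# The filename taken from the content-disposition header shown above.
suggested = "Invoice_14051844_08/09/2010.pdf"
cleaned = safe_filename(suggested)   # => "Invoice_14051844_08-09-2010.pdf"
```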
I tried this too:
File.open('myfile.pdf', 'w+') do |file|
  file << page.body
end
This almost works but produces a corrupt PDF file. I can see the
document properties of the PDF file but there is no content.
SO CLOSE!
By doing a binary compare of a working version downloaded through the
browser and the one through Mechanize, I have found that it is saving
the line breaks as 0D 0A in hex versus just 0A in the working file.
While I dig around to find out how to stop Mechanize/Ruby doing that,
has anyone else come across this and found a solution?
Thanks
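A likely explanation, offered as a sketch rather than a confirmed fix for this script: on Windows, Ruby opens files in text mode by default, and text mode writes every 0A byte as 0D 0A. Opening the file with 'wb' (binary mode) writes the bytes untouched. In the example below, save_binary is a hypothetical helper and data is a made-up byte string standing in for page.body.

```ruby
require 'tmpdir'

# Hypothetical helper: write the given bytes to path without newline translation.
def save_binary(path, data)
  File.open(path, 'wb') do |file|   # 'b' turns off LF -> CRLF conversion on Windows
    file.write(data)
  end
end

# Made-up bytes standing in for page.body; note the bare 0A line breaks.
data = "%PDF-1.4\nfake content\n%%EOF\n".b

path = File.join(Dir.tmpdir, 'myfile.pdf')
save_binary(path, data)

# Read the file back in binary mode to compare it with what was written.
written = File.binread(path)
```

In the script above, that would mean changing 'w+' to 'wb' in the File.open call that writes page.body.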