I'm trying to scrape a page that hides some data behind a JavaScript
function. Is there any way to get this data? I've been using
Mechanize, but I'm not sure it can do this. Is there a better library
for this kind of thing?
The following is the interesting part of the page:
<td class="colPlus" onclick="fireClick(this,0)">
<a id="iroc_0" class="plus" href="#" onclick="return false;"> </a>
</td>
--
Posted via http://www.ruby-forum.com/.
You might check out Harmony:
http://www.rubyinside.com/harmony-javascript-and-a-dom-environment-in-ruby-3001.html
http://rubygems.org/gems/harmony
···
On Thu, May 20, 2010 at 12:48 AM, Phil Mcdonnell <phil.a.mcdonnell@gmail.com> wrote:
There's also Celerity: http://celerity.rubyforge.org/
The *really* interesting part is what the JavaScript does. With (a
potentially large) effort, you may be able to "reverse-engineer" the
JavaScript and emulate it manually in Mechanize. I.e. if the JavaScript
builds a simple HTTP request, you may be able to send the same request
from Mechanize, (possibly) without much effort.
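As a rough sketch of that idea, assuming you have already found the request the JavaScript sends (for example by watching Firebug's Net panel while clicking the row), the request can be rebuilt with Ruby's standard library. The endpoint URL, the `row` parameter and the cookie value below are hypothetical placeholders, not taken from the real page. In Mechanize itself, `agent.post(url, params)` would send the equivalent request and reuse the agent's login cookies automatically.

```ruby
require 'net/http'
require 'uri'

# Hypothetical endpoint: replace with the URL the page's JavaScript
# actually requests (as seen in the browser's network log).
DETAIL_URL = URI('https://example.com/statement/detail')

# Build (but don't send) a POST request mimicking the one the
# JavaScript is assumed to make when row `row_index` is clicked.
# 'row' is a hypothetical parameter name.
def build_detail_request(row_index, session_cookie)
  req = Net::HTTP::Post.new(DETAIL_URL)
  req.set_form_data('row' => row_index.to_s)  # form-encoded body
  req['Cookie'] = session_cookie              # reuse the logged-in session
  req
end

req = build_detail_request(0, 'JSESSIONID=abc123')
puts req.method  # POST
puts req.body    # row=0

# Sending it would look like this (needs a real URL and session):
# Net::HTTP.start(DETAIL_URL.host, DETAIL_URL.port, use_ssl: true) do |http|
#   puts http.request(req).body
# end
```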
On Thu, May 20, 2010 at 1:48 AM, Phil Mcdonnell <phil.a.mcdonnell@gmail.com> wrote:
The other trick here is that this page is behind a login. Mechanize
lets me fill out the login form and then holds onto the login session
for me. Can Harmony/Celerity/Watir do this?
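For anyone wondering what "holding onto the login" amounts to: after the login form is submitted, the server sets session cookies, and the client sends them back on every later request. Mechanize does this with its built-in cookie jar. The class below is a hypothetical, deliberately simplified illustration of the idea (it ignores Domain, Path, Expires and Secure), not Mechanize's implementation.

```ruby
# Hypothetical, simplified cookie jar: remembers name=value pairs from
# Set-Cookie headers and replays them in a Cookie header.
# Ignores Domain, Path, Expires and Secure attributes entirely.
class TinyCookieJar
  def initialize
    @cookies = {}
  end

  # Record cookies from a response's Set-Cookie header values
  # (accepts nil when the response set no cookies).
  def store(set_cookie_values)
    Array(set_cookie_values).each do |header|
      pair = header.split(';', 2).first.strip  # keep only "name=value"
      name, value = pair.split('=', 2)
      @cookies[name] = value                   # later values overwrite earlier ones
    end
  end

  # Value to send in the Cookie header of the next request.
  def cookie_header
    @cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
  end
end

jar = TinyCookieJar.new
jar.store(['SESSION=abc123; Path=/; HttpOnly', 'lang=en; Path=/'])
puts jar.cookie_header  # SESSION=abc123; lang=en
```

With Net::HTTP, `response.get_fields('Set-Cookie')` returns the raw header values to feed into `store`, and `cookie_header` goes into the next request's `Cookie` header.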
How would one go about reverse-engineering the JavaScript? I'm somewhat
new to JavaScript, as I usually don't do front-end engineering. I see
the definition of this function below in the HTML page. Is there any
way I can sniff out what it's actually doing? I'm trying to figure out
what the fireClick function displays.
<script type="text/javascript">
var d = document.domain.split(".");
document.domain = d[d.length - 2] + "." + d[d.length - 1];
var start = (new Date()).getTime();
var fireClick = function(){};
var omn_hierarchy="US|AMEX|Ser|eStatement";
var omn_pagename="MainPage";
var omn_language="en";
var omn_newpagename="yes";
</script>
... way down below...
<td class="colPlus" onclick="fireClick(this,0)">
<a id="iroc_0" class="plus" href="#" onclick="return false;"> </a>
</td>
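One way to start sniffing: in the script above, the `document.domain` lines only trim the host to its last two labels (e.g. www99.americanexpress.com becomes americanexpress.com, letting frames on sibling subdomains script each other), and `var fireClick = function(){};` is an empty stub that does nothing. The real fireClick is therefore presumably assigned later, most likely in an external script file. Below is a crude regex-based sketch (not a real HTML or JavaScript parser) that scans a saved copy of the page for fireClick definitions and lists the external scripts worth downloading and searching next; the sample page and `/js/statement.js` are hypothetical.

```ruby
# Crude scan of saved page source for clues about fireClick.
# Regex-based, so it only catches simple definitions and stops at the
# first closing brace of the function body.
def find_fireclick_clues(html)
  # "function fireClick(...) {...}" or "fireClick = function(...) {...}"
  # (the latter also catches "window.fireClick = function ...")
  definitions = html.scan(/(?:function\s+fireClick\b|fireClick\s*=\s*function)[^{]*\{[^}]*\}/)
  # External script files that may redefine fireClick later
  scripts = html.scan(/<script[^>]+src=["']([^"']+)["']/i).flatten
  { definitions: definitions, scripts: scripts }
end

# Hypothetical saved page: the inline stub plus one external script.
page = <<~HTML
  <script type="text/javascript">
  var fireClick = function(){};
  </script>
  <script type="text/javascript" src="/js/statement.js"></script>
HTML

clues = find_fireclick_clues(page)
p clues[:definitions]  # ["fireClick = function(){}"]
p clues[:scripts]      # ["/js/statement.js"]
```

Each URL in `scripts` can then be fetched (for example with the same logged-in Mechanize agent) and run through `find_fireclick_clues` in the same way.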
Watir definitely handles the login, since it simply controls your
browser and therefore behaves exactly like one.
On Fri, May 21, 2010 at 1:14 AM, Phil Mcdonnell <phil.a.mcdonnell@gmail.com> wrote:
--
Michael Fellinger
CTO, The Rubyists, LLC
972-996-5199
Mechanize cannot execute JavaScript, but Watir/Celerity can. (I've never
used Harmony.)
# In Watir (could also use FireWatir and/or the Safari equivalent)
require 'watir'
require 'watir/ie'
# Should work identically with Celerity:
# require 'celerity'
# @browser = Celerity::Browser.new
@login_page = 'http://example.com/'
@user = 'your_username'  # placeholder credentials
@pass = 'your_password'
@browser = Watir::IE.new
@browser.goto @login_page
@browser.text_field(:name, 'username').set(@user)
@browser.text_field(:name, 'password').set(@pass)
@browser.button(:value, "LogIn").click
# Go to the page where the JavaScript link is
@browser.link(:text, "Link Name").click
# Click it; this assumes the fireClick event is 'just' an AJAX call
# that returns content
@browser.link(:id, "iroc_0").click
@browser.wait # wait for the AJAX call to return
# Show the page's displayed text (not the view-source HTML)
puts @browser.text
# If the above opens a pop-up window, more code is needed to retrieve
# its content
Harmony uses envjs to execute JavaScript. There's also Capybara, which can use either a real browser or envjs.
This is extremely helpful!
With Watir I'm running into a problem finding the image button for login
on the following page:
https://online.americanexpress.com/myca/logon/us/action?request_type=LogonHandler&Face=en_US&DestPage=https%3A%2F%2Fwww99.americanexpress.com%2Fmyca%2Facctsumm%2Fus%2Faction%3Frequest_type%3Dauthreg_acctAccountSummary%26us_nu%3Dlogincontrol
It looks like the login button is just a clickable image and I should be
able to find it via:
browser.button(:alt, "Login").click
Any idea why that doesn't find the button?
David Wright wrote:
Sorry, I don't have time to look at the page right now, but if it "is
just a clickable image" and not an actual "button", Watir's button
helper may not find it (even though it looks like a button), so try
browser.image().click?
On Mon, May 24, 2010 at 3:36 AM, Phil Mcdonnell <phil.a.mcdonnell@gmail.com> wrote:
To click on this with Watir, you can use:
@browser.button(:src, 'https://online.americanexpress.com/myca/logon/us/shared/images/btn_login.gif').click
This was captured using the Webmetrics script recorder:
http://www.webmetrics.com/products/script_recorder.html
It has a Watir-compatible mode. You won't get a working
script out of it, but it's good for identifying objects.
Inspecting the element with Firebug shows:
<input type="image" border="0" onclick="javascript:loginNow();return false;" tabindex="5" style="margin-top: 5px; margin-left: 20px; margin-bottom: 22px;" alt="Login" src="/myca/logon/us/shared/images/btn_login.gif">
Webmetrics is a nice helper tool for identifying page objects such as this one.
Good luck,
Darryl
On May 24, 10:32 am, brab...@gmail.com wrote:
Darryl! You just made my day! This does work. I've been banging my
head against the wall for a while here. I had tried looking for the src
attribute too, but not with the full path (only the relative path in the
HTML).
Thank you!
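The relative-vs-full path point is the crux. The HTML attribute holds a relative path, but the browser resolves it against the page URL, and that resolved value is presumably what Darryl's full-URL locator matched. Ruby's standard `URI.join` performs the same resolution, which is a quick way to work out the exact string to pass as `:src` (the page URL below is shortened from the one in the thread). Watir locators generally also accept a Regexp, so something like `@browser.button(:src, /btn_login\.gif/)` should avoid hard-coding the host, though that variant was not tested in this thread.

```ruby
require 'uri'

# Page the login button lives on (query string shortened here).
page_url = 'https://online.americanexpress.com/myca/logon/us/action?request_type=LogonHandler'

# The src attribute exactly as written in the HTML (relative path).
relative_src = '/myca/logon/us/shared/images/btn_login.gif'

# Resolve it against the page URL the way a browser does; an
# absolute-path reference replaces the path and drops the page's query.
absolute_src = URI.join(page_url, relative_src).to_s
puts absolute_src
# https://online.americanexpress.com/myca/logon/us/shared/images/btn_login.gif

# A pattern anchored on the file name matches either form.
src_pattern = /btn_login\.gif\z/
puts [relative_src, absolute_src].all? { |s| s.match?(src_pattern) }  # true
```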
Darryl Brown wrote: