Scrape javascript content

I'm trying to scrape a page that hides some data behind a javascript
function. Is there any way to get this data? I've been using
Mechanize, but I'm not sure it can do this. Is there a better library
to use for this type of thing?

The following is the interesting part of the page:

<td class="colPlus" onclick="fireClick(this,0)">
    <a id="iroc_0" class="plus" href="#" onclick="return false;">&nbsp;</a>
</td>

--
Posted via http://www.ruby-forum.com/.

You might check out Harmony:

http://www.rubyinside.com/harmony-javascript-and-a-dom-environment-in-ruby-3001.html
http://rubygems.org/gems/harmony


> I'm trying to scrape a page that hides some data behind a javascript
> function. Is there any way to get this data? I've been using
> Mechanize, but I'm not sure it can do this. Is there a better library
> to use for this type of thing?

http://celerity.rubyforge.org/

> The following is the interesting part of the page:
>
> <td class="colPlus" onclick="fireClick(this,0)">
>     <a id="iroc_0" class="plus" href="#" onclick="return false;">&nbsp;</a>
> </td>

The *really* interesting part is what the JavaScript does :) With
(potentially large) effort you may be able to "reverse-engineer" the
JavaScript and emulate it manually in Mechanize. That is, if the JavaScript
builds a simple HTTP request, you may be able to send the same request
from Mechanize without much extra work.
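
A minimal sketch of that idea, assuming the click handler turns out to issue a plain GET request. The endpoint path, the `row` parameter and the cookie value are hypothetical placeholders; the real ones would have to be read from the page's JavaScript or from the browser's network traffic. The sketch uses only Ruby's standard library and builds the request without sending it; with Mechanize, fetching the same URL with `agent.get` should reuse the cookies of the logged-in session.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) the GET request the JavaScript is assumed to make.
# base_url, path, params and cookie are hypothetical placeholders.
def build_ajax_request(base_url, path, params, cookie = nil)
  uri = URI.join(base_url, path)
  uri.query = URI.encode_www_form(params)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Cookie'] = cookie if cookie
  # Many ajax libraries send this header; some servers check for it.
  request['X-Requested-With'] = 'XMLHttpRequest'
  [uri, request]
end

uri, request = build_ajax_request('https://example.com/', '/statement/details',
                                  { 'row' => 0 }, 'session=abc123')
puts uri
puts request.path

# To actually send it:
# response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
#   http.request(request)
# end
# puts response.body
```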


The other trick here is that this page is behind a login. Mechanize
allows me to fill out the login form and holds onto the login
credentials for me. Can harmony/celerity/watir do this?

> The *really* interesting part is what does the Javascript do :) With
> (a potentially large) effort you may be able to "reverse-engineer" the
> javascript and emulate manually in mechanize. I.e. if the javascript
> builds a simple HTTP request, you may be able to send the same request
> from mechanize (possibly) without much effort.

How would one do this? I'm somewhat new to javascript as I usually
don't do front end engineering. I see the below definition of this
function in the HTML page. Any way I can sniff out what it's actually
doing? I'm looking to figure out what the fireClick method displays.

    <script type="text/javascript">
      var d = document.domain.split(".");
      document.domain = d[d.length - 2] + "." + d[d.length - 1];
      var start = (new Date()).getTime();
      var fireClick = function(){};
      var omn_hierarchy="US|AMEX|Ser|eStatement";
      var omn_pagename="MainPage";
      var omn_language="en";
      var omn_newpagename="yes";
    </script>

... way down below...

<td class="colPlus" onclick="fireClick(this,0)">
    <a id="iroc_0" class="plus" href="#" onclick="return false;">&nbsp;</a>
</td>
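
The inline definition above is an empty stub (`function(){}`), so the real handler is presumably assigned somewhere else, for example in an external script. A rough first step is to list the external scripts the page loads and every place `fireClick` is assigned. The sketch below does that with plain regular expressions on a hypothetical HTML string; the `/js/statement.js` file name is made up for illustration, and a real HTML parser would be more robust than regexes.

```ruby
# Hypothetical page source; in practice this would be the body of the fetched page.
html = <<~HTML
  <script type="text/javascript" src="/js/statement.js"></script>
  <script type="text/javascript">
    var fireClick = function(){};
  </script>
  <td class="colPlus" onclick="fireClick(this,0)"></td>
HTML

# URLs of external scripts, which are the next files worth reading.
def script_sources(html)
  html.scan(/<script[^>]*\bsrc=["']([^"']+)["']/i).flatten
end

# Lines where fireClick is assigned or declared as a function.
def fire_click_definitions(html)
  html.lines.map(&:strip).grep(/\bfireClick\s*=|function\s+fireClick\b/)
end

puts script_sources(html)
puts fire_click_definitions(html)
```

Watching the browser's network traffic while clicking the cell is another way to see what the handler actually requests.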


> The other trick here is that this page is behind a login. Mechanize
> allows me to fill out the login form and holds onto the login
> credentials for me. Can harmony/celerity/watir do this?

Watir definitely does that since it simply controls your browser and
therefore behaves exactly like one.


--
Michael Fellinger
CTO, The Rubyists, LLC
972-996-5199

Mechanize cannot execute javascript but watir/celerity can. (I've never
used harmony)

# In Watir (could also use FireWatir and/or the Safari equivalent)
require 'watir'
require 'watir/ie'

# Should work identically with Celerity:
# require 'celerity'
# @browser = Celerity::IE.new

@login_page = 'http://example.com/'
# Placeholders: set these to your own credentials
@user = 'your-username'
@pass = 'your-password'

@browser = Watir::IE.new
@browser.goto @login_page
@browser.text_field(:name, 'username').set(@user)
@browser.text_field(:name, 'password').set(@pass)
@browser.button(:value, "LogIn").click

# Go to the page where the javascript link is
@browser.link(:text, "Link Name").click

# Click it.
# This assumes the fireClick event is 'just' an ajax call which returns content.
@browser.link(:id, "iroc_0").click
@browser.wait # wait for ajax to return

# Show the page's displayed text (not view source)
puts @browser.text
# If the above fires a pop-up window, more code is needed to retrieve the content.
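
If the content arrives asynchronously, a single wait may return before the data is on the page. One hedged alternative is to poll until the expected text shows up. `poll_until` below is a hypothetical helper, not part of Watir; the commented usage line assumes the `@browser` object from the script above and a made-up marker string.

```ruby
# Hypothetical helper (not part of Watir): call the block repeatedly until it
# returns a truthy value, or raise once the timeout has elapsed.
def poll_until(timeout: 10, interval: 0.25)
  deadline = Time.now + timeout
  loop do
    result = yield
    return result if result
    raise "condition not met within #{timeout} seconds" if Time.now >= deadline
    sleep interval
  end
end

# Usage with the browser object from the script above (marker text is made up):
# poll_until(timeout: 15) { @browser.text.include?('Transaction details') }

# Stand-alone demonstration: the condition becomes true on the third check.
attempts = 0
poll_until(timeout: 2, interval: 0.01) { (attempts += 1) >= 3 }
puts attempts
```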


> Mechanize cannot execute javascript but watir/celerity can. (I've never
> used harmony)

Harmony uses envjs to execute JavaScript. There's also capybara which can either use a browser or envjs.

This is extremely helpful!

With Watir I'm running into a problem finding the image button for login
on the following page:
https://online.americanexpress.com/myca/logon/us/action?request_type=LogonHandler&Face=en_US&DestPage=https%3A%2F%2Fwww99.americanexpress.com%2Fmyca%2Facctsumm%2Fus%2Faction%3Frequest_type%3Dauthreg_acctAccountSummary%26us_nu%3Dlogincontrol

It looks like the login button is just a clickable image and I should be
able to find it via:
browser.button(:alt, "Login").click

Any idea why that doesn't find the button?


Sorry, don't have time to look at the page right now, but if it "is
just a clickable image" and not an actual "button" watir's button
helper may not find it (even though it looks like a button) so try
browser.image().click?


To click on this with Watir, you can use:

  @browser.button(:src, 'https://online.americanexpress.com/myca/logon/us/shared/images/btn_login.gif').click

This was captured using the Webmetrics script recorder:
http://www.webmetrics.com/products/script_recorder.html
It has a Watir-compatible mode. You won't get a working
script out of it, but it is good for identifying objects.

Inspecting the element with Firebug shows:

<input type="image" border="0" onclick="javascript:loginNow();return false;" tabindex="5" style="margin-top: 5px; margin-left: 20px; margin-bottom: 22px;" alt="Login" src="/myca/logon/us/shared/images/btn_login.gif">

The Webmetrics recorder is a nice helper tool for identifying page objects such as this one.

Good luck,
Darryl


Darryl! You just made my day! This does work. I've been banging my
head on the wall for a while here :) I had tried looking for the src
attribute too, but not with the full path (only the relative path in the
HTML).

Thank you!
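
The difference between the two paths is ordinary URL resolution: the `src` in the HTML is relative to the site root, while the locator that worked uses the absolute URL. A small sketch using Ruby's standard `URI.join`, with the login page URL shortened (query string omitted) for illustration; `absolute_src` is a hypothetical helper name:

```ruby
require 'uri'

# Resolve a src attribute against the URL of the page it appears on,
# the way a browser would.
def absolute_src(page_url, src)
  URI.join(page_url, src).to_s
end

page_url = 'https://online.americanexpress.com/myca/logon/us/action'
puts absolute_src(page_url, '/myca/logon/us/shared/images/btn_login.gif')
```

The resulting string is the same full URL used in the `:src` locator that worked.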
