Actually, can anyone recommend a good technique / software / plugin that would help me (a) record my interaction with my bank at the HTTP level, then (b) replay that behavior from my RoR application to automate pulling down daily account details?

The best I can think of at the moment is: (a) the Firefox Live HTTP Headers plugin, then (b) manually writing Ruby code that sends these requests out and waits for each response, checking it before proceeding to the next HTTP request. I'm thinking someone probably has a better way, or a plugin, to handle at least part (b)?
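For reference, a minimal sketch of the hand-rolled approach in (b), using Ruby's standard Net::HTTP; the URL, form fields, and expected status code are invented placeholders, not a real bank's endpoints:

  require 'net/https'
  require 'uri'

  # Hypothetical login endpoint captured from the recorded session.
  uri = URI.parse('https://bank.example.com/login')

  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true

  # Replay the recorded POST; field names are placeholders.
  response = http.post(uri.path, 'user=me&pass=secret',
                       'Content-Type' => 'application/x-www-form-urlencoded')

  # Check the response before moving on to the next request, and carry
  # the session cookie forward by hand on every subsequent request.
  raise "Login failed: #{response.code}" unless response.code == '302'
  cookie = response['Set-Cookie']

Doing the cookie and redirect bookkeeping manually like this gets tedious fast, which is part of what the libraries discussed below take care of.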
···
Greg,

Have you looked into (Fire)Watir?

I am just releasing a new version of scRUBYt! (a web scraping framework) in which it will be possible to use FireWatir as the agent for navigation/scraping, so you can write a simple but powerful DSL (things like 'click_link', 'fill_textfield', etc.) that is executed through Firefox and is very well suited to the scenario you just described. Drop me a line if you are interested.

But of course plain (Fire)Watir would do, too.

Cheers,
Peter
http://www.rubyrailways.com
http://scrubyt.org
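As a rough illustration of the kind of DSL Peter describes, a scRUBYt! extractor might look something like the sketch below; the site, field names, and XPaths are invented for illustration, so check the scrubyt.org documentation for the exact API:

  require 'rubygems'
  require 'scrubyt'

  # Illustrative only -- the URL, form fields, and table layout are made up.
  data = Scrubyt::Extractor.define do
    fetch          'https://bank.example.com/login'
    fill_textfield 'username', 'me'
    fill_textfield 'password', 'secret'
    submit

    # Scrape every row of a hypothetical transactions table.
    transaction "//table[@id='transactions']//tr" do
      date   "/td[1]"
      amount "/td[2]"
    end
  end

  puts data.to_xml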
···
I'm not sure what the Firefox Live HTTP Headers plugin will do for you. If you write a Ruby program that sends requests to a URL, you already know what headers you are sending, and when you get the response you can read the headers it carries.
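Continuing the earlier Net::HTTP sketch, reading the response headers is a one-liner (Net::HTTPResponse behaves like a case-insensitive hash of header fields):

  # Inspect every header that came back with the response.
  response.each_header do |name, value|
    puts "#{name}: #{value}"
  end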
···
Sounds good. Does it support:

• HTTPS?
• cookies?
• building in some intelligence? (say, when the link for step N
changes over time but you can write an algorithm to find it)

Thanks
···
PS. A 4th question, Peter, that I forgot:

• does it support downloading a file (e.g. a CSV file of account transactions)?
···
Yes, scRUBYt! supports all of these... In the current implementation WWW::Mechanize is used as the agent, but it doesn't support JavaScript, and (more often than not) e-banking sites have some JS... so that's why I suggested the FireWatir-based solution.

A browser-agnostic solution doesn't exist (Mechanize is a browser too) - the nature of the task requires a browser. Call it what you like, but if something can GET/POST requests, store cookies, use HTTPS, keep sessions, ... then it is a browser in my vocabulary.

Besides, FireWatir is platform-independent (unlike Watir, which is win32 only).

Cheers,
Peter
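To make Greg's checklist concrete, here is a hedged sketch of the Mechanize-backed path handling HTTPS, cookies, a changing link, and a CSV download in one go; the URL, form fields, and link text are all invented:

  require 'rubygems'
  require 'mechanize'

  agent = WWW::Mechanize.new   # cookies and HTTPS are handled for you

  # Log in; the form and field names are placeholders.
  page = agent.get('https://bank.example.com/login')
  form = page.forms.first
  form['username'] = 'me'
  form['password'] = 'secret'
  page = agent.submit(form)

  # 'Intelligence' for a link that changes over time: select it by a
  # pattern instead of hard-coding the URL.
  csv_link = page.links.detect { |l| l.text =~ /transactions.*csv/i }
  agent.click(csv_link).save_as('transactions.csv')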
···
Thanks, Peter - I was starting to look at Mechanize, but I'll focus on scRUBYt...
···
OK, cool. If you don't have JS/AJAX or other tricks that Mechanize can't handle, you should be OK.

On the other hand, if you do have JS/AJAX on the page, you will need FireWatir whether you like it or not. The FireWatir-enabled version of scRUBYt! is not yet officially released - if you want to try it, you need to download it from http://scrubyt.org/scrubyt-0.4.03.gem and install it.

Let me know if you encounter any problems!

Cheers,
Peter
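For comparison with the Mechanize sketch above, driving a real Firefox through FireWatir might look roughly like this; it assumes a Firefox with the JSSh extension installed, and the field names and link text are again invented:

  require 'rubygems'
  require 'firewatir'

  # Drives an actual Firefox instance, so JavaScript just works.
  ff = FireWatir::Firefox.new
  ff.goto('https://bank.example.com/login')
  ff.text_field(:name, 'username').set('me')
  ff.text_field(:name, 'password').set('secret')
  ff.button(:value, 'Log in').click

  # Even a link built by JavaScript is clickable here.
  ff.link(:text, 'Daily transactions').click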
···
I must admit you're managing to overwhelm me slightly with the number of libraries / packages here.

So what does the stack you're talking about look like? Will it be fronted by scRUBYt, like this:

- scRUBYt
- FireWatir
- Mechanize
- Hpricot

Is this correct? If it is, could you simply put in brackets the key thing each layer does/focuses on?

Cheers
Greg
···
In the current official release (0.3.4), FireWatir is not yet added, so you have just Mechanize + Hpricot.

Mechanize does the navigational part - fill this textfield, then click that button, and once you arrive at the result page, crawl to all the detail pages, etc.

Once you arrive at the final page, from which you don't want to go any further, you start the actual scraping, and in this case that's done through Hpricot: you take the page you arrived at, parse it with Hpricot, and collect the results from it.

In the development release it is possible to plug in agents other than Mechanize - theoretically anything; currently FireWatir is implemented. But if you want to use Mechanize as the agent for crawling, you don't need to install FireWatir at all.

FireWatir-based scraping has other benefits beyond JS/AJAX - for example, more robust HTML parsing (which in that case is done by Firefox). Hpricot is a great parser, but it can't beat Firefox (yet).

Firefox-parsed HTML also means you can use XPaths straight from FireBug or the DOM Inspector (which is not the case with Mechanize).

On the downside, Mechanize-based navigation/scraping is faster (you don't have to wait until the page renders, which is a prerequisite for FireWatir-based navigation, etc.).

Does this answer your question? (If not, be sure to keep asking.)
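To illustrate the final scraping step Peter describes, a hedged Hpricot sketch; the page URL, table id, and cell layout are invented:

  require 'rubygems'
  require 'hpricot'
  require 'open-uri'

  # Hypothetical result page we navigated to with Mechanize/FireWatir.
  doc = Hpricot(open('https://bank.example.com/transactions'))

  # Collect the results from each row of an invented transactions table.
  (doc / "//table[@id='transactions']//tr").each do |row|
    cells = row / 'td'
    next if cells.empty?   # skip the header row
    puts "#{cells[0].inner_text}  #{cells[1].inner_text}"
  end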