Actually, can anyone recommend a good technique / software / plugin that would help me (a) record my interaction with my bank at the HTTP level, then (b) replay that behavior from my RoR application to automate pulling down daily account details?

The best I can think of at the moment is: (a) the Firefox Live HTTP Headers plugin, then (b) manually writing Ruby code that sends these requests out and waits for each response, checking it before proceeding to the next HTTP request. I'm thinking someone probably has a better way, or a plugin, to handle at least part (b)?
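For reference, a minimal sketch of the hand-rolled approach in (b), using Ruby's standard Net::HTTP; the URL, form fields, and expected status code are invented placeholders, not a real bank's endpoints:

  require 'net/https'
  require 'uri'

  # Hypothetical login endpoint captured from the recorded session.
  uri = URI.parse('https://bank.example.com/login')

  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true

  # Replay the recorded POST; field names are placeholders.
  response = http.post(uri.path, 'user=me&pass=secret',
                       'Content-Type' => 'application/x-www-form-urlencoded')

  # Check the response before moving on to the next request, and carry
  # the session cookie forward by hand on every subsequent request.
  raise "Login failed: #{response.code}" unless response.code == '302'
  cookie = response['Set-Cookie']

Doing the cookie and redirect bookkeeping manually like this gets tedious fast, which is part of what the libraries discussed below take care of.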
···
Greg,

Have you looked into (Fire)Watir?

I am just releasing a new version of scRUBYt! (a web scraping framework) in which it will be possible to use FireWatir as the agent for navigation/scraping, so you can write a simple but powerful DSL (things like 'click_link', 'fill_textfield', etc.) that is executed through Firefox and is very well suited to the scenario you just described. Drop me a line if you are interested.

But of course plain (Fire)Watir would do, too.

Cheers,
Peter
http://www.rubyrailways.com
http://scrubyt.org
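As a rough illustration of the kind of DSL Peter describes, a scRUBYt! extractor might look something like the sketch below; the site, field names, and XPaths are invented for illustration, so check the scrubyt.org documentation for the exact API:

  require 'rubygems'
  require 'scrubyt'

  # Illustrative only -- the URL, form fields, and table layout are made up.
  data = Scrubyt::Extractor.define do
    fetch          'https://bank.example.com/login'
    fill_textfield 'username', 'me'
    fill_textfield 'password', 'secret'
    submit

    # Scrape every row of a hypothetical transactions table.
    transaction "//table[@id='transactions']//tr" do
      date   "/td[1]"
      amount "/td[2]"
    end
  end

  puts data.to_xml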
···
I'm not sure what the Firefox Live HTTP Headers plugin will do for you. If you write a Ruby program that sends requests to a URL, you already know what headers you are sending, and when you get the response you can read the headers it carries.
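Continuing the earlier Net::HTTP sketch, reading the response headers is a one-liner (Net::HTTPResponse behaves like a case-insensitive hash of header fields):

  # Inspect every header that came back with the response.
  response.each_header do |name, value|
    puts "#{name}: #{value}"
  end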
···
Sounds good. Does it support:

• HTTPS?
• cookies?
• building in some intelligence? (say, when the link for step N
changes over time but you can write an algorithm to find it)

Thanks
···
PS. A 4th question, Peter, that I forgot:

• does it support downloading a file (e.g. a CSV file of account transactions)?
···
Yes, scRUBYt! supports all of these... In the current implementation WWW::Mechanize is used as the agent, but it doesn't support JavaScript, and (more often than not) e-banking sites have some JS... so that's why I suggested the FireWatir-based solution.

A browser-agnostic solution doesn't exist (Mechanize is a browser too) - the nature of the task requires a browser. Call it what you like, but if something can GET/POST requests, store cookies, use HTTPS, keep sessions, ... then it is a browser in my vocabulary.

Besides, FireWatir is platform-independent (unlike Watir, which is win32 only).

Cheers,
Peter
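To make Greg's checklist concrete, here is a hedged sketch of the Mechanize-backed path handling HTTPS, cookies, a changing link, and a CSV download in one go; the URL, form fields, and link text are all invented:

  require 'rubygems'
  require 'mechanize'

  agent = WWW::Mechanize.new   # cookies and HTTPS are handled for you

  # Log in; the form and field names are placeholders.
  page = agent.get('https://bank.example.com/login')
  form = page.forms.first
  form['username'] = 'me'
  form['password'] = 'secret'
  page = agent.submit(form)

  # 'Intelligence' for a link that changes over time: select it by a
  # pattern instead of hard-coding the URL.
  csv_link = page.links.detect { |l| l.text =~ /transactions.*csv/i }
  agent.click(csv_link).save_as('transactions.csv')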
···
Thanks, Peter - I was starting to look at Mechanize, but I'll focus on scRUBYt...
···
OK, cool. If you don't have JS/AJAX or other tricks that Mechanize can't handle, you should be OK.

On the other hand, if you do have JS/AJAX on the page, you will need FireWatir whether you like it or not. The FireWatir-enabled version of scRUBYt! is not yet officially released - if you want to try it, you need to download it from http://scrubyt.org/scrubyt-0.4.03.gem and install it.

Let me know if you encounter any problems!

Cheers,
Peter
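For comparison with the Mechanize sketch above, driving a real Firefox through FireWatir might look roughly like this; it assumes a Firefox with the JSSh extension installed, and the field names and link text are again invented:

  require 'rubygems'
  require 'firewatir'

  # Drives an actual Firefox instance, so JavaScript just works.
  ff = FireWatir::Firefox.new
  ff.goto('https://bank.example.com/login')
  ff.text_field(:name, 'username').set('me')
  ff.text_field(:name, 'password').set('secret')
  ff.button(:value, 'Log in').click

  # Even a link built by JavaScript is clickable here.
  ff.link(:text, 'Daily transactions').click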
···
I must admit you're managing to overwhelm me slightly with the number of libraries / packages here.

So what does the stack you're talking about look like? Will it be fronted by scRUBYt, like this:

- scRUBYt
- FireWatir
- Mechanize
- Hpricot

Is this correct? If it is, could you simply put in brackets the key thing each layer does/focuses on?

Cheers
Greg
···
In the current official release (0.3.4), FireWatir is not yet added, so you have just Mechanize + Hpricot.

Mechanize does the navigational part - fill this textfield, then click that button, and once you arrive at the result page, crawl to all the detail pages, etc.

Once you arrive at the final page, from which you don't want to go any further, you start the actual scraping, and in this case that's done through Hpricot: you take the page you arrived at, parse it with Hpricot, and collect the results from it.

In the development release it is possible to plug in agents other than Mechanize - theoretically anything; currently FireWatir is implemented. But if you want to use Mechanize as the agent for crawling, you don't need to install FireWatir at all.

FireWatir-based scraping has other benefits beyond JS/AJAX - for example, more robust HTML parsing (which in that case is done by Firefox). Hpricot is a great parser, but it can't beat Firefox (yet).

Firefox-parsed HTML also means you can use XPaths straight from FireBug or the DOM Inspector (which is not the case with Mechanize).

On the downside, Mechanize-based navigation/scraping is faster (you don't have to wait until the page renders, which is a prerequisite for FireWatir-based navigation, etc.).

Does this answer your question? (If not, be sure to keep asking.)
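To illustrate the final scraping step Peter describes, a hedged Hpricot sketch; the page URL, table id, and cell layout are invented:

  require 'rubygems'
  require 'hpricot'
  require 'open-uri'

  # Hypothetical result page we navigated to with Mechanize/FireWatir.
  doc = Hpricot(open('https://bank.example.com/transactions'))

  # Collect the results from each row of an invented transactions table.
  (doc / "//table[@id='transactions']//tr").each do |row|
    cells = row / 'td'
    next if cells.empty?   # skip the header row
    puts "#{cells[0].inner_text}  #{cells[1].inner_text}"
  end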