Logging in to a page and scraping values

Vikash_Kumar · 12 January 2007 10:31

I am running a test case in which I first have to log in to a web page,
then go to a particular page on the same web site and extract some data
from it. The data is in a table.

For example, the script first calls http://localhost/login.asp, enters
the user name and password, and clicks the login button. Once logged in,
it goes to http://localhost/achievements.asp, and from that page we want
to extract the data in an HTML table. What would be the approach for
doing this?

I can use the code below to extract the data when I do not have to log in
to the web site.

require 'net/http'

# read the page data

http = Net::HTTP.new('kvcrpf.org', 80)
resp = http.get('/achievements.htm')
page = resp.body

# BEGIN processing HTML

# return the inner content of every <tag>...</tag> pair found in data
def parse_html(data, tag)
  data.scan(%r{<#{tag}\s*.*?>(.*?)</#{tag}>}im).flatten
end

output = []
table_data = parse_html(page, "table")
table_data.each do |table|
  out_row = []
  row_data = parse_html(table, "tr")
  row_data.each do |row|
    cell_data = parse_html(row, "td")
    cell_data.each do |cell|
      cell.gsub!(%r{<.*?>}, "")   # strip any markup left inside the cell
    end
    out_row << cell_data
  end
  output << out_row
end

# END processing HTML

# examine the result

# print a nested array as an indented, indexed tree
def parse_nested_array(array, tab = 0)
  n = 0
  array.each do |item|
    if item.size > 0
      puts "#{"\t" * tab}[#{n}] {"
      if item.is_a?(Array)
        parse_nested_array(item, tab + 1)
      else
        puts "#{"\t" * (tab + 1)}#{item}"
      end
      puts "#{"\t" * tab}}"
    end
    n += 1
  end
end

parse_nested_array(output[2][4])

aa, ab, ac, ad = output[2][4]

puts "hello"
puts aa + "\t" + ab + "\t" + ac + "\t" + ad

--
Posted via http://www.ruby-forum.com/.

Peter_Szinek3 · 12 January 2007 17:48

In two days I am going to release a web extraction toolkit that does exactly what you want (and more, of course, but this is a basic use case). It is based on Mechanize, which handles login-style interaction, and Hpricot, which extracts the relevant data. The scenario you describe is an absolutely typical one, so you could try it with my toolkit.

I will post an announcement here after the release.

Cheers,
Peter

__
http://www.rubyrailways.com

Vikash_Kumar · 13 January 2007 03:29

The code above can extract values from a web page if we don't have to
log in and we know in advance which URL holds the data. The problem is
to first log in to a page and then go to some other location to scrape
values from it.

Please help me out in doing this.
Thanks in advance
Vikash
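
For reference, the login step can be added to the net/http approach by posting the login form and carrying the session cookie into the next request. The following is only a minimal sketch: the form field names 'username' and 'password' are assumptions (the real names have to be read from the HTML of login.asp), and the helper names cookie_header and fetch_after_login are made up for illustration.

```ruby
require 'net/http'

# Build a Cookie request header from the Set-Cookie lines of a response.
# Only the leading name=value pair of each line is kept; attributes such
# as path or HttpOnly are dropped.
def cookie_header(set_cookie_lines)
  Array(set_cookie_lines).map { |line| line.split(';', 2).first.strip }.join('; ')
end

# Log in with a form POST, then fetch a second page in the same session.
# Host, paths and form field names are assumptions, not taken from the
# real site.
def fetch_after_login(host, login_path, target_path, user, password, port = 80)
  Net::HTTP.start(host, port) do |http|
    login = Net::HTTP::Post.new(login_path)
    login.set_form_data('username' => user, 'password' => password)
    login_resp = http.request(login)

    page = Net::HTTP::Get.new(target_path)
    page['Cookie'] = cookie_header(login_resp.get_fields('Set-Cookie'))
    http.request(page).body
  end
end

# Example call (needs a running server, so it is left commented out):
# page = fetch_after_login('localhost', '/login.asp', '/achievements.asp',
#                          'user', 'secret')
```

The returned string can then be handed to the same table-parsing code as before. Some sites set the session cookie on the first GET of the login page rather than on the POST; in that case the cookie has to be captured one request earlier.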

lrlebron · 13 January 2007 03:35

If you are running on a Windows platform, then you should look at Watir.
It will let you control Internet Explorer and log in to a site.

Luis

Bermejo_Rodrigo · 14 January 2007 16:55

There are a few ways of doing this. If you are on Windows, Watir [1] can
help you with the login step. The tricky part may be how to get the
data, but I am sure there is a method that lets you extract the whole
HTML.

[1] http://wtr.rubyforge.org/
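
Watir does expose the page source: the browser object has an html method. A rough sketch of the login-then-read flow follows. It is written against the current Watir API (Watir::Browser, which drives Chrome or Firefox; the 2007 releases used Watir::IE.new and a slightly different locator syntax). The field names, the button label and the helper name html_after_login are assumptions for illustration only.

```ruby
# Drive a real browser through the login form, then return the HTML of
# the target page. Locator values ('username', 'password', 'Login') are
# assumptions; take the real ones from the login page.
def html_after_login(login_url, target_url, user, password, browser = nil)
  unless browser
    require 'watir'                # gem install watir
    browser = Watir::Browser.new   # Watir::IE.new in 2007-era releases
  end

  browser.goto(login_url)
  browser.text_field(name: 'username').set(user)
  browser.text_field(name: 'password').set(password)
  browser.button(value: 'Login').click

  browser.goto(target_url)         # same browser, so the session is kept
  browser.html
end

# Example call (opens a browser, so it is left commented out):
# html = html_after_login('http://localhost/login.asp',
#                         'http://localhost/achievements.asp',
#                         'user', 'secret')
```

The string returned by html can be parsed with the same table-extraction code used for the net/http version.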

$rm rm
.rb

K.P.Krishnamoorthy · 14 January 2007 17:47

Hi,

I've been successfully using Selenium (check out openqa.org) to do similar
(and more complex) web page interactions, querying etc. on both Linux and
Windows, using Ruby to drive things. If written thoughtfully, it's very easy
to get code that runs on both platforms without any code-changes required to
migrate between the two.

You get a nice '@selenium' object which has a large set of methods you can
use.

Apologies in advance if you already knew about this.

Kp.

--
"I refuse to prove that I exist," says God, "for proof denies faith, and
without faith I am nothing."
"But," says Man, "the Babel fish is a dead giveaway isn't it? It could not
have evolved by chance. It proves that you exist, and so therefore, by your
own arguments, you don't. Q.E.D."
"Oh dear," says God, "I hadn't thought of that," and promptly vanishes in a
puff of logic.
"Oh, that was easy," says Man, and for an encore goes on to prove that black
is white and gets himself killed on the next zebra crossing.

Vikash_Kumar · 15 January 2007 03:19

I am working on the Windows platform. I have tried a lot to first log in
to a web page and then go to the desired page to get some data from it,
but I am unable to do it.

Anyone's help will be appreciated.
Thanks
Vikash

Charles_L · 15 January 2007 04:43

Try a combination of WWW::Mechanize (gem install mechanize), and Hpricot
(gem install hpricot).
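
A minimal sketch of how that combination could look, under several assumptions: the login form is the first form on the page, its fields are named 'username' and 'password', and the helper name scrape_after_login is made up for illustration. It uses the current Mechanize API (Mechanize.new; the 2007 releases used WWW::Mechanize.new). Current Mechanize parses pages with Nokogiri rather than Hpricot, so page.search takes the place of a separate Hpricot step here.

```ruby
# Log in through the first form on the login page, then return the text
# of every table cell on the target page, grouped by row.
# The field names 'username' and 'password' are assumptions; use the
# names from the real login form.
def scrape_after_login(login_url, target_url, user, password, agent = nil)
  unless agent
    require 'mechanize'       # gem install mechanize
    agent = Mechanize.new     # WWW::Mechanize.new in 2007-era releases
  end

  form = agent.get(login_url).forms.first
  form['username'] = user
  form['password'] = password
  agent.submit(form)          # the agent keeps the session cookie

  page = agent.get(target_url)
  page.search('table tr').map do |row|
    row.search('td').map { |cell| cell.text.strip }
  end
end

# Example call (needs a running server, so it is left commented out):
# rows = scrape_after_login('http://localhost/login.asp',
#                           'http://localhost/achievements.asp',
#                           'user', 'secret')
# rows.each { |cells| puts cells.join("\t") }
```

Because the agent object stores cookies between requests, the second get runs inside the logged-in session without any manual header handling.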

alex_f_il · 15 January 2007 16:55

You can also try SWExplorerAutomation (SWEA) from http://webiussoft.com.
SWEA is a .NET API, but it can be used from Ruby through RubyCLR.

example:

require 'rubyclr'
RubyClr::reference 'System'
RubyClr::reference 'SWExplorerAutomationClient'
include SWExplorerAutomation::Client
include SWExplorerAutomation::Client::Controls
include SWExplorerAutomation::Client::DialogControls
explorerManager = ExplorerManager.new
explorerManager.Connect(-1)
explorerManager.LoadProject('google.htp')
explorerManager.Navigate('http://www.google.com/')
scene = explorerManager['Scene_0']
scene.WaitForActive(30000)
scene["q"].Value = 'c#'
scene['btnG'].Click()
scene = explorerManager['Scene_1']
scene.WaitForActive(30000)
explorerManager.DisconnectAndClose()

Vikash_Kumar · 15 January 2007 06:23

I am new to Mechanize and Hpricot. I have installed them, but I still
cannot work out how to scrape values by first logging in to the web site
and then going to another page to extract data from it.

Please help me.
Vikash
