Using hpricot to get tables

lrlebron · 1 July 2008 18:07

I am working with the following script to parse a page

require 'hpricot'
require 'open-uri'

strLink = "http://www.sportsline.com/mlb/gamecenter/boxscore/MLB_20080331_ARI@CIN"
strPath ="div[@class=SLTables1]/div"

@doc = Hpricot(open(strLink))

@doc.search(strPath) do |div|
  puts div.inner_html
  # puts div.css_path
  # puts div.xpath
  # puts
  puts
end

This prints four tables to the screen.

I would like to access each table individually. How can I do that?

Thanks,

Luis

Dan6 · 1 July 2008 21:03

> I would like to access each table individually

doc.search returns an array even if there is only one match. The construct you are using iterates through this array:

doc.search(strPath) do |div|

end

If you capture the search results in a variable named "divs", you can index it like an array (because it is one):

divs=doc.search(strPath)

If you want to immediately start iterating you can do this:

doc.search(strPath).each_with_index do |div, idiv|
  puts idiv if idiv == 2
end
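
As a rough sketch of the two access styles above, the snippet below uses a plain Ruby array of stand-in objects in place of the real search results, so it runs without hpricot or a network fetch. FakeDiv and its contents are hypothetical; with hpricot, divs would come from doc.search(strPath) instead.

```ruby
# Stand-in for the elements returned by doc.search(strPath).
# FakeDiv is a hypothetical helper, not part of hpricot; it only
# mimics the inner_html method used in this thread.
FakeDiv = Struct.new(:inner_html)

divs = [
  FakeDiv.new("<table>one</table>"),
  FakeDiv.new("<table>two</table>"),
  FakeDiv.new("<table>three</table>"),
  FakeDiv.new("<table>four</table>")
]

# Style 1: index the captured results directly, as with any array.
first_table = divs[0].inner_html
third_table = divs[2].inner_html

# Style 2: iterate and pick out an element by its position.
picked = nil
divs.each_with_index do |div, idiv|
  picked = div.inner_html if idiv == 2
end

puts first_table
puts picked
```

Both styles reach the same element; indexing is simply shorter when you already know which table you want.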

I work with hpricot a lot, and I find it more productive not to shorten the code with fancy Ruby idioms: the pages you are parsing are fragile, and the parse breaks easily when someone changes the page content.

See code below

==============
require 'hpricot'
require 'open-uri'

strLink ="http://www.sportsline.com/mlb/gamecenter/boxscore/MLB_20080331_ARI@CIN"
strPath ="//div[@class='SLTables1']/div"

doc = Hpricot(open(strLink))
divs=doc.search(strPath)

puts "#{divs[0].inner_text.slice(0..70)}\n\n"
puts "#{divs[1].inner_text.slice(0..70)}\n\n"
puts "#{divs[2].inner_text.slice(0..70)}\n\n"
puts "#{divs[3].inner_text.slice(0..70)}\n\n"
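
One caveat with the hard-coded indexes: if the page changes and fewer than four divs match, divs[3] is nil and calling inner_text on it raises NoMethodError. A minimal sketch of a more defensive version follows, assuming you have already collected the text of each div into an array of strings (for example with divs.map { |d| d.inner_text }). preview_texts and its parameters are hypothetical names, not part of hpricot.

```ruby
# Hypothetical helper: return the first `limit` characters of each text,
# warning on stderr when the number of entries is not what we expected.
# slice(0, 71) covers the same characters as slice(0..70) in the script above.
def preview_texts(texts, expected: 4, limit: 71)
  if texts.length != expected
    warn "expected #{expected} divs, found #{texts.length}"
  end
  texts.map { |text| text.slice(0, limit) }
end

# Stand-in strings replace the real inner_text values here.
sample = ["a" * 100, "short table text"]
preview_texts(sample, expected: 2).each do |preview|
  puts "#{preview}\n\n"
end
```

This keeps the script running when the layout shifts, and the warning tells you the page no longer matches what the xpath was written for.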

lrlebron · 1 July 2008 21:37

This works. Will be very useful for future projects.

I ended up using the xpath for each table which also worked.

Thanks,

Luis

