Data extraction using Scrubyt

Hi All,

I need to fetch some information from http://www.ebay.in.
The fields I need are: the name of the product, image, price, and the
link to that product.

I am able to get the data using this method:
require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.ebay.in'
  fill_textfield 'satitle', 'ipod shuffle'
  submit
  record "/html/body/div[2]/div[4]/div[2]/div/div/div[2]/div[2]/div/div/div[3]/div/div/table/tr" do
    name "/td[2]/div/a"
    price "/td[5]"
    image "/td/a/img" do
        url "src", :type => :attribute
    end
    link "/td[2]/div/a" do
        url "href", :type => :attribute
    end
  end
end

google_data.to_xml.write($stdout, 1)

My problem is that it does not work properly for some products (the div
structure may change). Is there a better solution for this?

Thanks in advance,
Vipin

--
Posted via http://www.ruby-forum.com/.

You need to create smarter XPaths, relying on CSS id/class attributes or other properties rather than a full XPath from the root - for example:

require 'rubygems'
require 'scrubyt'

ebay_data = Scrubyt::Extractor.define do

      fetch 'http://www.ebay.in'
      fill_textfield 'satitle', 'ipod'
      submit

      record "//table[@class='nol']" do
        name "//td[@class='details']/div/a"
      end
end

puts ebay_data.to_xml

etc.

This way your scraper will be more robust and less prone to breaking when the page changes.
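
To make the difference concrete, here is a small sketch using Ruby's REXML library instead of Scrubyt, so it runs offline. The markup is invented for illustration and is not eBay's real page; the `nol` and `details` class names come from the example above, while the `price` class and the `extract` helper are hypothetical. It shows a root-anchored path failing once a wrapper div is added, while the attribute-based path keeps matching.

```ruby
require 'rexml/document'

# Invented sample markup (assumption): one result row in a table with class "nol".
ROW = <<~HTML
  <table class="nol">
    <tr>
      <td><a href="/item/1"><img src="/img/1.jpg"/></a></td>
      <td class="details"><div><a href="/item/1">iPod shuffle 1GB</a></div></td>
      <td class="price">Rs. 2,500</td>
    </tr>
  </table>
HTML

# Same content, but layout B wraps the table in one extra div.
LAYOUT_A = "<html><body><div/><div>#{ROW}</div></body></html>"
LAYOUT_B = "<html><body><div/><div><div>#{ROW}</div></div></body></html>"

ABSOLUTE = "/html/body/div[2]/table/tr"   # depends on the exact nesting
RELATIVE = "//table[@class='nol']/tr"     # depends only on the class attribute

# Hypothetical helper: returns one hash per matched row.
def extract(html, row_xpath)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, row_xpath).map do |row|
    link  = REXML::XPath.first(row, "td[@class='details']/div/a")
    img   = REXML::XPath.first(row, "td/a/img")
    price = REXML::XPath.first(row, "td[@class='price']")
    {
      name:  link.text,
      link:  link.attributes['href'],
      image: img.attributes['src'],
      price: price.text
    }
  end
end

p extract(LAYOUT_A, ABSOLUTE)
p extract(LAYOUT_B, ABSOLUTE)
p extract(LAYOUT_B, RELATIVE)
```

With this sample data, `extract(LAYOUT_B, ABSOLUTE)` returns an empty array, while `extract(LAYOUT_B, RELATIVE)` still returns the one row.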

HTH,
Peter

___
http://www.rubyrailways.com
http://scrubyt.org

On 2008.12.05., at 8:02, Vipin Vm wrote:

My problem is that it does not work properly for some products (the div
structure may change). Is there a better solution for this?

Hi Peter,

Thanks for the help... it's working fine :)

Vipin

Glad that I could help. I am just working on a new release btw, so stay tuned!

Cheers,
Peter

On 2008.12.06., at 4:46, Vipin Vm wrote:

Hi Peter,

Thanks for the help... it's working fine :)
