It's not at all clear to me which *specific* things you want to
extract from the website.
In any case, you need to click on View/Source in your browser and
examine the raw HTML to figure out which tags you need to extract and how
to identify them. Look at the web page in your browser, then use Find or
Search to locate the same text in the raw HTML.
Then read some basic XPath tutorials, starting here:
http://www.engineyard.com/blog/2010/getting-started-with-nokogiri/
Here is an example of how to get the names of the restaurants:
require 'nokogiri'
# To fetch the live page instead of using the sample below (Ruby 2.5+):
#require 'open-uri'
#doc = Nokogiri::HTML(URI.open("http://www.threescompany.com/"))
html = <<MY_HTML
<html>
<head>
<title>Stuff</title>
</head>
<body>
<h3 class="title fn org">
<a href="http://blah_blah_blah"
class="no-tracks url "
rel="nofollow"
title="Fishermen's Grotto">Fishermen's Grotto</a>
</h3>
<junk>blah blah blah</junk>
<h3 class="title fn org">
<a href="http://blah_blah"
rel="nofollow"
title="Marnee Thai Restaurant">Marnee Thai Restaurant</a>
</h3>
</body>
</html>
MY_HTML
doc = Nokogiri::HTML(html)
# Select the first <a> inside every <h3> whose class is exactly "title fn org".
doc.xpath('//h3[@class="title fn org"]/a[1]').each do |node|
  puts node.text
end
--output:--
Fishermen's Grotto
Marnee Thai Restaurant
Parsing HTML requires a good understanding of HTML structure (parents,
children, siblings, etc.) and of CSS (classes, ids, etc.). As a
beginner, it is better to take baby steps than to jump in at the deep end
of the pool, so this project may be too hard for you.
--
Posted via http://www.ruby-forum.com/.