An approach that can prove handy is to "screen scrape" the data
from the HTML. One of the easiest ways to get started is Firefox with
the Firebug add-on installed. With Firebug, you can inspect the
elements on the page and view *formatted* source.
Once you figure out how the data you are looking for is tagged,
or how it can otherwise be located, Ruby tools like Hpricot and
Nokogiri let you quickly throw together an extraction routine.
For example, a few minutes ago, I did a Google search on
"helium high voice", and came up with a few lines of code to
extract the first page of links as follows:
1. I inspected the links, and found that they all seem to have
   'class="l"'.
2. I copied the ugly source from a source-view window, and pasted it
   into scite. Any editor would do, but in scite it's easy to view
   changes in output as you experiment.
3. I opened up a few lines, and pasted the HTML source under an
   __END__ tag, which makes it available as the 'DATA' pseudo file.
4. I tried a couple of things using Nokogiri, and found something
   that seemed to work.
The code:
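# coding: utf-8
require 'nokogiri'

# DATA reads everything that follows the __END__ marker below.
html_doc = Nokogiri::HTML(DATA.read)

# The result links carried class="l" when this was written; print each href.
puts html_doc.css("a.l").collect { |el| el["href"] }
__END__
(the ugly HTML page source goes here)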
The output:
Why does the act of inhaling helium make your voice high-pitched?
Why does helium make your voice squeaky? - The Straight Dope
Helium - Wikipedia
http://answers.yahoo.com/question/index?qid=20060606123434AAxjX5A
http://blog.sciencegeekgirl.com/2009/03/26/myth-helium-makes-your-voice-high-pitched/
Why does helium change your voice? - Answers
http://www.hrwiki.org/wiki/helium
http://www.helium.com/items/1905495-why-does-helium-make-your-voice-squeaky
http://www.youtube.com/watch?v=Pq8sCwWEG9k
http://www.youtube.com/watch?v=MiZALF1VZe4
For production, build the query URL and retrieve the page directly,
then extract the links into an array of URLs.
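
A rough sketch of what that could look like, not a tested production
routine. The endpoint URL and the method names search_url and
result_links are my own assumptions for illustration; the "a.l"
selector is the one found by inspection above and may well have changed
since. Google may also redirect or refuse plain scripted requests, which
this sketch does not handle.

```ruby
require 'uri'
require 'net/http'

# Assumed search endpoint; adjust if the site expects something else.
SEARCH_ENDPOINT = 'http://www.google.com/search'

# Build the search URL, form-encoding the query string.
def search_url(query)
  "#{SEARCH_ENDPOINT}?#{URI.encode_www_form('q' => query)}"
end

# Fetch the results page and return the href of every link with class "l".
# (Hypothetical helper: the selector is only what worked on the page I saved.)
def result_links(query)
  require 'nokogiri' # loaded here so search_url works without the gem
  html = Net::HTTP.get(URI(search_url(query)))
  Nokogiri::HTML(html).css('a.l').map { |a| a['href'] }
end
```

Calling result_links("helium high voice") should then return the array
of URLs, provided the page still marks its result links the same way.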
There is no guarantee that Google won't tweak its markup and break
this particular code, but with such a high-level method of page
scraping it wouldn't be hard to adjust. Moreover, this technique can
be used in many situations, and once you've done a few sites, you'll
find most applications are as easy as parsing XML or adapting JSON
from "data only" APIs. After all, you get to see exactly what data is
available on the pages, which may include useful things that an API
might not make available.
On Nov 12, 9:48 am, Terry Michaels <cmhow...@frigidcode.com> wrote:
I'm writing this command-line ruby script and it needs to be able to
submit a search string and get back google result links. Remarkably, I
googled this subject and I am finding Google APIs to do every possible
thing imaginable except for this. The only thing I found was Goose which
is apparently based on a deprecated API.
I tried just looking at the actual HTML at the google home page, but
that is the most nasty mess of web code I've ever seen. Please tell me
somebody else has reverse engineered it so I don't have to...
--
Posted via http://www.ruby-forum.com/.