How to extract URLs from the HTML source of a Google search result

Hi,
  I want to make a Tk window where you type in a search string; the program
searches for it on Google and prints the web addresses (http URLs) of the
results in the TkFrame of that window. My program connects to the net and
gets the HTML source through the function "http.get". Now, from the HTML
source, how can I find the URLs of the search results? Can I do it with a
regular expression, or is there some other way?
  Any suggestions are welcome.
Thanks
sujeet

The URI.extract method from the uri library can extract an array of URIs from
a string:

    require 'uri'
    URI.extract('My favorite site is http://google.com')
    # => ["http://google.com"]

An optional second argument can limit the schemes that it will match against
and return:

    URI.extract('Why do people use mailto:me@lala.org links?')
    # => ["mailto:me@lala.org"]
    URI.extract('Why do people use mailto:me@lala.org links?', 'http')
    # => []
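Applied to the question's case, a minimal sketch might look like this. The HTML fragment is a made-up stand-in for a downloaded results page (not real Google markup), and the scheme list is passed as an array so both http and https links are kept. Given a block, URI.extract yields each match instead of building an array.

    require 'uri'

    # Made-up fragment standing in for a downloaded results page;
    # real Google markup is not reproduced here.
    html = '<a href="http://example.com/a">A</a> ' \
           '<a href="mailto:me@example.org">M</a> ' \
           '<a href="https://example.org/b">B</a>'

    # Array form: only http and https URIs are matched, so the
    # mailto: link is skipped.
    urls = URI.extract(html, ['http', 'https'])
    p urls

    # Block form: each match is yielded in turn (extract returns nil).
    URI.extract(html, ['http', 'https']) { |uri| puts uri }

Newer Ruby releases mark URI.extract as obsolete (it prints a warning when run with -w), but it is still available.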

marcel


On Sun, Jun 12, 2005 at 03:44:03AM +0900, sujeet kumar wrote:

--
Marcel Molina Jr. <marcel@vernix.org>

Why not use the Google API?


On 11 Jun 2005, at 11:44, sujeet kumar wrote:

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Marcel Molina Jr. wrote:

how can I find the url's of the search. Can i
do it by regular expression or any other way.
   

The URI.extract method from the uri library can extract an array of uri's from
a string:

A universal regexp that finds URIs in arbitrary text is a complicated thing, indeed. Besides, it can produce false positives (things that look like URIs but aren't).
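A quick illustration of the false-positive problem, using URI.extract itself: with no scheme list, a word followed by a colon can be reported as a degenerate absolute URI, since the pattern accepts a bare scheme and colon. Restricting the schemes, as in Marcel's second example, avoids this particular case. The sample sentence is made up.

    require 'uri'

    text = 'Note: the first result is http://example.com/page'

    # No scheme list: "Note:" looks like scheme + ":" and is reported too.
    loose = URI.extract(text)
    p loose

    # Restricting to http drops the false positive.
    strict = URI.extract(text, ['http'])
    p strict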

If you are sure that the page is well-formed XHTML (I'm not sure whether that's the case with Google), you might instead parse it with REXML and use XPath to retrieve the href attributes of all <a>..</a> elements, selecting only those that start with "http://" (there may also be mailto:, ftp:, JavaScript calls, etc.).
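A minimal sketch of that approach, assuming the page really is well-formed XHTML. The helper name http_links and the sample markup are made up for illustration. The sample omits an xmlns declaration on purpose: in standard XPath 1.0 an unprefixed name like a only matches elements in no namespace, so a page that declares the XHTML namespace may need a namespace-aware query. The "http://" filter is applied in Ruby rather than inside the XPath expression. On Ruby 3.0 and later REXML is a bundled gem, so it may need to be installed or listed in a Gemfile.

    require 'rexml/document'

    # Hypothetical helper: return the href of every <a> element in a
    # well-formed XHTML string, keeping only absolute http:// links.
    # mailto:, javascript: and relative links are dropped.
    def http_links(xhtml)
      doc = REXML::Document.new(xhtml)
      REXML::XPath.match(doc, '//a[@href]')
        .map { |a| a.attributes['href'] }
        .select { |href| href.start_with?('http://') }
    end

    # Made-up sample page (no xmlns declared).
    page = <<~XHTML
      <html>
        <body>
          <p><a href="http://example.com/first">First result</a></p>
          <p><a href="mailto:someone@example.org">Mail</a></p>
          <p><a href="javascript:void(0)">Script</a></p>
          <p><a href="/relative/link">Relative</a></p>
          <p><a href="http://example.org/second">Second result</a></p>
        </body>
      </html>
    XHTML

    puts http_links(page)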

Best regards,
Alexey Verkhovsky


On Sun, Jun 12, 2005 at 03:44:03AM +0900, sujeet kumar wrote: