Hi all
I am new to Ruby and have found it an interesting language. Can anyone
help me with some simple code in Ruby to check for all the dead and
live links on a website?
Thanks
Rati
A wonderful language! It's a rather rough request, but here is some
(very) simple code to get you started, using some great Ruby libraries,
including Rubyful Soup (http://www.crummy.com/software/RubyfulSoup/),
which is available as a gem. As written it only checks one page; you
would need to make it "walk" the links to recursively check a whole
site.
Hope it helps
pth
require 'open-uri'
require 'uri'
require 'rubyful_soup'

url = 'http://www.yahoo.com/'
uri = URI.parse(url)
html = open(uri).read
soup = BeautifulSoup.new(html)

# Search the soup for anchor tags and collect their hrefs
links = soup.find_all('a').map { |a| a['href'] }

# Remove javascript: pseudo-links
links.delete_if { |href| href =~ /javascript/ }

links.each do |l|
  # Resolve relative paths (there is probably a better way)
  link = URI.parse(l)
  link.scheme = 'http' unless link.scheme
  link.host = uri.host unless link.host
  link.path = uri.path + link.path unless link.path[0] == ?/
  link = URI.parse(link.to_s)

  # Check the link
  begin
    open(link).read
    # If we made it here, the link is probably good
  rescue Exception => e
    puts "#{link}: #{e}"
  end
end
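To walk a whole site rather than a single page, one rough sketch along the same lines might look like the following. Note the `extract_links` and `check_site` names are just illustrative stand-ins (not rubyful_soup API), and the regex-based href extraction is a quick substitute for a real parser; the open-uri fetch style matches the snippet above.

```ruby
require 'open-uri'
require 'uri'
require 'set'

# Pull href values out of raw HTML with a quick regex.
# (Rubyful Soup would do this more robustly.)
def extract_links(html)
  html.scan(/<a\s[^>]*href=["']([^"']+)["']/i).flatten
end

# Breadth-first walk of a site, reporting links that fail to open.
# Stays on the starting host and stops after `limit` pages.
def check_site(start_url, limit = 50)
  start = URI.parse(start_url)
  queue = [start]
  seen  = Set.new([start.to_s])
  until queue.empty? || seen.size > limit
    page = queue.shift
    begin
      html = open(page).read
    rescue Exception => e
      puts "#{page}: #{e}"
      next
    end
    extract_links(html).each do |href|
      next if href =~ /\Ajavascript:/i
      link = (page.merge(href) rescue next)
      next unless link.host == start.host && !seen.include?(link.to_s)
      seen << link.to_s
      queue << link
    end
  end
end

# check_site('http://www.yahoo.com/')
```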
On 4/1/06, rati_lion@yahoo.com <rati_lion@yahoo.com> wrote:
> Hi all
>
> I am new to Ruby and have found it an interesting language. Can anyone
> help me with some simple code in Ruby to check for all the dead and
> live links on a website?
>
> Thanks
> Rati
Thank you. I will take care in future not to post such a rough request.
Thanks again for this extended help.
On Saturday 01 April 2006 16:39, Patrick Hurley wrote:
> A wonderful language! It's a rather rough request, but here is some
> (very) simple code to get you started, using some great Ruby
> libraries, including Rubyful Soup
> (http://www.crummy.com/software/RubyfulSoup/), which is available as
> a gem.
> [rest of code snipped]
Yay for intuitive and descriptive class and library names, eh?
David Vallner
Hi,

> links.each do |l|
>   # Resolve relative paths (there is probably a better way)

Yes there is, see below.

>   link = URI.parse(l)
>   link.scheme = 'http' unless link.scheme
>   link.host = uri.host unless link.host
>   link.path = uri.path + link.path unless link.path[0] == ?/
>   link = URI.parse(link.to_s)

All of that can be replaced with:

link = uri.merge(l)
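A quick sketch of how merge resolves the various flavours of href against a base (the URLs here are just examples):

```ruby
require 'uri'

base = URI.parse('http://www.yahoo.com/news/index.html')

base.merge('world.html').to_s           # => "http://www.yahoo.com/news/world.html"
base.merge('/finance').to_s             # => "http://www.yahoo.com/finance"
base.merge('http://example.com/').to_s  # => "http://example.com/"
```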
Robin
You are very welcome. And you are always welcome to request help. So I
don't come off sounding too harsh, let me explain: this is a friendly
community, generally populated by programmers and people striving to
become programmers, so when asking for help in this manner, you will
often receive more of it if you show that you have spent a little time
attempting the task yourself, or if you ask for guidance on how to get
started.

As for the name rubyful_soup: there are other, better-named libraries,
but this one is based on a Python library (same author) called
Beautiful Soup. Its claim to fame is that it handles invalid markup
that gives most other parsers fits (so it is useful in the real world
when dealing with other people's "html"). I knew the library because I
had read about it here.
Good luck again
pth
On 4/1/06, rati_lion@yahoo.com <rati_lion@yahoo.com> wrote:
> Thank you. I will take care in future not to post such a rough
> request. Thanks again for this extended help.
Thanks, I knew there had to be :-). I don't use URI much.

pth

On 4/1/06, Robin Stocker <robin@nibor.org> wrote:
> Hi,
>
> > links.each do |l|
> >   # Resolve relative paths (there is probably a better way)
>
> Yes there is, see below.
>
> > link = URI.parse(l)
> > link.scheme = 'http' unless link.scheme
> > link.host = uri.host unless link.host
> > link.path = uri.path + link.path unless link.path[0] == ?/
> > link = URI.parse(link.to_s)
>
> link = uri.merge(l)
>
> Robin