Hpricot problem on class initialize

7stud2 · 23 September 2013 17:04

Hello, I'm new to Ruby, so there is a good chance the error is not even
with Hpricot but something I'm missing in the syntax.
I have the following code:
http://pastebin.com/Bfp7cTmy
It's supposed to read a file where I have a bunch of Facebook IDs and
return a list of the links to those IDs along with the page title (so I
can see the name before clicking on the link).
The link part is working fine.
Thanks, and sorry if it's a silly question =)
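
Since the pastebin is not reproduced in this thread, the following is only a guess at the link-building half of the task, which the post says already works. It assumes a plain-text file with one ID per line; `BASE_URL`, `profile_links`, and the file layout are made up for illustration.

```ruby
# Hypothetical sketch: the file layout (one ID per line) and all names
# here are assumptions, not taken from the pastebin.
BASE_URL = 'https://www.facebook.com/'

# Returns one profile URL per non-blank line of the IDs file.
def profile_links(path)
  File.readlines(path, chomp: true)
      .map(&:strip)
      .reject(&:empty?)
      .map { |id| BASE_URL + id }
end
```

Blank lines and stray whitespace are dropped before the URL is built, so a trailing newline in the file does not produce a link to the bare site root.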

--
Posted via http://www.ruby-forum.com/.

Tamara_Temple1 · 23 September 2013 20:28

What's the part that is *not* working?


Ryan_Davis1 · 24 September 2013 05:25

The part where they're trying to use hpricot.

Use nokogiri instead. It's more correct in nearly every way.


7stud2 · 23 September 2013 20:41

When I try to get the page with Hpricot
doc = Hpricot(open(@link))
I always get a 404 error, but if I print the @link variable to the view
I can access the URL just fine. So I guess it's the way I'm trying to open
it with Hpricot that is wrong.


7stud2 · 24 September 2013 16:22

This is because I have known of sites (not sure if FB is like this or
not) that respond differently depending on specific contents of the
request header, which can be different between open-uri, curl, wget and
various browsers.

This may be the problem: wget returns an "unsupported browser" page and
curl a "page not found" message.
I tried changing the user agent:
page = Nokogiri::HTML(open(site + link.to_s,
  'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu '))
Still no success...


Tamara_Temple1 · 23 September 2013 21:08

From what I can see, you are sending page.link back to the client in index.erb. Can you show the content that is sent back? (HTML source, please)


Wayne_Brisette · 24 September 2013 16:29

I was wondering about that myself. I had a sneaking suspicion that Facebook wouldn't allow you to "crawl" the site like that.

- Wayne


7stud2 · 24 September 2013 12:25

tamouse m. wrote in post #1122204:

From what I can see, you are sending page.link back to the client in
index.erb. Can you show the content that is sent back? (HTML source,
please)

I'm sorry, but I don't think I understood.
The page.link is working fine; the HTML in index.erb is just a bunch
of links to the profiles.

Use nokogiri instead. It's more correct in nearly every way.

I tried this:

  def initialize(link)
    site = "https://www.facebook.com/"
    @link = site + link
    page = Nokogiri::HTML(open(site + link))
    @title = page.css("title").text
  end

And it still returns a 404 error: "OpenURI::HTTPError at / 404 Not
Found".
It's strange, because I'm using the same address to create the page.link
and it works!
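
One thing the error message does show is that the exception raised by open escapes initialize, so a single failed fetch takes down the whole Sinatra page. A minimal sketch, assuming it is acceptable to fall back to a placeholder title when the fetch fails, might look like this. The class name `ProfilePage`, the `fetcher:` parameter, and the regex-based title extraction are assumptions for illustration; the regex is only there to keep the sketch free of gems, and the thread's `page.css("title").text` would go in its place. `URI.open` is the spelling on current Rubies; in 2013 plain `open` did the same job.

```ruby
require 'open-uri'

# Hypothetical class; names are illustrative, not from the pastebin.
class ProfilePage
  SITE = 'https://www.facebook.com/'

  attr_reader :link, :title

  # fetcher is any callable that takes a URL and returns the HTML as a string.
  def initialize(id, fetcher: ->(url) { URI.open(url, &:read) })
    @link = SITE + id.to_s
    @title = extract_title(fetcher.call(@link))
  rescue OpenURI::HTTPError => e
    # Keep the link usable even when the fetch fails (e.g. "404 Not Found").
    @title = "(title unavailable: #{e.message})"
  end

  private

  # Stand-in for Nokogiri's page.css("title").text.
  def extract_title(html)
    html[%r{<title[^>]*>(.*?)</title>}mi, 1].to_s.strip
  end
end
```

This does not explain the 404, but it lets the list of links render while the fetch problem is being tracked down.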


Tamara_Temple1 · 24 September 2013 14:48

On Sep 24, 2013, at 7:25 AM, Mario Me <lists@ruby-forum.com> wrote:

I'm sorry, but I don't think I understood.
The page.link is working fine; the HTML in index.erb is just a bunch
of links to the profiles.

That's what *I* said.

Use nokogiri instead. It's more correct in nearly every way.

I tried this:

def initialize(link)
   site = "https://www.facebook.com/"
   @link = site + link
   page = Nokogiri::HTML(open(site + link))
   @title = page.css("title").text

Try this, just for me. Change the above two lines to this:

html_doc = Nokogiri::HTML(open(@link))
@title = html_doc.css("title").text

(Note you could put those on one line, like so:

@title = Nokogiri::HTML(open(@link)).css("title").text

)

Two things:

1) The variable page might be used elsewhere. Making it unique here might help.
2) Using the variable you just set, which you will also be using in the ERB, makes sure the same URL is passed to open.

end

And it still returns a 404 error: "OpenURI::HTTPError at / 404 Not
Found".
It's strange, because I'm using the same address to create the page.link
and it works!

Do this for me as well: from the command line, take one of those page.links from your output and fetch it with either curl or wget.

This is because I have known of sites (not sure if FB is like this or not) that respond differently depending on specific contents of the request header, which can be different between open-uri, curl, wget and various browsers.
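
The same check can be made from Ruby by rescuing the error and looking at what the server actually sent back. The sketch below is hedged: the header values are illustrative, `probe` and its `opener:` parameter are hypothetical names, and nothing here is known to make Facebook serve the page. `OpenURI::HTTPError#io` and its `status` are part of open-uri.

```ruby
require 'open-uri'

# Illustrative browser-like headers; the exact values are assumptions.
BROWSER_HEADERS = {
  'User-Agent'      => 'Mozilla/5.0 (X11; Linux x86_64)',
  'Accept'          => 'text/html',
  'Accept-Language' => 'en-US,en;q=0.8'
}

# Returns the HTTP status code and the start of the body, whether or not
# the request succeeded. opener defaults to URI.open (plain open in 2013).
def probe(url, headers = BROWSER_HEADERS, opener: URI.method(:open))
  opener.call(url, headers) do |io|
    { status: io.status.first, body: io.read[0, 200] }
  end
rescue OpenURI::HTTPError => e
  # e.io is the error response; its status looks like ["404", "Not Found"].
  { status: e.io.status.first, body: e.io.read.to_s[0, 200] }
end
```

Calling `probe(url)` and `probe(url, {})` on the same link shows whether the headers change the response at all, which separates "wrong URL" from "server dislikes this client".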

