Hpricot problem on class initialize

Hello, I'm new to Ruby, so there is a good chance the error is not even
with hpricot but something I'm missing in the syntax.
I have the following code:
http://pastebin.com/Bfp7cTmy
It's supposed to read a file where I have a bunch of Facebook IDs and
return a list of the links to those IDs along with the page title (so I
can see the name before clicking on the link).
The link part is working fine.
Thanks, and sorry if it's a silly question =)

···

--
Posted via http://www.ruby-forum.com/.

What's the part that is *not* working?

···

On Sep 23, 2013, at 12:04 PM, Mario Me <lists@ruby-forum.com> wrote:

Hello, I'm new to Ruby, so there is a good chance the error is not even
with hpricot but something I'm missing in the syntax.
I have the following code:
http://pastebin.com/Bfp7cTmy
It's supposed to read a file where I have a bunch of Facebook IDs and
return a list of the links to those IDs along with the page title (so I
can see the name before clicking on the link).
The link part is working fine.
Thanks, and sorry if it's a silly question =)

The part where they're trying to use hpricot.

Use nokogiri instead. It's more correct in nearly every way.

···

On Sep 23, 2013, at 13:28 , Tamara Temple <tamouse.lists@gmail.com> wrote:

On Sep 23, 2013, at 12:04 PM, Mario Me <lists@ruby-forum.com> wrote:

Hello, I'm new to Ruby, so there is a good chance the error is not even
with hpricot but something I'm missing in the syntax.
I have the following code:
http://pastebin.com/Bfp7cTmy
It's supposed to read a file where I have a bunch of Facebook IDs and
return a list of the links to those IDs along with the page title (so I
can see the name before clicking on the link).
The link part is working fine.
Thanks, and sorry if it's a silly question =)

What's the part that is *not* working?

When I try to get the page with hpricot:
doc = Hpricot(open(@link))
I always get a 404 error, but if I print the @link variable to the view
I can access the URL just fine. So I guess it's the way I'm trying to open
it with hpricot that is wrong.

···


This is because I have known of sites (not sure if FB is like this or
not) that respond differently depending on specific contents of the
request header, which can be different between open-uri, curl, wget and
various browsers.

This may be the problem: wget returns an "unsupported browser" page and
curl a "page not found" message.
I tried changing the user agent:
    page = Nokogiri::HTML(open(site + link.to_s,
      'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu '))
Still no success...

···


From what I can see, you are sending page.link back to the client in index.erb. Can you show the content that is sent back? (HTML source, please.)

···

On Sep 23, 2013, at 3:41 PM, Mario Me <lists@ruby-forum.com> wrote:

When I try to get the page with hpricot:
doc = Hpricot(open(@link))
I always get a 404 error, but if I print the @link variable to the view
I can access the URL just fine. So I guess it's the way I'm trying to open
it with hpricot that is wrong.

I was wondering about that myself. I had a sneaking suspicion that Facebook wouldn't allow you to "crawl" the site like that.

- Wayne

···

________________________________
From: Mario Me <lists@ruby-forum.com>
To: ruby-talk@ruby-lang.org
Sent: Tuesday, September 24, 2013 11:22 AM
Subject: Re: Hpricot problem on class initialize

This may be the problem: wget returns an "unsupported browser" page and
curl a "page not found" message.

tamouse m. wrote in post #1122204:

When I try to get the page with hpricot:
doc = Hpricot(open(@link))
I always get a 404 error, but if I print the @link variable to the view
I can access the URL just fine. So I guess it's the way I'm trying to open
it with hpricot that is wrong.

From what I can see, you are sending page.link back to the client in
index.erb. Can you show the content that is sent back? (HTML source,
please.)

I'm sorry, but I don't think I understood.
The page.link is working fine; the HTML in index.erb is just a bunch
of links to the profiles.

Use nokogiri instead. It's more correct in nearly every way.

I tried this:

  def initialize(link)
    site = "https://www.facebook.com/"
    @link = site + link
    page = Nokogiri::HTML(open(site + link))
    @title = page.css("title").text
  end

And it still returns a 404 error: "OpenURI::HTTPError at / 404 Not
Found".
It's strange, because I'm using the same address to create the page.link
and it works!
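
One way to see what the server actually sent back with that 404 is to rescue the exception and look at the response it carries. What follows is only a minimal sketch using open-uri from the standard library. The names `describe_fetch` and `opener` are hypothetical and do not come from the pastebin code. On Ruby 3.0 or later, `URI.open` is needed in place of the bare `open` used elsewhere in this thread.

```ruby
require 'open-uri'

# Hypothetical helper: fetch a URL and report what came back.
# `opener` is injectable so the logic can be tried without a network.
def describe_fetch(url, opener: ->(u) { URI.open(u) })
  body = opener.call(url).read
  "OK: #{body.length} characters"
rescue OpenURI::HTTPError => e
  # e.io is the response object; its status is a [code, message] pair.
  code, message = e.io.status
  "HTTP error #{code} #{message} for #{url}"
end
```

Calling `puts describe_fetch(@link)` just before the parsing step would show whether the server is rejecting the request itself rather than the parser failing.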

···

On Sep 23, 2013, at 3:41 PM, Mario Me <lists@ruby-forum.com> wrote:


tamouse m. wrote in post #1122204:

When I try to get the page with hpricot:
doc = Hpricot(open(@link))
I always get a 404 error, but if I print the @link variable to the view
I can access the URL just fine. So I guess it's the way I'm trying to open
it with hpricot that is wrong.

From what I can see, you are sending page.link back to the client in
index.erb. Can you show the content that is sent back? (HTML source,
please.)

I'm sorry, but I don't think I understood.
The page.link is working fine; the HTML in index.erb is just a bunch
of links to the profiles.

That's what *I* said.

Use nokogiri instead. It's more correct in nearly every way.

I tried this:

def initialize(link)
   site = "https://www.facebook.com/"
   @link = site + link
   page = Nokogiri::HTML(open(site + link))
   @title = page.css("title").text

Try this, just for me. Change the above two lines to this:

    html_doc = Nokogiri::HTML(open(@link))
    @title = html_doc.css("title").text

(Note you could put those on one line, like so:

    @title = Nokogiri::HTML(open(@link)).css("title").text

)

Two things:

1) The variable page might be used elsewhere. Making it unique here might help.
2) Using the variable you just set, and will be using in the ERB, makes sure it is the same one in the open.
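
The two points above can be sketched as a small class that builds the link once and opens that same value. This is only an illustration under assumptions: `ProfilePage` and the `fetch` parameter are hypothetical names (the real class is in the pastebin code), and the title is pulled out with a regex purely to keep the sketch free of gems. The nokogiri call shown above would normally take its place.

```ruby
require 'open-uri'

# Hypothetical class name; stands in for the class in the pastebin code.
class ProfilePage
  SITE = 'https://www.facebook.com/'

  attr_reader :link, :title

  # `fetch` is injectable so the class can be tried without a network.
  def initialize(id, fetch: ->(url) { URI.open(url).read })
    @link = SITE + id.to_s      # built once...
    html = fetch.call(@link)    # ...and the very same value is opened
    # Regex stand-in for Nokogiri::HTML(html).css("title").text
    @title = html[%r{<title[^>]*>(.*?)</title>}im, 1].to_s.strip
  end
end
```

Because `@link` is the only place the URL is assembled, the view and the fetch cannot drift apart.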

end

And it still returns a 404 error: "OpenURI::HTTPError at / 404 Not
Found".
It's strange, because I'm using the same address to create the page.link
and it works!

Do this for me as well: from the command line, take one of those page.links from your output and fetch it with either curl or wget.

This is because I have known of sites (not sure if FB is like this or not) that respond differently depending on specific contents of the request header, which can be different between open-uri, curl, wget and various browsers.
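
To check whether the request headers are what makes the difference, the same URL can be requested with several header sets and the resulting status codes compared. The sketch below uses only open-uri. `compare_headers`, `opener`, and the header values in `BROWSER_HEADERS` are assumptions made for illustration; nothing here is known to satisfy Facebook or any other particular site.

```ruby
require 'open-uri'

# Illustrative, assumed header values; adjust to mimic a real browser.
BROWSER_HEADERS = {
  'User-Agent'      => 'Mozilla/5.0 (X11; Linux x86_64)',
  'Accept'          => 'text/html',
  'Accept-Language' => 'en-US,en;q=0.8'
}

# Hypothetical helper: request `url` once per header set and collect
# the HTTP status code (as a string) for each label.
def compare_headers(url, header_sets, opener: ->(u, h) { URI.open(u, h) })
  header_sets.each_with_object({}) do |(label, headers), results|
    results[label] = begin
      opener.call(url, headers).status.first
    rescue OpenURI::HTTPError => e
      # Failed responses still carry their status on e.io.
      e.io.status.first
    end
  end
end
```

For example, `compare_headers(url, 'bare' => {}, 'browser' => BROWSER_HEADERS)` returns a hash of label to status code, which makes it easy to spot a site that answers differently depending on the headers.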

···

On Sep 24, 2013, at 7:25 AM, Mario Me <lists@ruby-forum.com> wrote:

On Sep 23, 2013, at 3:41 PM, Mario Me <lists@ruby-forum.com> wrote: