Hpricot & mechanize fail to parse page after redirect

Hi everyone,
My quest with Mechanize/Hpricot continues :slight_smile:
Something extremely strange happened today: some simple, working code
broke down, and I can't figure out why.

I am trying to access a thepiratebay.org search page, which redirects
to a relative URL like this:
original link:
http://thepiratebay.org/s/?page=0&orderby=3&q=football+manager+2008&searchTitle=on

redirects to:
/search/football manager 2008/0/3/0

Now, this all worked dandily up until yesterday. The page was redirected,
and Mechanize even handled the cookie that was sent back from the site.
But today, I am getting this strange error from Hpricot:
"URI::InvalidURIError: bad URI(is not URI?): /search/football manager 2008/0/3/0"
Mechanize gives a different one, but I'm sure it's
inherited from Hpricot's problem with getting the page.
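A quick way to see where this exception comes from is to hand the redirect target to Ruby's own `URI.parse`, with no Hpricot or Mechanize involved. This is a diagnostic sketch only; it assumes the Location value contains literal spaces exactly as quoted in the error, and that spaces are the only illegal characters in it.

```ruby
require "uri"

# The redirect target as it appears in the error message above.
raw_location = "/search/football manager 2008/0/3/0"

# URI.parse rejects the literal spaces and raises URI::InvalidURIError
# with a "bad URI(is not URI?)" message, the same class of error as above.
begin
  URI.parse(raw_location)
rescue URI::InvalidURIError => e
  puts "raw Location rejected: #{e.message}"
end

# Assumption: spaces are the only illegal characters here, so
# percent-encoding them is enough to make the value parseable.
escaped_location = raw_location.gsub(" ", "%20")
parsed = URI.parse(escaped_location)
puts parsed.path       # escaped path
puts parsed.relative?  # no scheme or host, so still a relative reference
```

If the escaped form parses while the raw one does not, the failure is in parsing the Location value itself rather than in fetching the page.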

I have tested this on 2 different machines, and they both break down.
Can someone please give it a go and see if they can figure it out?
I would be very very thankful :slight_smile:

Thanks,
Ehud

PS - I am using Hpricot 0.6, and the redirected page is parsed correctly
when accessed directly.


--
Posted via http://www.ruby-forum.com/.

If the redirect is a 302 whose Location: header is just
"/search/football manager 2008/0/3/0",

then it's probably similar to the issue I had using HTTPClient. The relevant bit of code from HTTPClient is:
   def default_redirect_uri_callback(uri, res)
     newuri = URI.parse(res.header['location'][0])
     unless newuri.is_a?(URI::HTTP)
       newuri = URI.join(uri, newuri)
       STDERR.puts(
         "could be a relative URI in location header which is not recommended")
       STDERR.puts(
         "'The field value consists of a single absolute URI' in HTTP spec")
     end
     puts "Redirect to: #{newuri}" if $DEBUG
     newuri
   end

Note the line URI.join(uri, newuri), which takes the (presumed) relative newuri and resolves it against the original uri. (I've also recently sent the author of HTTPClient a patch that fixed this line.)
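Rob's point about resolving a relative Location against the original request can be checked with Ruby's standard `uri` library alone. One caveat: in the callback above, `URI.parse(res.header['location'][0])` runs before `URI.join`, so a Location value containing literal spaces would already raise `URI::InvalidURIError` at that first line. The sketch below is a hypothetical helper (`resolve_redirect` is not part of HTTPClient, Hpricot or Mechanize) that escapes spaces first, on the assumption that they are the only illegal characters, and then resolves a relative target; it tests `absolute?` where HTTPClient tests `is_a?(URI::HTTP)`.

```ruby
require "uri"

# Hypothetical helper: turn a Location header value into an absolute URI,
# resolving it against the URI that was originally requested.
def resolve_redirect(request_uri, location)
  # Assumption: literal spaces are the only characters that need escaping.
  target = URI.parse(location.gsub(" ", "%20"))
  # An absolute Location (one with a scheme) is used as-is; a relative one
  # is resolved against the original request, much like URI.join in HTTPClient.
  target.absolute? ? target : URI.join(request_uri.to_s, target.to_s)
end

original = "http://thepiratebay.org/s/?page=0&orderby=3&q=football+manager+2008&searchTitle=on"
puts resolve_redirect(original, "/search/football manager 2008/0/3/0")
puts resolve_redirect(original, "http://example.com/elsewhere")
```

Because the relative target starts with "/", resolution keeps the scheme and host of the original request but replaces its path and drops its query string.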

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com


On Nov 14, 2007, at 2:17 PM, Ehud Rosenberg wrote:

Hi everyone,
My quest with Mechanize/Hpricot continues :slight_smile:
Something extremely strange happened today: some simple, working code
broke down, and I can't figure out why.

I am trying to access a thepiratebay.org search page, which redirects
to a relative URL like this:
original link:
http://thepiratebay.org/s/?page=0&orderby=3&q=football+manager+2008&searchTitle=on

redirects to:
/search/football manager 2008/0/3/0

Now, this all worked dandily up until yesterday. The page was redirected,
and Mechanize even handled the cookie that was sent back from the site.
But today, I am getting this strange error from Hpricot:
"URI::InvalidURIError: bad URI(is not URI?): /search/football manager 2008/0/3/0"
Mechanize gives a different one, but I'm sure it's
inherited from Hpricot's problem with getting the page.

I have tested this on 2 different machines, and they both break down.
Can someone please give it a go and see if they can figure it out?
I would be very very thankful :slight_smile:

Thanks,
Ehud

PS - I am using Hpricot 0.6, and the redirected page is parsed correctly
when accessed directly.

That is probably the case when using Hpricot, but Mechanize handles
this and has a method that takes a relative URL redirect and creates a
fully qualified one.
Also, it worked for me yesterday with the exact same code (I know that
sounds crazy! :slight_smile:)
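To illustrate what turning a relative redirect into a fully qualified URL involves inside a client, here is a minimal sketch of a redirect loop. It is not Mechanize's implementation: `follow_redirects` and the injected `fetch` lambda are hypothetical, used so the example runs without a network, and spaces are again assumed to be the only characters in the Location value that need escaping.

```ruby
require "uri"

# Hypothetical redirect follower. `fetch` stands in for a real HTTP request
# and must return [status_code, location_or_nil]; this keeps the sketch offline.
def follow_redirects(start, fetch, limit: 5)
  current = URI.parse(start)
  limit.times do
    status, location = fetch.call(current)
    # Anything that is not a 3xx with a Location is the final answer.
    return current unless (300..399).cover?(status) && location
    # Make each hop absolute before requesting it: URI#merge resolves a
    # relative reference against the current URI (after escaping spaces).
    current = current.merge(location.gsub(" ", "%20"))
  end
  raise "too many redirects (limit #{limit})"
end

# Fake server: the search URL answers 302 with a relative Location,
# and every other URL answers 200.
fake_fetch = lambda do |uri|
  uri.path == "/s/" ? [302, "/search/football manager 2008/0/3/0"] : [200, nil]
end

start = "http://thepiratebay.org/s/?page=0&orderby=3&q=football+manager+2008&searchTitle=on"
puts follow_redirects(start, fake_fetch)
```

The redirect limit guards against servers that bounce between URLs forever; a real client would also carry cookies from each response into the next request.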

Thanks for the quick and thorough reply, Rob!
