OpenURI open method problem

The code I am referring to looks like this:

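   require 'open-uri'

   # Returns the opened URL, or nil if anything goes wrong.
   # On Ruby 3.0 and later, call URI.open(url) instead of open(url).
   def open_url(url)
     url_object = nil
     begin
       url_object = open(url)
     rescue
       # The bare rescue hides the exception, so the cause is never shown.
       puts "Unable to open url: " + url
     end
     return url_object
   end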

I was wondering why the open method is unable to open the url when the
urls are of this form:

http://www.anrdoezrs.net/click-5329913-10569016?url=http%3A%2F%2Fwww.fashion58.com%2Fitemdetail.asp%3Fmod%3DEH5BG213DFSK&cjsku=F58-EH5BG213DFSK

http://www.jdoqocy.com/click-5329913-10538037?url=http://www.6pm.com/rock-n-roll-cowgirl-woven-tunic-red

even though they clearly work if you put them into your browser. I've read
that open-uri automatically follows redirects, so what is the problem
here?

Thanks
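
For anyone hitting the same thing: the bare rescue above throws away the one piece of information that would answer the question. A minimal sketch of a variant that reports the exception, assuming Ruby 2.5 or later (for URI.open); the name open_url_verbose is made up for illustration:

```ruby
require 'open-uri'

# Like open_url, but reports which exception was raised instead of
# discarding it. Returns the opened object, or nil on failure.
def open_url_verbose(url)
  URI.open(url)
rescue StandardError => e
  # e.class and e.message usually say whether the failure was a bad URI,
  # an HTTP error status, a socket problem, and so on.
  warn "Unable to open url: #{url} (#{e.class}: #{e.message})"
  nil
end
```

The message goes to standard error via warn, so it does not mix with normal output.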

--
Posted via http://www.ruby-forum.com/.

Who says those pages are doing redirects? What if the first web page
parses the query string attached to the url, then uses javascript to
load a different page?

--

If that's the case, is there a way I can follow the link all the way
through so I can get what I want?

--

> Or good old curl.
>
> $ curl -v

Neat.

--

Still, the op's question remains unanswered: why doesn't open-uri follow
all those redirects?

Looking at the curl -Lv output, there are cookies being set. Maybe a
server side script kicks you out if the requests for the redirect urls
do not include those cookies.

One option is to switch to Mechanize, which will automatically send any
cookies that were set in a response.
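
If you would rather stay with the standard library, the cookie idea can be sketched with Net::HTTP: follow each redirect by hand and send back whatever cookies the earlier responses set. This is only a rough sketch, not what Mechanize does internally; it ignores cookie attributes such as Domain, Path and Expires, and the names cookie_header, remember_cookies and fetch_with_cookies are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Build a Cookie request header from a hash of name => value pairs.
def cookie_header(jar)
  jar.map { |name, value| "#{name}=#{value}" }.join('; ')
end

# Merge the Set-Cookie values of one response into the jar.
# Only the leading "name=value" pair of each header is kept.
def remember_cookies(jar, set_cookie_values)
  Array(set_cookie_values).each do |line|
    pair = line.split(';', 2).first.to_s
    next unless pair.include?('=')
    name, value = pair.split('=', 2)
    jar[name.strip] = value.to_s.strip
  end
  jar
end

# Follow up to `limit` redirects, replaying cookies on every request.
def fetch_with_cookies(url, limit = 10)
  jar = {}
  uri = URI.parse(url)
  limit.times do
    request = Net::HTTP::Get.new(uri.request_uri)
    request['Cookie'] = cookie_header(jar) unless jar.empty?
    response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.request(request)
    end
    remember_cookies(jar, response.get_fields('Set-Cookie'))
    location = response['Location']
    return response unless response.is_a?(Net::HTTPRedirection) && location
    uri = uri + location # also resolves relative Location values
  end
  raise "too many redirects (more than #{limit})"
end
```

Note that `uri + location` raises URI::InvalidURIError if the server sends a Location value that is not a valid URI, so this sketch does not paper over a broken redirect.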

--

Thanks guys,

I just used mechanize and it works great

--

>> Still, the op's question remains unanswered: why doesn't open-uri follow
>> all those redirects?
>
> it can:

According to the docs, open-uri follows redirects by default. So
according to the docs, open-uri can follow redirects, but the fact
remains it doesn't in this case. Why?

--

If the first page redirects using client-side JavaScript, perhaps you
should consider using PhantomJS through the phantomjs.rb gem or some
equivalent means.

To determine beforehand how the redirection actually happens, I'd try a
browser plugin like Live HTTP Headers or good old Wireshark.

Marvan

On Thu, Sep 6, 2012 at 7:33 PM, Derek T. <lists@ruby-forum.com> wrote:

> If that's the case, is there a way I can follow the link all the way
> through so I can get what I want?

--

it can:

% ri OpenURI::OpenRead#open | grep -A8 :redirect:
:redirect:
  Synopsis:
    :redirect=>bool

:redirect=>false is used to disable HTTP redirects at all.
OpenURI::HTTPRedirect exception raised on redirection. It is true by default.
The true means redirections between http and ftp is permitted.
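
The option quoted above is also handy for debugging: with :redirect => false the first hop raises instead of being followed, and the exception carries the target. A small sketch, assuming Ruby 2.5 or later (for URI.open); first_redirect_target is a made-up helper name:

```ruby
require 'open-uri'

# Return the URI that the first response redirects to, or nil if the
# request succeeds without a redirect.
def first_redirect_target(url)
  URI.open(url, redirect: false) { |_io| nil }
rescue OpenURI::HTTPRedirect => e
  e.uri # the parsed Location of the redirect that was not followed
end
```

Other failures (HTTP error statuses, socket errors) are deliberately not rescued here, so they still surface.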

On Sep 6, 2012, at 17:21 , 7stud -- <lists@ruby-forum.com> wrote:

> Still, the op's question remains unanswered: why doesn't open-uri follow
> all those redirects?

--

>>> Still, the op's question remains unanswered: why doesn't open-uri follow
>>> all those redirects?
>>
>> it can:
>
> According to the docs, open-uri follows redirects by default. So
> according to the docs, open-uri can follow redirects, but the fact
> remains it doesn't in this case. Why?

Because it redirects with invalid URIs:

Last login: Fri Sep 7 23:45:55 on ttys008
10000 % ruby -ropen-uri -e 'URI.parse(ARGV.shift).read' "OOPS! The offer you're looking for has expired.;
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/common.rb:436:in `split': bad URI(is not URI?): OOPS! The offer you're looking for has expired.; (URI::InvalidURIError)
  from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/common.rb:485:in `parse'
...
  from -e:1
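
The same failure can be reproduced without touching the network, since it comes from parsing the Location value rather than from the request itself. A minimal sketch; parse_location is a hypothetical helper, not part of open-uri, and the URLs in the usage lines are placeholders:

```ruby
require 'uri'

# Return the absolute redirect target, or nil when the Location value
# is not a parsable URI.
def parse_location(base_url, location)
  URI.join(base_url, location)
rescue URI::InvalidURIError
  nil
end

# A well-formed (here relative) Location resolves against the base URL:
parse_location('http://www.example.com/click', '/next?x=1')
# A Location containing characters that are illegal in a URI yields nil:
parse_location('http://www.example.com/click', 'http://example.com/has space')
```

Browsers and curl tend to be more forgiving about such values than Ruby's URI parser, which would explain why the links work there.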

10001 % curl -I !$
curl -I "OOPS! The offer you're looking for has expired.;
HTTP/1.1 302 Found
Server: Resin/3.1.8
P3P: policyref="http://www.jdoqocy.com/w3c/p3p.xml", CP="ALL BUS LEG DSP COR ADM CUR DEV PSA OUR NAV INT"
Cache-control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Expires: Sat, 08 Sep 2012 06:48:03 GMT
Location: OOPS! The offer you're looking for has expired.;
Content-Type: text/html
Connection: close
Date: Sat, 08 Sep 2012 06:48:03 GMT

On Sep 7, 2012, at 21:26 , 7stud -- <lists@ruby-forum.com> wrote:

--

Reza Marvan Spagnolo wrote in post #1074957:

> To determine beforehand how the redirection actually happens, I'd try a
> browser plugin like Live HTTP Headers or good old Wireshark.
>
> Marvan

Or good old curl.

$ curl -v 'http://www.anrdoezrs.net/click-5329913-10569016?url=http%3A%2F%2Fwww.fashion58.com%2Fitemdetail.asp%3Fmod%3DEH5BG213DFSK&cjsku=F58-EH5BG213DFSK'
* About to connect() to www.anrdoezrs.net port 80 (#0)
* Trying 89.207.18.129... connected
* Connected to www.anrdoezrs.net (89.207.18.129) port 80 (#0)
> GET /click-5329913-10569016?url=http%3A%2F%2Fwww.fashion58.com%2Fitemdetail.asp%3Fmod%3DEH5BG213DFSK&cjsku=F58-EH5BG213DFSK HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: www.anrdoezrs.net
> Accept: */*
>
< HTTP/1.1 302 Found
< Server: Resin/3.1.8
< P3P: policyref="http://www.anrdoezrs.net/w3c/p3p.xml", CP="ALL BUS LEG DSP COR ADM CUR DEV PSA OUR NAV INT"
< Cache-control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< Expires: Thu, 06 Sep 2012 19:27:18 GMT
< Location:
OOPS! The offer you're looking for has expired.;
< Content-Type: text/html
< Cneonction: close
< Transfer-Encoding: chunked
< Date: Thu, 06 Sep 2012 19:27:18 GMT
<
<html>
<head><meta http-equiv="redirect"
content="OOPS! The offer you're looking for has expired.;"></head>
<body>The URL has moved <a
href="OOPS! The offer you're looking for has expired.;">here</a></body></html>

* Connection #0 to host www.anrdoezrs.net left intact
* Closing connection #0

That's a bog-standard 302 redirect. However, curl -Lv shows that there
is a chain of *three* redirects before the final page is reached.
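
To see the same chain from Ruby rather than curl, each hop can be requested on its own and its status and Location printed. This is a rough sketch using Net::HTTP from the standard library; redirect_chain and DEFAULT_HOP are made-up names, and the hop argument exists only so the function can be tried without a network connection.

```ruby
require 'net/http'
require 'uri'

# Default hop: one GET that does not follow redirects.
# Returns [status_code_string, location_header_or_nil].
DEFAULT_HOP = lambda do |uri|
  response = Net::HTTP.get_response(uri)
  [response.code, response['Location']]
end

# Return an array of [url, status, location] rows, one per hop.
def redirect_chain(url, limit = 10, hop = DEFAULT_HOP)
  rows = []
  uri = URI.parse(url)
  limit.times do
    status, location = hop.call(uri)
    rows << [uri.to_s, status, location]
    break unless status.start_with?('3') && location
    begin
      uri = uri + location # resolves relative Location values too
    rescue URI::InvalidURIError
      break # the chain cannot be followed past an unparsable Location
    end
  end
  rows
end
```

Calling `redirect_chain(url).each { |row| p row }` prints one line per hop; a chain that stops early on a 3xx row points at the Location value that could not be parsed.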

--

+1 for curl obviously ... sorry was in overkill mode .. :)

On Thu, Sep 6, 2012 at 9:30 PM, Brian Candler <lists@ruby-forum.com> wrote:

> Or good old curl.
