Code duplication

Hi all,
The following is the code for extracting the HTML contents of a
website. I have included code to handle a URL redirect and a Bad Request
error.

#getting the HTTP response from 'uri'
response = Net::HTTP.get_response(uri)
case response
# if the url is redirecting then fetch the contents of the redirected url
when Net::HTTPRedirection
  uri = URI.parse(response['Location'])
  response = Net::HTTP.get_response(uri)
# in case of a bad request error
when Net::HTTPBadRequest
  http = Net::HTTP.start(uri.host, uri.port)
  #getting the html data by setting the path as '/' and using a user agent
  response = http.get("/", "User-Agent"=>"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)")
end

data = response.body

My tutor says there is duplication in the above code, i.e. the code
for reading the HTML is specified twice for no purpose and should be
removed. I have no idea where the mistake is. I'm a newbie to Ruby and
I don't understand the problem properly or where things went wrong.
Can anyone please help me find the mistake?

Thanks in advance.

Regards
Arun


--
Posted via http://www.ruby-forum.com/.

What is the purpose of the statement below?
         response = http.get("/", "User-Agent"=>"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)")

Since you have already got the response object using get_response,
why is it needed?


There are probably better solutions, but the following illustrates the point your tutor is making:

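require 'net/http'
require 'uri'

MOZILLA_HEADER = { "User-Agent" => "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)" }

def get_http_response uri, max_redirects = 0
  # Net::HTTP.start takes a host and a port, not a URI object
  Net::HTTP.start(uri.host, uri.port) do |connection|
    # request_uri is the path plus any query string ("/" if the path is empty)
    response = connection.get(uri.request_uri, MOZILLA_HEADER)
    case response
    when Net::HTTPRedirection
      if max_redirects > 0 then
        get_http_response URI.parse(response['Location']), (max_redirects - 1)
      else
        raise "Too many redirects"
      end
    when Net::HTTPBadRequest
      # retry against the site root, spending one unit of the limit so that
      # a root which also answers 400 cannot recurse forever
      if max_redirects > 0 then
        get_http_response URI.parse("http://#{uri.host}:#{uri.port}/"), (max_redirects - 1)
      else
        response
      end
    else
      # success (or any other status): hand the response back unchanged
      response
    end
  end
end

data = get_http_response(my_uri, 3).body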

See how get_http_response calls itself when it receives a redirect or an erroneous response? This keeps the actual HTTP interaction code down to a single request as well as handling redirects elegantly. Whilst this could result in many more HTTP connections being used, the block form of Net::HTTP.start closes each connection when its block finishes, so they clear up after themselves, which is always good.
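To see the recursion on its own, here is a minimal sketch (not Ellie's code) that isolates the redirect-following part. It assumes a hypothetical `fetch` callable mapping a URI to a Net::HTTPResponse, so the logic can be tried without a network; a real program would pass something that performs the request. It also resolves the Location header with URI.join, an addition that copes with relative redirect targets as well as absolute ones.

```ruby
require 'net/http'
require 'uri'

# Follows redirects recursively, spending one unit of max_redirects per hop.
# `fetch` is a hypothetical callable (URI -> Net::HTTPResponse) standing in
# for the real request, so the recursion can be exercised offline.
def follow_redirects(uri, max_redirects, fetch)
  response = fetch.call(uri)
  # anything that is not a redirect ends the recursion
  return response unless response.is_a?(Net::HTTPRedirection)
  raise "Too many redirects" unless max_redirects > 0

  # resolve Location against the current URI (handles relative targets)
  next_uri = URI.join(uri.to_s, response['Location'])
  follow_redirects(next_uri, max_redirects - 1, fetch)
end

# Builds a bare response object for trying the method offline; no socket is involved.
def fake_response(klass, code, location = nil)
  response = klass.new('1.1', code, '')
  response['Location'] = location if location
  response
end
```

With a stub that maps http://example.com/a to a 302 pointing at /b, and /b to a 200, `follow_redirects(URI.parse('http://example.com/a'), 3, fetch)` returns the 200 response; a stub that keeps redirecting raises "Too many redirects" once the budget is spent.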

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net


On 6 Apr 2009, at 14:50, Arun Kumar wrote:

My tutor says there is duplication in the above code, i.e. the code
for reading the HTML is specified twice for no purpose and should be
removed. I have no idea where the mistake is. I'm a newbie to Ruby and
I don't understand the problem properly or where things went wrong.
Can anyone please help me find the mistake?

----
raise ArgumentError unless @reality.responds_to? :reason

Loga Ganesan wrote:

What is the purpose of the statement below?
         response = http.get("/", "User-Agent"=>"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)")

Since you have already got the response object using get_response,
why is it needed?

Hi,
Thanks for the reply. If it is a Bad Request error, then I have to
connect to the host and port and then fetch the data. For example, if I
try to fetch the HTML contents of youtube.com, I get a Bad Request
error. So I used Net::HTTP.start() and then passed the path and a user
agent to retrieve the contents, storing them in response. I don't think
there is any other way: if I remove that part, I'm not able to read the
HTML.
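One way to read the duplication, in line with Ellie's answer: if the site only answers a request that carries a browser-like User-Agent, that header can be sent on the first request instead of in a second, fallback request. A minimal sketch, assuming (as reported above) that the header is what the server needs; `build_request` and `fetch_with_user_agent` are hypothetical names. Unlike the fallback in the original code, it keeps the requested path rather than switching to '/'.

```ruby
require 'net/http'
require 'uri'

MOZILLA_HEADER = { "User-Agent" => "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)" }

# Builds a GET request that carries the User-Agent header from the start,
# so no separate "retry with a user agent" branch is needed.
def build_request(uri)
  # request_uri is the path plus any query string, and "/" when the path is empty
  Net::HTTP::Get.new(uri.request_uri, MOZILLA_HEADER)
end

# Sends that single request over a connection closed when the block exits.
def fetch_with_user_agent(uri)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_request(uri))
  end
end
```

Calling `fetch_with_user_agent(URI.parse('http://www.youtube.com/'))` would then perform one request; whether a particular site still rejects requests without that header is not something this sketch can confirm.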

Thanks
Arun


Thanks Ellie, you've given me a clue not only to solving the code
duplication but also to handling redirects. Thanks a lot.
Regards
Arun


My pleasure :)

Ellie


On 7 Apr 2009, at 05:31, Arun Kumar wrote:

Thanks Ellie, you've given me a clue not only to solving the code
duplication but also to handling redirects. Thanks a lot.


Hi Ellie,
Thank you once again for your reply; it helped me a lot. Now I'd like
to ask you a couple of questions.
1) How can I specify the redirect limit without declaring it inside a
method? Is that possible?
2) By including a redirect limit, will I have made the URL redirection
code as effective as it can be, or should I add something more to the
code to handle redirects effectively?

Thanks
Arun

Hi Ellie,
Thank you once again for your reply; it helped me a lot. Now I'd like
to ask you a couple of questions.
1) How can I specify the redirect limit without declaring it inside a
method? Is that possible?

The redirect limit isn't declared inside the method but passed in as one of its parameters, which is what makes the recursion work: each time a redirect is received, the method calls itself with the limit reduced by one. You'll note that I provided an initial value as part of the initial function call:

  data = get_http_response(my_uri, 3).body

but in a real-world program you would either define a constant and use that:

  MAXIMUM_REDIRECTS = 3
  data = get_http_response(my_uri, MAXIMUM_REDIRECTS).body

or else wrap everything together into an object where this value would be either an instance or class variable depending on your intent.
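As a sketch of the object-based option, assuming a hypothetical PageFetcher class: the limit is held in an instance variable, with a class constant supplying the default, so callers never pass it to the method that does the work. The optional `transport` argument is also an assumption, added so the class can be exercised with a stub instead of a live server.

```ruby
require 'net/http'
require 'uri'

# Hypothetical wrapper: the redirect limit lives in the object (an instance
# variable), with a class-level constant as the default.
class PageFetcher
  DEFAULT_MAX_REDIRECTS = 3
  USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"

  attr_reader :max_redirects

  # `transport` is a callable (URI -> Net::HTTPResponse); by default it makes
  # a real request, but a stub can be passed in for offline testing.
  def initialize(max_redirects = DEFAULT_MAX_REDIRECTS, transport = nil)
    @max_redirects = max_redirects
    @transport = transport || method(:http_get)
  end

  # Public entry point: callers never pass the limit themselves.
  def fetch(uri)
    fetch_following(uri, @max_redirects)
  end

  private

  # Same recursion as before, with the remaining budget as a parameter.
  def fetch_following(uri, remaining)
    response = @transport.call(uri)
    return response unless response.is_a?(Net::HTTPRedirection)
    raise "Too many redirects" unless remaining > 0
    fetch_following(URI.join(uri.to_s, response['Location']), remaining - 1)
  end

  # One GET request over a connection closed when the block exits.
  def http_get(uri)
    Net::HTTP.start(uri.host, uri.port) do |http|
      http.get(uri.request_uri, "User-Agent" => USER_AGENT)
    end
  end
end
```

`PageFetcher.new.fetch(URI.parse('http://example.com/')).body` would use the default of three redirects, while `PageFetcher.new(5)` raises the limit for that one fetcher without affecting others. If every fetcher should share one setting, a class-level value would do instead.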

2) By including a redirect limit, will I have made the URL redirection
code as effective as it can be, or should I add something more to the
code to handle redirects effectively?

I can't really answer that question without knowing more about the real-world problem you're trying to solve. However, in general I'd say that whenever you have a recursive problem like this, it's sensible to ensure that it's throttled to prevent resource exhaustion. For a very graphic example of why this is important (especially with network applications), read up on the Morris Worm :)
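On the second question, one common addition beyond a hop counter is to remember which URLs have already been visited, so a redirect loop (A to B and back to A) is reported straight away instead of quietly consuming the whole budget. A minimal sketch, assuming a hypothetical `fetch` callable from URI to Net::HTTPResponse; the exception class names are illustrative.

```ruby
require 'net/http'
require 'set'
require 'uri'

class RedirectLoop < StandardError; end
class TooManyRedirects < StandardError; end

# Follows redirects with two throttles: a hop budget and a set of URLs already
# visited, so a loop fails fast instead of burning through the budget.
# `fetch` is a hypothetical callable (URI -> Net::HTTPResponse).
def fetch_with_guards(uri, fetch, max_redirects = 5, visited = Set.new)
  raise RedirectLoop, "redirect loop at #{uri}" if visited.include?(uri.to_s)
  visited << uri.to_s

  response = fetch.call(uri)
  return response unless response.is_a?(Net::HTTPRedirection)
  raise TooManyRedirects, "gave up at #{uri}" if max_redirects <= 0

  # resolve relative Location values against the current URI
  next_uri = URI.join(uri.to_s, response['Location'])
  fetch_with_guards(next_uri, fetch, max_redirects - 1, visited)
end
```

The counter still matters: a server can hand out an endless chain of distinct URLs, which the visited set alone would never catch.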

Ellie


On 7 Apr 2009, at 12:18, Arun Kumar wrote: