Parsing XML with Ruby

I need to hit an https link and pass a username and password in order to
pull down some records in xml format. I was thinking that the easiest
way to do this is to shell out to curl and then parse my xml provided
that I could pass the username/password in the url.

Can anyone recommend an easy way to accomplish this?

thanks

jackster

···

--
Posted via http://www.ruby-forum.com/.

The ruby way of doing this would be to use the Net::HTTP from the
standard library. It does https too. The docs are at:

  http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/index.html

As for parsing XML, there are a few options, but Nokogiri is probably
the easiest to learn and deal with. Easily googlable, and there are a
few blog posts with samples.

HTH, and have fun!
Ammar
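
For the parsing half, here is a minimal sketch. Nokogiri is a separate gem, so this swaps in REXML, which ships with Ruby. The <records>/<record>/<name> element names and the sample data are invented for illustration, since the thread never shows the real feed.

```ruby
require 'rexml/document'

# Hypothetical sample document; the real feed's layout is not known.
def sample_xml
  <<~XML
    <records>
      <record id="1"><name>alpha</name></record>
      <record id="2"><name>beta</name></record>
    </records>
  XML
end

# Returns one [id, name] pair per <record> element in the given XML string.
def record_pairs(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//record').map do |rec|
    [rec.attributes['id'], rec.elements['name'].text]
  end
end
```

Calling record_pairs(response.body) on a fetched document would work the same way, provided the element names are adjusted to the real feed.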

···

On Fri, Nov 12, 2010 at 2:12 AM, jackster the jackle <johnsheahan@att.net> wrote:

I need to hit an https link and pass a username and password in order to
pull down some records in xml format. I was thinking that the easiest
way to do this is to shell out to curl and then parse my xml provided
that I could pass the username/password in the url.

Can anyone recommend an easy way to accomplish this?

Thanks a lot for the advice. I have used Net::HTTP a lot in the past but
could never get it working with HTTPS, but I'll read the docs again and
have at it....

jack

···


It seems to be working...here is my test code:

http = Net::HTTP.new('www.chase.com', 443) # note the port
http.use_ssl = true # turn it on
puts "Here is the page: #{http} "

The only problem is, I get this returned:

    Here is the page: #<Net::HTTP:0x31b31d8>

That looks like an array. How do I get the data out of it? Should I use
an "each do" statement?

thanks

jack

···


I'm not a developer..I'm a network engineer and have been for 15
years...I have also been a member of this forum (off and on) for the
last 5 years or so.

Believe me, I google everything and try and figure it out before I post
on the forum....I'm sorry, I don't understand everything I read in the
docs like you guys do...much of it is a foreign language to me.

I have been able to get some things working over the years by trial and
error and by modifying and expanding some base snippets of code...much
of the help I received on this forum by people who were willing to help
even dummies like me.

If someone I didn't know asked me networking questions I wouldn't tell
them to RTFM...that is what know it alls do.

I appreciate any and all help from Ammar and I'm sorry I got upset but I
can't deal with people that deliberately hold back info to try and teach
people some kind of lesson.

···


The difference between http and https with Net::HTTP can be summed up by:

require 'net/https' # extra require

http = Net::HTTP.new('server.net', 443) # note the port
http.use_ssl = true # turn it on

Regards,
Ammar
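
As a small sketch of the two lines above wrapped in a method: the name https_client is made up here, and 'server.net' is just the placeholder host from the post. Creating the object does not contact the server; nothing is sent until a request is made on it.

```ruby
require 'net/http'
require 'openssl'

# Builds (but does not open) an HTTPS client for the given host.
# https_client is a hypothetical helper name, not part of Net::HTTP.
def https_client(host, port = 443)
  http = Net::HTTP.new(host, port) # no connection is made here
  http.use_ssl = true              # switch the client to TLS
  http
end
```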

···

On Fri, Nov 12, 2010 at 3:49 AM, jackster the jackle <johnsheahan@att.net> wrote:

Thanks a lot for the advice. I have used Net::HTTP a lot in the past but
could never get it working with HTTPS, but I'll read the docs again and
have at it....

It seems to be working...here is my test code:

http = Net::HTTP.new('www.chase.com', 443) # note the port
http.use_ssl = true # turn it on
puts "Here is the page: #{http} "

The only problem is, I get this returned:

Here is the page: #<Net::HTTP:0x31b31d8>

That's not the page. That's the instance of Net::HTTP, the web client,
which you have to use to fetch the page.

That looks like an array. How do I get the data out of it? Should I use
an "each do" statement?

Please read the documentation to find out how to use the instance you created.

Good luck,
Ammar

···

On Fri, Nov 12, 2010 at 4:21 AM, jackster the jackle <johnsheahan@att.net> wrote:

It seems to be working...here is my test code:

http = Net::HTTP.new('www.chase.com', 443) # note the port
http.use_ssl = true # turn it on
puts "Here is the page: #{http} "

just continue... do not be afraid to try...

response = http.get('/index.html')
#=> #<Net::HTTPOK 200 OK readbody=true>

response.body
#=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\"
\"http://www.w3.org/TR/html4/strict.dtd\">\n<html
xmlns:xalan=\"http://xml.apache.org/xalan\"
xmlns:java=\"http://xml.apache.org/xslt/java\" LANG=\"EN\"><head><META
http-equiv=\"Content-Type\" content=\"text/html;
charset=UTF-8\"><title>CHASE Home: Personal Banking | Personal Lending

Retirement &amp; Investing | Business

Banking</title><script>\n\t\t\t\t\t\tvar pageId = '/online....

<text snipped>
.......

hth
kind regards -botp
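
Since http.get returns a response object rather than the page text, it can help to look at the response class before using the body. A rough sketch; the method name describe_response and the wording of the messages are made up for illustration.

```ruby
require 'net/http'

# Summarises a Net::HTTPResponse by its class, without touching the body.
# describe_response is a hypothetical helper, not part of Net::HTTP.
def describe_response(response)
  case response
  when Net::HTTPSuccess      then "ok (#{response.code})"
  when Net::HTTPUnauthorized then "credentials rejected (#{response.code})"
  when Net::HTTPRedirection  then "redirected to #{response['location']}"
  else                            "failed (#{response.code} #{response.message})"
  end
end
```

A 401 here would suggest the server wants credentials, which matters for the password-protected URL mentioned at the start of the thread.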

···

On Fri, Nov 12, 2010 at 10:21 AM, jackster the jackle <johnsheahan@att.net> wrote:

Ammar Ali wrote in post #960938:

Please read the documentation to find out how to use the instance you
created.

Good luck,
Ammar

What, have you suddenly decided to play the "school master"? I can't
stand people like you who think they are smarter than everyone else and
pretend to help by giving stupid little hints....get lost Ammar

···


botp wrote in post #960964:

response = http.get('/index.html')
#=> #<Net::HTTPOK 200 OK readbody=true>

response.body

That does help and thank you.
Prior to your post, I was able to get the following working which is
similar to what you gave me:

···

---------
uri = URI.parse("https://www.chase.com/")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
puts response.body
response["header-here"] # All headers are lowercase
----------
I am able to pull the page from Chase but the https url that I really
need requires a one time username and password which I am having trouble
with.

I tried modifying the uri to be:
---------
uri = URI.parse("https://myusername:mypassword@www.myurl.com/")
---------

but that didn't work so I tried adding this which I got from Google:
---------
http.basic_auth("username", "password")
---------

Am I proceeding down the right path with the authentication?

thanks

jack
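
A note on why the two attempts above likely fail, with a sketch. When the request is built from uri.request_uri, only the path is passed along, so the user:password part of the URL is never sent. And basic_auth is a method on the request object, not on the Net::HTTP client. The helper name below is made up; the URL is the placeholder from the post.

```ruby
require 'net/http'
require 'uri'

# Builds a GET request and copies any user:password found in the URL
# into a Basic Authorization header on that request.
# request_with_url_credentials is a hypothetical helper name.
def request_with_url_credentials(url)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(uri.user, uri.password) if uri.user
  request
end
```

The returned request would then be passed to http.request(request) as in the working snippet.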


Ammar Ali wrote in post #960938:

Please read the documentation to find out how to use the instance you
created.

Good luck,
Ammar

What, have you suddenly decided to play the "school master"? I can't
stand people like you who think they are smarter than everyone else and
pretend to help by giving stupid little hints....

In an earlier post you wrote: "I have used Net::HTTP alot in the
past". Your last question clearly indicates that you are not telling
the truth. And it shows that you didn't even try to read the docs. I
was willing to help, but I'm not willing to do the work for you.

get lost Ammar

With pleasure.

Good luck,
Ammar

···

On Fri, Nov 12, 2010 at 2:06 PM, jackster the jackle <johnsheahan@att.net> wrote:

There is this really cool tool that all us programmers know about but
we never tell the noobs. But as you asked so nicely I will tell you. It's
called Google :)

Try this query "read an https url in ruby", look at the first result.
Looks interesting?

As you have displayed no ability to solve your own problems and an
inflated sense of your own importance I think that you will probably
go nowhere. Ammar has done his best to help you, you however have
expected everything to be given to you on a plate. Behave like a child
and you will be treated like a child.

TTFN

Aggression won't get you anywhere. So, go read the documentation, and
if you have specific questions, we are happy to help. But read the
docs first. That's what it's there for.

···

On Friday, November 12, 2010, jackster the jackle <johnsheahan@att.net> wrote:

What, have you suddenly decided to play the "school master"? I can't
stand people like you who think they are smarter than everyone else and
pretend to help by giving stupid little hints....get lost Ammar

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

Am I proceeding down the right path with the authentication?

yes.
just input the command a line at a time... redo back if you get confused...

eg,

require 'net/http'
#=> true
uri = URI.parse("https://www.chase.com/")
#=> #<URI::HTTPS:0x83ea2a8 URL:https://www.chase.com/>
http = Net::HTTP.new(uri.host, uri.port)
#=> #<Net::HTTP www.chase.com:443 open=false>
http.use_ssl = true
#=> true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
#=> 0
request = Net::HTTP::Get.new(uri.request_uri)
#=> #<Net::HTTP::Get GET>
request.basic_auth("username", "password")
#=> ["Basic dXNlcm5hbWU6cGFzc3dvcmQ="]
response = http.request request
#=> #<Net::HTTPOK 200 OK readbody=true>
response.body
#=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\"
\"http://www.w3.org/TR/html4/strict.dtd\">\n<html
xmlns:xalan=\"http://xml.apache.org/xalan\"
xmlns:java=\"http://xml.apache.org/xslt/java\" LANG=\"EN\"><head><META
http-equiv=\"Content-Type\" content=\"text/html;
charset=UTF-8\"><title>CHASE Home: Personal Banking | Personal Lending

Retirement &amp; Investing | Business

Banking</title><script>\n\t\t\t\t\t\tvar pageId =
'/online/Home/Chase-Home.dwt';\n\t\t\t\t\t</script><META
name=\"robots\" content=\"INDEX, FOLLOW\"><META name=\"....

<snipped text>....

hth.
kind regards -botp
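
The irb steps above could be collected into a few small methods, roughly as below. This is only a sketch: the method names are made up, and it assumes the server uses HTTP Basic authentication. One deliberate difference from the session is VERIFY_PEER instead of VERIFY_NONE, because VERIFY_NONE turns off certificate checking.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Configures a client for the URI; no connection is opened yet.
def build_client(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER # keep certificate checks on
  http
end

# Builds a GET request carrying a Basic Authorization header.
def build_request(uri, user, password)
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(user, password)
  request
end

# Performs the request and returns the response body as a String.
def fetch_body(url, user, password)
  uri = URI.parse(url)
  build_client(uri).request(build_request(uri, user, password)).body
end
```

Usage would look like fetch_body("https://www.myurl.com/", "username", "password"), with the URL and credentials replaced by real ones.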

···

On Fri, Nov 12, 2010 at 10:23 PM, jackster the jackle <johnsheahan@att.net> wrote:

Both approaches above should be equivalent. Question is which authentication method the website uses. It may as well be form fields

Kind regards

  robert

···

On 11/12/2010 03:23 PM, jackster the jackle wrote:

botp wrote in post #960964:

response = http.get('/index.html')
#=> #<Net::HTTPOK 200 OK readbody=true>

response.body

That does help and thank you.
Prior to your post, I was able to get the following working which is
similar to what you gave me:
---------
uri = URI.parse("https://www.chase.com/")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
puts response.body
response["header-here"] # All headers are lowercase
----------
I am able to pull the page from Chase but the https url that I really
need requires a one time username and password which I am having trouble
with.

I tried modifying the uri to be:
---------
uri = URI.parse("https://myusername:mypassword@www.myurl.com/")
---------

but that didn't work so I tried adding this which I got from Google:
---------
http.basic_auth("username", "password")
---------

Am I proceeding down the right path with the authentication?

(sent via POST for example) or even via a certificate.
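
If the site does turn out to use form fields sent via POST, the request might be built along these lines. This is only a sketch: the '/login' path and the 'username'/'password' field names are hypothetical and would have to be taken from the site's actual login form.

```ruby
require 'net/http'

# Builds a POST request with URL-encoded form fields.
# The field names here are assumptions, not taken from any real site.
def login_request(path, user, password)
  request = Net::HTTP::Post.new(path)
  request.set_form_data('username' => user, 'password' => password)
  request
end
```

The request is sent the same way as a GET, with http.request(request). Sites that log in this way usually answer with a session cookie that later requests need to send back.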

botp wrote in post #960976:

require 'net/http'
#=> true
uri = URI.parse("https://www.chase.com/")
#=> #<URI::HTTPS:0x83ea2a8 URL:https://www.chase.com/>
http = Net::HTTP.new(uri.host, uri.port)
#=> #<Net::HTTP www.chase.com:443 open=false>
http.use_ssl = true
#=> true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
#=> 0

That is a good way of doing it, a step at a time through irb...I don't
usually do it that way but it seems to be a good way to isolate the
failure if any.

When I get down to the line "http.use_ssl = true", it fails with this
error message:

irb(main):004:0> http.use_ssl = true
NoMethodError: undefined method `use_ssl=' for #<Net::HTTP myurl.com:443
open=false>
        from (irb):4
irb(main):005:0>

···


Hi Botp,

I got it working thanks to you. I had to start off with require
'net/https' instead of require 'net/http' and everything worked.

I can't thank you enough for working with me and helping me learn
something.

take care

jack
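
For anyone hitting the same NoMethodError: on Ruby 1.8 the use_ssl= setter was defined by net/https, so requiring only net/http left it missing. As far as I know, newer Ruby versions define it in net/http itself and keep net/https around for compatibility. A quick check, as a sketch (the method name is made up):

```ruby
require 'net/https' # on old Rubies this is what adds SSL support to Net::HTTP

# Reports whether Net::HTTP instances respond to use_ssl= in this Ruby.
def ssl_setter_available?
  Net::HTTP.method_defined?(:use_ssl=)
end
```

If this returns false after the require, the Ruby build most likely lacks OpenSSL support.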
