Process xml in a page

Hi all,

I would like to process the xml in the url below, but I am unsure how.
I've always had APIs to help me out before, so this is new.

http://clinicaltrials.gov/show/NCT00001372?displayxml=true

I want to get this XML into an object that I can then parse to extract
the portions I need.

Thank you for your help, as always.

--
Posted via http://www.ruby-forum.com/.

Use open-uri to fetch the XML, then load it into a REXML::Document object.

Then use REXML's XPath to grab what you need.

http://www.ruby-doc.org/stdlib/libdoc/open-uri/rdoc/
http://www.ruby-doc.org/stdlib/libdoc/rexml/rdoc/index.html
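As a rough sketch of those three steps (fetch, load, query), something like the following could work. The element names `brief_title` and `condition` and the inline sample document are assumptions made up for illustration, not taken from the actual page; `fetch_xml` and `extract_fields` are hypothetical helper names. The sample string stands in for the downloaded text so the parsing part can be tried without a network connection.

```ruby
require 'open-uri'
require 'rexml/document'

# Hypothetical sample standing in for the fetched page.
# The element names are assumptions, not the real schema.
sample_xml = <<~XML
  <clinical_study>
    <brief_title>Example study</brief_title>
    <condition>Condition A</condition>
    <condition>Condition B</condition>
  </clinical_study>
XML

# Step 1: fetch the raw XML text. URI.open is the spelling on current
# Ruby versions; on Ruby 1.8 the same call was written open(url).
def fetch_xml(url)
  URI.open(url) { |page| page.read }
end

# Steps 2 and 3: load the text into a REXML::Document, then use XPath
# to pull out the pieces of interest.
def extract_fields(xml)
  doc = REXML::Document.new(xml)
  title_node = REXML::XPath.first(doc, '//brief_title')
  {
    title: title_node && title_node.text,
    conditions: REXML::XPath.match(doc, '//condition').map { |node| node.text }
  }
end

fields = extract_fields(sample_xml)
puts fields[:title]
puts fields[:conditions].join(', ')
```

To run it against the live page, pass the result of `fetch_xml(url)` to `extract_fields` and adjust the XPath expressions to the elements the page really contains.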

--
James Britt

http://www.ruby-doc.org - Ruby Help & Documentation
Ruby Code & Style - The Journal By & For Rubyists
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
http://www.30secondrule.com - Building Better Tools

Thank you, James. I got a step further, but still need help.

This code outputs exactly what I want to see to the debug window:

require 'open-uri'
require "rexml/document"
include REXML

url = "http://clinicaltrials.gov/show/NCT00001372?displayxml=true"

open(url) { |page| print page.read() }

However, I am having trouble when using the REXML tutorial. Should I
somehow save the output to an .xml file locally? Or should I add the
output to a string? I am not sure how to do either as I am having
trouble directing the "print" output anywhere.

Thank you, and sorry for the newbie questions. The API I used in a Ruby
project before made everything a bit too easy, so I am still learning.

-Hunter

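require 'open-uri'
require "rexml/document"
include REXML

url = "http://clinicaltrials.gov/show/NCT00001372?displayxml=true"

# Read the whole response into a string instead of printing it.
xml = open(url).read
p xml

# Parse the string into a document object.
doc = REXML::Document.new(xml)

# Print the root element's name to confirm the parse worked.
p doc.root.name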
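
On the string-versus-file question: either should work, since `REXML::Document.new` accepts a String or an IO object. Here is a small sketch of both routes. The one-line XML string is a made-up stand-in for what `open(url).read` returns, and the temp file is only there so the example cleans up after itself; a regular file opened with `File.open` would be handled the same way.

```ruby
require 'rexml/document'
require 'tempfile'

# Hypothetical stand-in for the text returned by open(url).read.
xml = '<clinical_study><brief_title>Example study</brief_title></clinical_study>'

# Option 1: keep the XML in a string and parse it directly.
from_string = REXML::Document.new(xml)

# Option 2: save the XML to a local file, then parse from the file handle.
from_file = nil
Tempfile.create(['study', '.xml']) do |file|
  file.write(xml)
  file.rewind                       # go back to the start before reading
  from_file = REXML::Document.new(file)
end

puts from_string.root.name
puts from_file.root.name
```

Parsing the string directly is the shorter path; saving to a file is mainly useful if you want to avoid downloading the page again on every run.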

--
James Britt

"I never dispute another person's delusions, just their facts."
   - Len Bullard

I get the error below after a successful XML output to the debug window.
Thanks!

No such file to load -- rexml/encodings/ASCII.rb
No decoder found for encoding ASCII. Please install iconv.
c:/ruby/lib/ruby/1.8/rexml/encoding.rb:33:in `encoding='
c:/ruby/lib/ruby/1.8/rexml/source.rb:40:in `encoding='
c:/ruby/lib/ruby/1.8/rexml/parsers/baseparser.rb:202:in `pull'
c:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:21:in `parse'
c:/ruby/lib/ruby/1.8/rexml/document.rb:176:in `build'
c:/ruby/lib/ruby/1.8/rexml/document.rb:45:in `initialize'
C:/Documents and Settings/hwalker/Desktop/Ruby-1.rb:13:in `new'
C:/Documents and Settings/hwalker/Desktop/Ruby-1.rb:13

c:/ruby/lib/ruby/1.8/rexml/encoding.rb:33:in `encoding=': No decoder
found for encoding ASCII. Please install iconv. (Exception)
  from c:/ruby/lib/ruby/1.8/rexml/source.rb:40:in `encoding='
  from c:/ruby/lib/ruby/1.8/rexml/parsers/baseparser.rb:202:in `pull'
  from c:/ruby/lib/ruby/1.8/rexml/parsers/treeparser.rb:21:in `parse'
  from c:/ruby/lib/ruby/1.8/rexml/document.rb:176:in `build'
  from c:/ruby/lib/ruby/1.8/rexml/document.rb:45:in `initialize'
  from C:/Documents and Settings/hwalker/Desktop/Ruby-1.rb:13:in `new'
  from C:/Documents and Settings/hwalker/Desktop/Ruby-1.rb:13
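
The trace fails inside `encoding=` while the parser is pulling the first tokens, which suggests (this is an inference, not confirmed in the thread) that the page's XML declaration names an encoding this REXML install has no decoder for. One quick way to check what a document declares, before handing it to REXML, is a plain regular expression over the prolog. The helper name `declared_encoding` and the sample strings are hypothetical.

```ruby
# Return the encoding named in the XML declaration, or nil if the
# document has no declaration or the declaration omits the encoding.
def declared_encoding(xml)
  xml[/\A\s*<\?xml[^>]*?encoding=["']([^"']+)["']/, 1]
end

# Made-up sample documents for illustration.
with_encoding    = '<?xml version="1.0" encoding="ASCII"?><root/>'
without_encoding = '<?xml version="1.0"?><root/>'

p declared_encoding(with_encoding)
p declared_encoding(without_encoding)
```

This only reports what the document claims; it does not install a decoder or change how REXML handles the text.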

A guy in the comments on the page below had the same problem. Someone
supplied a fix for Windows (2nd link). It's all good after that.

http://redhanded.hobix.com/inspect/noXpathOnMessyHtmlIsJustAsEasyInRuby.html

http://www.dave.burt.id.au/ruby/iconv.zip

Thanks again, James!
