Parsing through XML with REXML/XPath

Hi,
I need to sort groups of xml data based on the first instances of
particular elements down deep in the element structure of documents.
  regs = []
  regs = XPath.match(doc, "//registration")
  regs.each do |reg|
   codes = XPath.match(regs, "//issue[1]/")# { |element| puts element.text }
   puts codes
  end
I'm getting:
  <issue code='ENG'>Energy/Nuclear</issue>
  <issue code='EDU'>Education</issue>
  <issue code='AGR'>Agriculture</issue>
  ...

There are only 86 <issue> entries in the document, but I'm getting
over 2,550 results here for "puts codes"! Obviously, it's looping and I
don't know why. It is pulling just the first entries, which I want, but,
obviously, it's doing it lots and lots of times.

I'm also trying to parse out these results so that I only end up with
the actual element text, not any attributes. So, for example, in the
above results, I only want "Energy/Nuclear", "Education", and "Agriculture",
not any of the surrounding stuff. So, I've tried this, inside the above:
  codes.each do |code|
  code.to_s.gsub!(/<issue code='[A-Z]{3}'>(.*?)\/*.*<\/issue>/, "$1")
  puts code
  end

Thanks,
Peter


--
Posted via http://www.ruby-forum.com/.

Hi,
I need to sort groups of xml data based on the first instances of
particular elements down deep in the element structure of documents.
  regs = []
  regs = XPath.match(doc, "//registration")
  regs.each do |reg|
   codes = XPath.match(regs, "//issue[1]/")# { |element| puts element.text }
   puts codes
  end
I'm getting:
  <issue code='ENG'>Energy/Nuclear</issue>
  <issue code='EDU'>Education</issue>
  <issue code='AGR'>Agriculture</issue>
  ...

There are only 86 <issue> entries in the document, but I'm getting
over 2,550 results here for "puts codes"! Obviously, it's looping and I
don't know why. It is pulling just the first entries, which I want, but,
obviously, it's doing it lots and lots of times.

I'm not sure what you mean by "sort groups of xml data based on the first instances of particular elements". Can you explain that more? Feel free to email me off-list if you'd like, since this isn't really a Ruby question.

I'm also trying to parse out these results so that I only end up with
the actual element text, not any attributes.

To ask for just the text inside an element, you need to use "text()".
For example, this would give you a node set containing all the text in the issue elements.

/registration//issue/text()

Note that you should only use one slash at the beginning to specify that the root element should be "registration".
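
As a rough sketch of how text() behaves in REXML, here is a small example. The sample document, its "registrations" wrapper element, and the variable names are made up for illustration, since the real file isn't shown in this thread. It also shows one possible way to get only the first issue of each registration: a relative path such as "issue[1]" is evaluated against the element passed in, whereas in XPath a path starting with "//" is evaluated from the document root no matter which node you start from.

```ruby
require "rexml/document"

# Hypothetical sample data; the real document's structure is an assumption.
xml = <<~XML
  <registrations>
    <registration>
      <issue code='ENG'>Energy/Nuclear</issue>
      <issue code='EDU'>Education</issue>
    </registration>
    <registration>
      <issue code='AGR'>Agriculture</issue>
    </registration>
  </registrations>
XML

doc = REXML::Document.new(xml)

# text() selects the text nodes themselves, so no tags or attributes come back.
all_issue_texts = REXML::XPath.match(doc, "//issue/text()").map(&:to_s)

# Query each registration with a relative path, so only that registration's
# own first <issue> child is considered.
first_issue_texts = REXML::XPath.match(doc, "//registration").map do |reg|
  REXML::XPath.first(reg, "issue[1]/text()").to_s
end

puts all_issue_texts
puts first_issue_texts
```

Note that the inner query is run against reg (the single element), not against the whole regs array.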


On Sep 25, 2007, at 9:34 AM, Peter Bailey wrote:

So, for example, in the
above results, I only want "Energy/Nuclear, Education, and Agriculture,"
not any of the surrounding stuff. So, I've tried this, inside the above:
  codes.each do |code|
  code.to_s.gsub!(/<issue code='[A-Z]{3}'>(.*?)\/*.*<\/issue>/, "$1")
  puts code
  end

Thanks,
Peter
--
Posted via http://www.ruby-forum.com/.

---
Mark Volkmann

Thanks, Mark. Yes, I'm sorry about the lingo. I'm not that versed in
XML-speak yet. But, basically, I want to sort all of the <registration>
data sets in my files, just alphabetically. But, as I said, I need to
put some headings into my output, and those headings are essentially
the text that's in the very first instance of the <issue> element. The
<issue> element is a child element of <registration>. Now, your
suggestion to use /text() worked, meaning I got just the text
that I want. That's great. But it's still looping and giving me many
repeats of these instances.

Here's my e-mail address, if you'd like to strike up a separate
conversation.

Thanks,
Peter
pbailey@bna.com
