REXML advice - output

Hey all,

I would like to pick your brains about REXML and how to report from it.
For example, I am reading an XML file using references to each XML tag
like so:

doc.root.each_element("/UserData/List/ItemInfo/Title") {|e|
  report.puts "Title: #{e.text}"
}
doc.root.each_element("/UserData/List/ItemInfo/Date") {|e|
  report.puts "Date: #{e.text}"
}

The 'report.puts' writes this data out to a CSV file. At present I get a
list of all the titles in the XML file followed by a list of the dates.
What I need is to get them side by side in a CSV file, like so:

Title Date
Item1 20th Jan 2009
Item2 12th Feb 2010

Does anyone have any suggestions on a suitable workflow for this?

Many thanks

--
Posted via http://www.ruby-forum.com/.

Just iterate over all "ItemInfo" elements and print values from sub
elements (which you can select via a relative XPath).
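
A minimal sketch of that approach, assuming the /UserData/List/ItemInfo
layout from the question and Ruby's standard CSV library for the output.
The sample XML string and the `rows` variable are made up for
illustration:

```ruby
require 'rexml/document'
require 'csv'

# Hypothetical sample data shaped like the paths in the question.
xml = <<~XML
  <UserData>
    <List>
      <ItemInfo><Title>Item1</Title><Date>20th Jan 2009</Date></ItemInfo>
      <ItemInfo><Title>Item2</Title><Date>12th Feb 2010</Date></ItemInfo>
    </List>
  </UserData>
XML

doc = REXML::Document.new(xml)

# Visit each ItemInfo once and read its children with relative XPaths,
# so the Title and Date of one item end up in the same row.
rows = []
doc.elements.each("/UserData/List/ItemInfo") do |item|
  rows << [item.elements["Title"].text, item.elements["Date"].text]
end

# Build the CSV text: a header row, then one row per item.
report = CSV.generate do |csv|
  csv << ["Title", "Date"]
  rows.each { |row| csv << row }
end

puts report
```

Collecting the pairs first and writing them afterwards keeps the XML
walking separate from the output format, which may help if the report
layout changes later.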

Kind regards

robert

On Thu, Sep 16, 2010 at 4:42 PM, Stuart Clarke <stuart.clarke1986@gmail.com> wrote:

Does anyone have any suggestions on a suitable workflow for this?

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Could anybody help me with an issue I am having with some XML I am
reading? I am using XPath to read 2 different parts of an XML file,
which looks a lot like this:

<Data>
<DoneList><Vector><Count>84</Count>
<FullItemInfo>
<Count>0</Count>
<ItemInfo>
<Title>BLAH LAH</Title>
<Id>12345</Id>
</ItemInfo>
</Vector></DoneList>
<FullItemInfo>
NEXT ITEM AS ABOVE

Then I have further data, which is slightly different
<NotDoneList><Vector><Count>84</Count>
<FullItemInfo>
<Count>0</Count>
<ItemInfo>
<Title>BLAH LAH</Title>
<Id>12345</Id>
</ItemInfo>
</Vector></DoneList>
<FullItemInfo>
</Data>

As you can see, the tags are the same but the first is DoneList and the
second NotDoneList. I need to process each set separately and each set
can contain more than 1 entry. My code to give a CSV file is

doc = REXML::Document.new(d) # call REXML to open the XML file

# To get NotDoneList data
doc.elements.each("//NotDoneList/Vector/Count/FullItemInfo") do |e|
  detail = e.elements['ItemInfo/Title'].text << "," <<
           e.elements['ItemInfo/Id'].text
  puts detail
end

# To get DoneList data
doc.elements.each("//DoneList/Vector/Count/FullItemInfo") do |e|
  detail = e.elements['ItemInfo/Title'].text << "," <<
           e.elements['ItemInfo/Id'].text
  puts detail
end

When I run this, no data is extracted and no errors are given. In
contrast, if I do
doc.elements.each("//FullItemInfo") do |e|
I am able to extract all the information for both the NotDoneList and
DoneList, however this is not what I want. I want to address each data
set separately. The eventual idea will be to produce a report of all
items in the NotDoneList and another report for those in the DoneList.
I guess I am doing something wrong but I cannot see it.

Can anyone see what I am doing wrong with this? I would really
appreciate any help as I cannot figure it out.

Many thanks

My issue was actually due to the mismatched tags; it was a broken XML
file.

Thanks for identifying that.
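
For anyone hitting the same symptom, one way to rule out a broken file
early is to let the parser complain. The sketch below assumes that REXML
raises REXML::ParseException when it meets a mismatched closing tag; the
`well_formed?` helper and the sample strings are made up for
illustration.

```ruby
require 'rexml/document'

# Hypothetical helper: true if REXML can parse the string, false otherwise.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException => e
  # Report only the first line of the parser's message.
  warn "XML problem: #{e.message.lines.first}"
  false
end

# NotDoneList is closed with the wrong tag, as in the snippet posted earlier.
broken = "<Data><NotDoneList><Vector></Vector></DoneList></Data>"
fixed  = "<Data><NotDoneList><Vector></Vector></NotDoneList></Data>"

puts well_formed?(broken)
puts well_formed?(fixed)
```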

Robert Klemme wrote:

Just iterate over all "ItemInfo" elements and print values from sub
elements (which you can select via a relative XPath).

Kind regards

robert

Thanks for getting back to me. I will look into this and see how I get
on.

Thanks a lot Robert.

Robert Klemme wrote:

Just iterate over all "ItemInfo" elements and print values from sub
elements (which you can select via a relative XPath).

Kind regards

robert

To confirm I am following you correctly, I have now got the following:

info = doc.elements.to_a("//UserData/List/ItemInfo/")

Printing out info gives one entry per line for all the children under the
ItemInfo tag.

First of all, is this what you meant? Am I correct to assume that at
this point, you are suggesting I write this data to a CSV file stripping
off the tags with a regex or something? Is this correct?

Many thanks and apologies if I have misunderstood.

<Data>
<DoneList><Vector><Count>84</Count>
<FullItemInfo>
<Count>0</Count>
<ItemInfo>
<Title>BLAH LAH</Title>
<Id>12345</Id>
</ItemInfo>
</Vector></DoneList>
<FullItemInfo>
NEXT ITEM AS ABOVE

Data
    DoneList
        Vector
            Count /Count
            FullItemInfo
                Count /Count
                ItemInfo
                    Title /Title
                    Id /Id
                /ItemInfo
            /Vector
        /DoneList
        FullItemInfo

The XML example you provided seems to have mismatched tags?

Then I have further data, which is slightly different
<NotDoneList><Vector><Count>84</Count>
<FullItemInfo>
<Count>0</Count>
<ItemInfo>
<Title>BLAH LAH</Title>
<Id>12345</Id>
</ItemInfo>
</Vector></DoneList>
<FullItemInfo>
</Data>

NotDoneList
    Vector
        Count /Count
        FullItemInfo
            Count /Count
            ItemInfo
                Title /Title
                Id /Id
            /ItemInfo
        /Vector
    /DoneList
    FullItemInfo
    /Data

doc.elements.each("//NotDoneList/Vector/Count/FullItemInfo")
doc.elements.each("//DoneList/Vector/Count/FullItemInfo") do |e|

Can you verify and re-post a clean XML snippet? (That may help debug
your XPath.) I'm going to guess:

    <Data>
        <DoneList>
            <Vector>
                <Count/>
                <FullItemInfo/>
            </Vector>
        </DoneList>
    </Data>

In which case, the XPath might be: '//DoneList/Vector/FullItemInfo'?
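
If the structure really is as guessed above, a sketch like the following
would keep the two lists apart. The sample document (including the
NotDoneList branch and the ItemInfo children) and the `lines_for` helper
are assumptions for illustration, not the poster's real data.

```ruby
require 'rexml/document'

# Assumed well-formed layout: FullItemInfo sits directly under Vector,
# next to Count, not inside it.
xml = <<~XML
  <Data>
    <DoneList>
      <Vector>
        <Count>1</Count>
        <FullItemInfo>
          <ItemInfo><Title>Done item</Title><Id>1</Id></ItemInfo>
        </FullItemInfo>
      </Vector>
    </DoneList>
    <NotDoneList>
      <Vector>
        <Count>1</Count>
        <FullItemInfo>
          <ItemInfo><Title>Open item</Title><Id>2</Id></ItemInfo>
        </FullItemInfo>
      </Vector>
    </NotDoneList>
  </Data>
XML

doc = REXML::Document.new(xml)

# Hypothetical helper: collect "Title,Id" lines for one named list.
def lines_for(doc, list_name)
  lines = []
  doc.elements.each("//#{list_name}/Vector/FullItemInfo") do |e|
    lines << [e.elements["ItemInfo/Title"].text,
              e.elements["ItemInfo/Id"].text].join(",")
  end
  lines
end

done     = lines_for(doc, "DoneList")
not_done = lines_for(doc, "NotDoneList")
puts done, not_done
```

Because the XPath names the list element explicitly, each call only
sees the FullItemInfo entries under that list, so the two reports can be
written separately.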

I managed to mess up clicking "Send" on Friday, so I'm trying again (-:

Stuart Clarke wrote:

Robert Klemme wrote:

Just iterate over all "ItemInfo" elements and print values from sub
elements (which you can select via a relative XPath).

Thanks for getting back to me. I will look into this and see how I get
on.

I pulled this out of a script I use quite a bit and hacked your XPath into it:

require 'rexml/document'

ARGV.each do |filename|
  doc = REXML::Document.new( File.new( filename ) )

  # One output line per ItemInfo: Title, a tab, then Date.
  doc.elements.each("/UserData/List/ItemInfo") do |e|
    print e.elements["Title"].text, "\t"
    puts e.elements["Date"].text
  end
end