XML to CSV with REXML - I'm sure this should be easy

Ok, I'm taking a fairly simple xml file containing a series of events
and I want to convert it to csv - nothing new there.

However, some events have two or more dates listed and I'd like to
display each as individual lines. My ruby skills are fairly limited but
from googling around I can extract everything up to the dates, but I'm
banging my head against a wall to get any further...

Here's the XML:

<event id='1234'>
  <title>Event Title</title>
  <category>Event Category</category>
  <venue>
    <name>Venue Name</name>
    <address>
      <address1>1 Some Street</address1>
      <town>Some Town</town>
    </address>
  </venue>
  <performances>
    <performance date='2009-04-01 18:00:00' />
    <performance date='2009-04-03 18:00:00' />
  </performances>
</event>

This is my extraction code:

require 'rexml/document'

xml = REXML::Document.new(File.read("data.xml"))
# The block form of File.open closes data.csv when the block ends
File.open("data.csv", "w") do |csv_file|
  xml.elements.each("//event") do |e|
    # Array#join builds a new string, so the parsed values are not modified
    csv_file.puts([e.attributes['id'],
                   e.elements['title'].text,
                   e.elements['category'].text,
                   e.elements['venue/name'].text,
                   e.elements['venue/address/address1'].text,
                   e.elements['venue/address/town'].text].join("|"))
  end
end

Which gives me:
1234|Event Title|Event Category|Venue Name|1 Some Street|Some Town

But what I really want is:
1234|Event Title|Event Category|Venue Name|1 Some Street|Some Town|2009-04-01 18:00:00
1234|Event Title|Event Category|Venue Name|1 Some Street|Some Town|2009-04-03 18:00:00

I'm sure this should be fairly simple but any help would be appreciated.
Cheers!


--
Posted via http://www.ruby-forum.com/.

Inside the loop, just before the closing "end", you need to iterate through all the "performance" elements _below the current event_ and concatenate each performance's date with what you have built so far.

You should probably also take measures to emit a line without a date in case zero performances can be found in the input XML.

Kind regards

  robert
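
The two suggestions above (loop over the performances below the current event, and still emit a line when there are none) could be sketched roughly like this. It is only an illustration: the SAMPLE_XML data, the FIELDS list and the event_lines helper are made-up names, the second event with no performances is invented to exercise the fallback, and the <events> wrapper element is an assumption about how the real file is rooted.

```ruby
require 'rexml/document'

# Hypothetical sample data: one event with two dates, one with none.
SAMPLE_XML = <<~XML
  <events>
    <event id='1234'>
      <title>Event Title</title>
      <category>Event Category</category>
      <venue>
        <name>Venue Name</name>
        <address>
          <address1>1 Some Street</address1>
          <town>Some Town</town>
        </address>
      </venue>
      <performances>
        <performance date='2009-04-01 18:00:00' />
        <performance date='2009-04-03 18:00:00' />
      </performances>
    </event>
    <event id='5678'>
      <title>Other Title</title>
      <category>Other Category</category>
      <venue>
        <name>Other Venue</name>
        <address>
          <address1>2 Other Street</address1>
          <town>Other Town</town>
        </address>
      </venue>
      <performances />
    </event>
  </events>
XML

# Paths of the text fields, relative to each <event>
FIELDS = %w[title category venue/name
            venue/address/address1 venue/address/town].freeze

# Returns one "|"-separated line per performance of each event, or a
# single line with an empty date column when an event has no performances.
def event_lines(xml_source)
  doc = REXML::Document.new(xml_source)
  lines = []
  doc.elements.each("//event") do |e|
    detail = [e.attributes['id']] + FIELDS.map { |path| e.elements[path].text }
    dates = []
    # Relative path: only the performances below the current event
    e.elements.each("performances/performance") do |perf|
      dates << perf.attributes['date']
    end
    dates = [""] if dates.empty?
    dates.each { |date| lines << (detail + [date]).join("|") }
  end
  lines
end

puts event_lines(SAMPLE_XML)
```

Whether an event without dates should produce a line with an empty last column, or no line at all, depends on what consumes the CSV; the empty column is just one option.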


On 17.03.2009 13:03, Sandy Thomson wrote:

--
remember.guy do |as, often| as.you_can - without end

What you are really interested in is each performance (each performance
generates one line of output). So simply deepen your loop:

xml.elements.each("//event") do |e|
  e.elements.each("//performance") do |p|

Now do exactly what you're doing and just append the performance date.
So, for example:

require 'rexml/document'
include REXML
output = ""
class REXML::Element
  def textof(xpaths_arr); xpaths_arr.map {|x| elements[x].text}; end
end
xml = Document.new(s)
xp = %w{title category venue/name
  venue/address/address1 venue/address/town}
xml.elements.each("//event") do |e|
  e.elements.each("//performance") do |p|
    output <<
      [e.attributes['id'],
      e.textof(xp),
      p.attributes['date']].flatten.join("|") + "\n"
  end
end

m.


--
matt neuburg, phd = matt@tidbits.com, Matt Neuburg’s Home Page
Leopard - http://www.takecontrolbooks.com/leopard-customizing.html
AppleScript - http://www.amazon.com/gp/product/0596102119
Read TidBITS! It's free and smart. http://www.tidbits.com

Thank you Robert, I'm getting closer...

I modified it as below, but for some reason the dates are now stacking
up on each other like so:
1234|Event Title|Event Category|Venue Name|1 Some Street|Some Town|2009-04-01 18:00:00
1234|Event Title|Event Category|Venue Name|1 Some Street|Some Town|2009-04-01 18:00:00|2009-04-03 18:00:00

So, I'm still missing something - any ideas?

xml.elements.each("//event") do |e|
  detail =
  (
    e.attributes['id'] << "|" <<
    e.elements['title'].text << "|" <<
    e.elements['category'].text << "|" <<
    e.elements['venue/name'].text << "|" <<
    e.elements['venue/address/address1'].text << "|" <<
    e.elements['venue/address/town'].text
  )

  xml.elements.each("//performances/performance") do |f|
    csv_file.puts detail << "|" << f.attributes['date']
  end
end
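
The most likely reason the dates pile up is that String#<< appends to the receiver in place instead of returning a new string, so detail itself grows on every pass through the inner loop. Here is a small sketch without REXML that shows the difference between << and +. The helper names lines_with_shovel and lines_with_plus, and the shortened detail string, are made up for the illustration.

```ruby
# Appends each date with String#<<, which modifies `detail` in place,
# so every later line still carries the dates appended before it.
def lines_with_shovel(detail, dates)
  dates.map { |date| (detail << "|" << date).dup } # dup: snapshot of the current state
end

# Builds each line with String#+, which returns a new string every time
# and leaves `detail` untouched.
def lines_with_plus(detail, dates)
  dates.map { |date| detail + "|" + date }
end

dates = ["2009-04-01 18:00:00", "2009-04-03 18:00:00"]

puts lines_with_shovel(+"1234|Some Town", dates) # second line carries both dates
puts lines_with_plus("1234|Some Town", dates)    # one date per line
```

In the loop above that would mean writing csv_file.puts detail + "|" + f.attributes['date'], or building the line with Array#join.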

Ok, so this is plainly much neater, thanks Matt :)

...but, whilst it works for one event with multiple dates, as soon as I
add a second event it iterates through all of the dates against every
event, so for 2 events each with 2 dates it outputs 8 lines...

Here's the code as it now stands:

require 'rexml/document'
include REXML

class REXML::Element
  # Text of each child element addressed by the given relative paths
  def textof(xpaths_arr); xpaths_arr.map {|x| elements[x].text}; end
end

xml = REXML::Document.new(File.open("data.xml"))
csv_file = File.new("data.csv", "w")
xp = %w{title category venue/name venue/address/address1
  venue/address/town}

xml.elements.each("//event") do |e|
  e.elements.each("//performance") do |p|
    csv_file.puts([e.attributes['id'], e.textof(xp),
      p.attributes['date']].flatten.join("|"))
  end
end
csv_file.close

  xml.elements.each("//performances/performance") do |f|

That should read e.elements.each(...), but the result is the same.

Cheers for the reply Matt, just looking at that now.

..but, whilst it works for one event with multiple dates, as soon as I
add a second event it iterates through all of the dates against every
event, so for 2 events each with 2 dates it outputs 8 lines...

Cool! :)

xml.elements.each("//event") do |e|
  e.elements.each("//performance") do |p|

Yeah, sorry about that. I wasn't thinking about the XPath here.
Obviously "//performance" is wrong. I shoulda said
"descendant::performance" or "performances/performance" or similar.

Of course one could also argue that Ruby and REXML are more heavyweight
than you need; you're just dumpster-diving in simple XML and outputting
text, so you could write this whole thing as an XSLT template. Choices,
choices...!
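
On the XPath point: a path starting with "//" is absolute, so it searches the whole document no matter which element it is called on, while "performances/performance" and "descendant::performance" are evaluated relative to the current event. The sketch below compares the three on a made-up two-event document. DOC and dates_per_event are hypothetical names, and the <events> wrapper element is an assumption about the real file.

```ruby
require 'rexml/document'

# Hypothetical two-event document, trimmed to the parts that matter here.
DOC = REXML::Document.new(<<~XML)
  <events>
    <event id='1'>
      <performances>
        <performance date='2009-04-01 18:00:00' />
        <performance date='2009-04-03 18:00:00' />
      </performances>
    </event>
    <event id='2'>
      <performances>
        <performance date='2009-05-01 18:00:00' />
        <performance date='2009-05-03 18:00:00' />
      </performances>
    </event>
  </events>
XML

# Collects the dates matched by `path` for each event, keyed by event id.
def dates_per_event(doc, path)
  result = {}
  doc.elements.each("//event") do |e|
    id = e.attributes['id']
    result[id] = []
    e.elements.each(path) { |perf| result[id] << perf.attributes['date'] }
  end
  result
end

["//performance", "performances/performance", "descendant::performance"].each do |path|
  counts = dates_per_event(DOC, path).map { |id, dates| "event #{id}: #{dates.size}" }
  puts "#{path} -> #{counts.join(', ')}"
end
```

With the absolute path every event should pick up all four dates, which matches the 8 lines reported for 2 events; the two relative forms should give two dates per event.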

m.

Choices indeed, but it works perfectly now so I'll go with it :)

Thanks!