REXML - StreamListener - Where am I in the XML?

Hi,

I'm using REXML::StreamListener to analyze a big XML file. My Ruby
code looks like this:

--- code start here ---

require 'rexml/document'
require 'rexml/streamlistener'

class MyListener
  include REXML::StreamListener
  def tag_start(name, attrs)
    # anything to do ...
  end
  def text(text)
    # anything to do ...
  end
end

# xmlfile holds the path to the XML file; the block form closes it afterwards
File.open(xmlfile) { |file| REXML::Document.parse_stream(file, MyListener.new) }

--- code ends here ---

In the "tag_start" method I sometimes need to know where I am in the
XML: what the parent tag is, and so on. Is there a method to get this
information at that point?

Regards

Michael

In general, the big value of a stream parser is that it is not holding onto much state, so memory needs stay small regardless of the size of the XML. State tracking is left to the application developer.
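A minimal sketch of that kind of application-side state tracking, assuming all you need is the name of the enclosing element. The class name PathTrackingListener, the @stack and @seen variables, and the sample XML are made up for illustration; only tag_start, tag_end and parse_stream come from REXML.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that remembers which elements are currently open.
class PathTrackingListener
  include REXML::StreamListener

  attr_reader :seen

  def initialize
    @stack = []   # names of the currently open elements, outermost first
    @seen  = []   # [parent, name] pairs, recorded only for illustration
  end

  def tag_start(name, attrs)
    parent = @stack.last    # nil when name is the root element
    @seen << [parent, name]
    @stack.push(name)
  end

  def tag_end(name)
    @stack.pop
  end
end

xml = '<Users><User><FirstName>Michael</FirstName></User></Users>'
listener = PathTrackingListener.new
REXML::Document.parse_stream(xml, listener)
listener.seen.each { |parent, name| puts "#{name} (parent: #{parent.inspect})" }
```

Inside tag_start, @stack.last is the parent and the whole @stack is the path from the root, so memory use grows with nesting depth rather than with file size.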

The REXML pull-parser lets you peek at the next event; not sure offhand if it goes the other way. But I suspect that with the stream and pull parsers (one of which sits on the other under the hood, so they are more or less the same), once an event is off the stack, it is gone.

Stream parsing works really well when you have a large source of regularly structured data (e.g., XML dump of a database table), such that you can grab and stash in memory just what you need, work with it (perhaps as a transient DOM), then discard it and move on.
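As a rough sketch of that grab, use, discard pattern, assuming a dump where each record is a row element with flat child elements (the row, id and name tags, the RowListener class and the sample data are all invented for illustration):

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: gathers the child values of each <row> into a
# Hash, hands that Hash to a block, then forgets it.
class RowListener
  include REXML::StreamListener

  def initialize(&on_row)
    @on_row = on_row
    @row    = nil   # Hash for the <row> being read, nil outside a row
    @field  = nil   # name of the child element being read, if any
  end

  def tag_start(name, attrs)
    if name == 'row'
      @row = {}
    elsif @row
      @field = name
      @row[@field] = +''
    end
  end

  def text(text)
    @row[@field] << text if @row && @field
  end

  def tag_end(name)
    if name == 'row'
      @on_row.call(@row)   # work with the record ...
      @row = nil           # ... then discard it
    elsif name == @field
      @field = nil
    end
  end
end

xml = <<~XML
  <table>
    <row><id>1</id><name>Ann</name></row>
    <row><id>2</id><name>Bob</name></row>
  </table>
XML

names = []
REXML::Document.parse_stream(xml, RowListener.new { |row| names << row['name'] })
puts names.inspect
```

Only one row is held at a time, so this should scale to large files as long as individual records stay small. It does not handle elements nested inside a field.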

--
James Britt

"Trying to port the desktop metaphor to the Web is like working
on how to fuel your car with hay because that is what horses eat."
    - Dare Obasanjo

Hi,

I've used REXML::Parsers::PullParser instead of stream parsing (same
general idea). Here's an example of a function that waits until it
sees a tag that matches element_name and then pulls the text from it:

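    require 'rexml/parsers/pullparser'

    # Returns the text of the first element named element_name,
    # or false if there is no such element or it has no text.
    def self.get_element_text(filename, element_name)
      File.open(filename) do |file|
        parser = REXML::Parsers::PullParser.new(file)
        while parser.has_next?
          el = parser.pull
          # For a start_element event, el[0] is the tag name.
          if el.start_element? && el[0] == element_name
            # Peek at the next event without consuming it.
            nxt = parser.peek
            return nxt.text? ? nxt[0] : false
          end
        end
        false
      end
    end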

So, while the above certainly won't work for your application, you
could try playing a little with parser.peek to see if you can find the
child node (or next node, whatever) that you're looking for.
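For the parent-tag question specifically, here is a sketch of the same pull loop keeping its own stack of open element names. The texts_under helper, its parameters and the sample XML are invented for illustration; pull, peek, has_next?, start_element?, end_element? and text? are the REXML calls it relies on.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: returns the text of every element called name
# whose direct parent is called parent_name.
def texts_under(xml, parent_name, name)
  parser = REXML::Parsers::PullParser.new(xml)
  stack  = []   # names of the currently open elements
  found  = []
  while parser.has_next?
    event = parser.pull
    if event.start_element?
      if event[0] == name && stack.last == parent_name
        nxt = parser.peek            # look ahead without consuming
        found << nxt[0] if nxt.text?
      end
      stack.push(event[0])
    elsif event.end_element?
      stack.pop
    end
  end
  found
end

xml = '<Users><User><FirstName>Michael</FirstName></User>' \
      '<Admin><FirstName>Keith</FirstName></Admin></Users>'
puts texts_under(xml, 'User', 'FirstName').inspect
```

The FirstName under Admin is skipped because stack.last is "Admin" at that point.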

HTH,
Keith

On 2/21/07, beikel.meikel@web.de <beikel.meikel@web.de> wrote:

I'm using REXML::StreamListener to analyze a big XML file. My Ruby
code looks like this:
...
In the "tag_start" method I sometimes need to know where I am in the
XML: what the parent tag is, and so on. Is there a method to get this
information at that point?

@James: I know a SAX parser in another language that is used much like
REXML::StreamListener, but it also gives you the additional
information that, for example, a FirstName tag is a child of a User
tag, and so on. You're right: if I use REXML::StreamListener, I can
track the stack myself. But there was a chance that I had simply
overlooked the right method in REXML::StreamListener :)

@Keith: I will check out REXML::Parsers::PullParser. Thanks for the
info :)

Michael