Hi all,
I need to parse an XML file "line by line" because of an application
limitation, so I am trying to build a stream/pull XML parser with the
REXML library, but I can't get it to work.
- Does anyone know what could be causing this error? -> Missing end tag for
''
- The error happens even with simple XML like this:
psudo_xml = <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<SChange>
</SChange>
EOF
Error:
···
----------
DBG: event_type: text
TXT Normal
DBG: event_type: end_element
END Mode
/opt/local/lib/ruby/1.8/rexml/parsers/baseparser.rb:330:in `pull':
Missing end tag for '' (got "SChange") (REXML::ParseException)
Line:
Position:
Last 80 unconsumed characters:
from /opt/local/lib/ruby/1.8/rexml/parsers/pullparser.rb:68:in `pull'
from text2.rb:13:in `parse'
from text2.rb:32:in `line_process'
from text2.rb:47
Ruby code
----------
require "stringio"
require 'rexml/parsers/pullparser'
class BaseParser
  def initialize
    @parser = nil
  end
  def parse(raw_xml)
    @parser = REXML::Parsers::PullParser.new(raw_xml)
    while @parser.has_next?
      pull_event = @parser.pull
      puts "DBG: event_type: #{pull_event.event_type}"
      if pull_event.error?
        puts "\tERR #{pull_event[0]} - #{pull_event[0]}"
      elsif pull_event.start_element?
        puts "\tSTART #{pull_event[0]}"
      elsif pull_event.end_element?
        puts "\tEND #{pull_event[0]}"
      elsif pull_event.text?
        puts "\tTXT #{pull_event[0]}"
      end
    end
  end
end
def line_process(ios,myparser)
  while (line = ios.gets)
    line.chomp!
    myparser.parse(line)
  end
end
psudo_xml = <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<SChange>
<Service>Testing:service</Service>
<Status>Critical</Status>
<Mode>Normal</Mode>
</SChange>
EOF
psudo_xml_io = StringIO.new(psudo_xml)
line_process(psudo_xml_io,BaseParser.new)
----------
Thanks for any help in advance.
--
Posted via http://www.ruby-forum.com/.
You instantiate a *new* pull parser for *each* line, so the parser state is lost after each line. When you feed the last parser </SChange>, it naturally complains because it has never seen the matching start tag.
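A minimal sketch of that failure in isolation, assuming a reasonably current REXML: a fresh parser that is handed only the closing tag has no open element to match it against. The helper name parse_fragment is made up for this illustration, and the exact error text differs between REXML versions, so only the exception class is relied on here.

```ruby
require "rexml/parsers/pullparser"

# Hypothetical helper: parse one fragment with a brand-new parser
# and return the parse error, or nil if none was raised.
def parse_fragment(fragment)
  parser = REXML::Parsers::PullParser.new(fragment)
  parser.pull while parser.has_next?
  nil
rescue REXML::ParseException => e
  e
end

# A lone end tag: this parser never saw <SChange> being opened.
error = parse_fragment("</SChange>")
puts "#{error.class}: #{error.message.lines.first}" if error
```

This is the same situation the last loop iteration in the original script runs into.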
···
On 9 Dec 08, at 21:17, Sebastian (syepes) wrote:
@parser = REXML::Parsers::PullParser.new(raw_xml)
--
Luc Heinrich - luc@honk-honk.com
Why do you want to do that exactly? If you don't have the whole XML file at once and only have an IO-like object, you can pass this object directly to the pull parser, which should simply block until enough data is available to produce each event.
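As a rough sketch of that suggestion (not tested against a live pipe): one PullParser is created once and given the IO-like object, here a StringIO standing in for the real stream. The method name collect_events and the choice to skip whitespace-only text are assumptions made for the example.

```ruby
require "stringio"
require "rexml/parsers/pullparser"

xml = <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<SChange>
<Service>Testing:service</Service>
<Status>Critical</Status>
<Mode>Normal</Mode>
</SChange>
EOF

# Hypothetical helper: a single parser owns the IO and keeps its
# state (the stack of open tags) across lines.
def collect_events(io)
  parser = REXML::Parsers::PullParser.new(io)
  events = []
  while parser.has_next?
    event = parser.pull
    if event.start_element?
      events << [:start, event[0]]
    elsif event.end_element?
      events << [:end, event[0]]
    elsif event.text? && !event[0].strip.empty?
      # Whitespace between elements is skipped in this sketch.
      events << [:text, event[0].strip]
    end
  end
  events
end

events = collect_events(StringIO.new(xml))
events.each { |kind, value| puts "#{kind}\t#{value}" }
```

The parser, not the caller, decides how much input to read, which is why the line-by-line loop is no longer needed.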
···
On 10 Dec 08, at 14:35, Sebastian (syepes) wrote:
Mmm, so is there a way to "parse" each line of the XML independently
--
Luc Heinrich - luc@honk-honk.com
Bob Hutchison wrote:
Hi,
Mmm, so is there a way to "parse" each line of the XML independently,
and is it possible with the PullParser library?
Having written a pull parser, I'd have to say: No.
The parser is going to be looking for 'events', and it is going to
want to deal with well-formedness issues if it is an actual xml parser.
What are you trying to do? Maybe that's a better place to start.
Cheers,
Bob
----
Bob Hutchison
Recursive Design Inc.
http://www.recursive.ca/
weblog: http://www.recursive.ca/hutch
OK, this is the problem I am trying to solve:
I need to parse XML that comes from the stdout of a Unix program. The
program sends an xml* stream when it detects a change, and the IO.popen
pipe stays open until the next change.
The real problem is that the function ex_listener processes the XML
"line by line" because I can't detect an EOF from the IO.popen pipe, which
always stays open waiting for the next change (the next "xml stream").
I have tried using "lines = ios.readlines", but it does not work because
there's no EOF. Is there some other way of doing this?
I would appreciate any suggestions on how to solve this problem.
*xml: Sent when a change is detected
---
<SChange>
<Service>Testing:service</Service>
<Status>Critical</Status>
<Mode>Normal</Mode>
</SChange>
Ruby
---------
UNIX_PROG = "/bin/xml_stream"
def ex_connect
  ios = IO.popen(UNIX_PROG, "w+")
  ios.sync = true
  line = ios.gets
  if line =~ /xml/
    puts "INF: Connected OK (XML)"
    ios.puts "<Events>"
    return ios
  else
    puts "ERR: Cannot connect"
    exit 1
  end
end
def ex_listener(ios)
  while (line = ios.gets)
    line.chomp!
    if line =~ /<\/Events>/
      puts "INF: END of program"
      exit 0
    end
    puts "INF: #{line} - #{line.size}"
    parse_line_of_xml(line) # placeholder for the per-line parsing step
  end
end
ios = ex_connect
ex_listener(ios) # Processes the XML stream
---------
Regards,
Right, but since you control the parser state you know exactly when and where the document starts and when and where it ends, so you should be able to close the connection by yourself.
···
On 11 Dec 08, at 11:26, Sebastian (syepes) wrote:
The real problem is that the function: ex_listener, processes the XML
"line by line" because i can't detect a EOF from the IO.popen and it
will always be waiting (open) for the next change "xml stream".
--
Luc Heinrich - luc@honk-honk.com
If I understand correctly, you want to keep an IO stream open and
react to certain elements as they appear? That's a textbook SAX case,
not pull-parsing. Register a SAX handler for your SChange events, and
point your IO stream at it.
I'd use libxml-ruby, but REXML has a stream parser that is SAX-like.
You'd use it something like this (untested):
require "rexml/document"
require "rexml/streamlistener"
include REXML
class Handler
  include StreamListener
  def tag_start name, attrs
    if name=="SChange"
      #do something
      puts attrs
    end
  end
end
Document.parse_stream(your_io_stream, Handler.new)
-- Mark.
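A slightly fuller sketch along the same lines, in case it helps: the listener below gathers the child elements of each SChange into a hash and stores it when the element closes. The class name SChangeHandler is invented here, and the <Events> wrapper around the data is an assumption (the stream must have a single root element for REXML to accept more than one SChange); a StringIO stands in for the real pipe.

```ruby
require "stringio"
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener: builds one hash per <SChange> element.
class SChangeHandler
  include REXML::StreamListener
  attr_reader :changes

  def initialize
    @changes = []   # completed SChange records
    @current = nil  # record being built, nil outside <SChange>
    @field = nil    # name of the child element being read
  end

  def tag_start(name, _attrs)
    if name == "SChange"
      @current = {}
    elsif @current
      @field = name
    end
  end

  def text(data)
    # Only keep text that sits inside a child of <SChange>.
    @current[@field] = @current[@field].to_s + data if @current && @field
  end

  def tag_end(name)
    if name == "SChange"
      @changes << @current
      @current = nil
    else
      @field = nil
    end
  end
end

# Assumed stream shape: SChange elements wrapped in one <Events> root.
stream = StringIO.new(<<EOF)
<Events>
<SChange>
<Service>Testing:service</Service>
<Status>Critical</Status>
<Mode>Normal</Mode>
</SChange>
</Events>
EOF

handler = SChangeHandler.new
REXML::Document.parse_stream(stream, handler)
handler.changes.each { |change| puts change.inspect }
```

With a real pipe, the place where the record is appended in tag_end is where you would react to each change as it arrives.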
I'm still not exactly sure of your exact context, but you don't have to detect the EOF, just parse and when you reach the end of the document close the pipe yourself on your end.
···
On 11 Dec 08, at 18:27, Sebastian (syepes) wrote:
Ok I get the point, but I don't see how to detect the EOF (without using
some ugly code) and pass the whole *xml to the parser.
--
Luc Heinrich - luc@honk-honk.com
Hi,
Just for fun, I tried hacking something together using the pull parser that I wrote. This pointed out one potentially confusing issue, which I'll get to in a second.
How to avoid waiting for an EOF? Count events. Crudely, if you increment the count on a start element event and decrement it on an end element event, then when the count goes back to zero you've got what you are looking for. This means you let the pull parser read the input; you don't read it for the parser.
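Here is a small sketch of that counting idea, written against REXML's pull parser rather than the parser described in this message, so treat it as an illustration only. The method name read_one_element is made up, and a StringIO stands in for the open pipe.

```ruby
require "stringio"
require "rexml/parsers/pullparser"

# Hypothetical helper: pull events until the first element opened
# has been closed again, then stop without waiting for EOF.
def read_one_element(io)
  parser = REXML::Parsers::PullParser.new(io)
  depth = 0
  names = []
  while parser.has_next?
    event = parser.pull
    if event.start_element?
      depth += 1
      names << event[0]
    elsif event.end_element?
      depth -= 1
      break if depth.zero? # outermost element is complete
    end
  end
  names
end

io = StringIO.new(<<EOF)
<SChange>
<Service>Testing:service</Service>
<Status>Critical</Status>
<Mode>Normal</Mode>
</SChange>
EOF

names = read_one_element(io)
puts names.join(", ")
```

Whether the closing event is reported promptly on a live stream depends on how far the parser reads ahead, so this may behave differently on a real pipe than on a StringIO.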
The issue I mentioned... In my pull parser I'm assuming a file or string input, not an IO stream. I take advantage of that by looking ahead a bit. This isn't a problem unless you are using a stream. In my parser's case, it is looking ahead to at least the end of the next line (huge performance thing with files). The confusing effect is with the stream input:
<SChange>
<Service>Testing:service</Service>
<Status>Critical</Status>
<Mode>Normal</Mode>
</SChange>
<SChange>
<Service>Testing:service</Service>
<Status>Critical</Status>
<Mode>Normal</Mode>
</SChange>
The close of the first SChange element isn't reported until the next line is read, which happens to include the start of the next element. This is a delayed effect that is maybe not the best for a stream input. If you add a blank line between the events the problem goes away (but it'll read the blank line before reporting which shouldn't be a problem).
It is possible that this is affecting your testing.
Cheers,
Bob
···
On 11-Dec-08, at 12:31 PM, Luc Heinrich wrote:
On 11 Dec 08, at 18:27, Sebastian (syepes) wrote:
--
Luc Heinrich - luc@honk-honk.com
----
Bob Hutchison
Recursive Design Inc.
http://www.recursive.ca/
weblog: http://www.recursive.ca/hutch