REXML exceptions

Is there any way to get any useful data out of REXML::ParseException when you're working on a String? It never sets @position or @line or anything. I need to figure out exactly where the error-causing tag starts, and save it.

Any ideas?

-- rakaur

Out of curiosity, I tried libxml. It has nice error messages:

foo.xml:3:
parser error :
Opening and ending tag mismatch: title line 3 and txitle
        <title>Foo</txitle>
                           ^

but they go to stdout. You can capture them by registering an error
handler. Sample code:

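# Assumes libxml-ruby is loaded and XML is its top-level namespace.
parser = XML::Parser.new
parser.filename = "foo.xml"

# Collect the parser's messages instead of letting libxml print them.
msgs = []
XML::Parser.register_error_handler lambda { |msg| msgs << msg }

begin
  parser.parse
rescue Exception => e
  puts "Error: #{msgs.join}"
end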

-- Mark.

···

On Sep 29, 10:56 pm, Eric Will <rak...@malkier.net> wrote:

Is there any way to get any useful data out of REXML::ParseException
when you're working on a String? It never sets @position or @line or
anything. I need to figure out exactly where the error-causing tag
starts, and save it.

Any ideas?

Interesting. I was thinking about doing libxml anyway. I do not like REXML.

Thanks.

-- rakaur

···

On Tue, Sep 30, 2008 at 9:59 AM, Mark Thomas <mark@thomaszone.com> wrote:

Actually, this isn't working for me. I'm using the SAX parser, and it
just calls Listener#on_parser_error with a string. Not helping me.

Why would you want to do that? You already have the XML as a string.
The only reason to put up with the awful interface and extra
complexity of SAX would be if your file doesn't fit into memory. And I
don't think the SAX interface to libxml is as complete/robust yet as
the DOM interface.

Go with the DOM interface. With libxml it's plenty fast.

-- Mark.

···

On Sep 30, 12:46 pm, Eric Will <rak...@malkier.net> wrote:

Actually, this isn't working for me. I'm using the SAX parser, and it
just calls Listener#on_parser_error with a string. Not helping me.

My situation requires SAX, unfortunately.

I need to parse and react to each tag as it comes in. If there's a
broken one, all tags up to the broken one must be processed, and the
broken one must be stored. I cannot do this in DOM, because if there's
an error, DOM will not process anything.

Also, I don't think those error messages can help me locate the
position in the string of the bad XML. They're pretty, for sure, but
not very useful to anyone but a human.
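
For reference, here is a minimal sketch of the "process everything up to the broken tag" requirement using REXML's stream parser, since that is the library the thread started with. The TagRecorder class and the stream_until_error helper are hypothetical names made up for this example. It only shows that the callbacks for earlier tags fire before the exception is raised; it does not solve the position problem, because the line and position reported by REXML::ParseException may still be missing or unreliable for String input.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: records each start tag as the parser reports it.
class TagRecorder
  include REXML::StreamListener

  attr_reader :seen

  def initialize
    @seen = []
  end

  # Called by REXML for every opening (or self-closing) tag.
  def tag_start(name, _attrs)
    @seen << name
  end
end

# Streams the document and returns [tags seen before any failure, exception or nil].
def stream_until_error(xml)
  listener = TagRecorder.new
  error = nil
  begin
    REXML::Document.parse_stream(xml, listener)
  rescue REXML::ParseException => e
    error = e
  end
  [listener.seen, error]
end

seen, error = stream_until_error("<doc><a/><title>Foo</txitle></doc>")
puts "processed: #{seen.inspect}"
puts "failed: #{error.message.lines.first}" if error
```

The tags before the mismatch have already been handed to the listener by the time the exception arrives, so they can be acted on as they come in.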

If you can receive an entire document at a time, libxml has a
'recover' mode that will correct what it can and process the entire
document -- even if it is not well-formed. It works surprisingly well.

Another option is writing your own recursive descent parser. See
http://snippets.dzone.com/posts/show/2190 for a starting point.
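
To give a rough idea of the hand-rolled route, here is a sketch using StringScanner from Ruby's standard library. It is not a full recursive descent parser, just a flat, stack-based tag scanner, and it is not taken from the snippet linked above. The names TAG_PATTERN and scan_tags are made up for this example. It assumes plain elements only: no comments, CDATA, processing instructions, or ">" inside attribute values, and it does not report tags left open at the end of the input.

```ruby
require 'strscan'

# Matches an opening, closing, or self-closing tag. Attributes are skipped, not parsed.
TAG_PATTERN = /<(\/?)([A-Za-z_][\w.-]*)([^<>]*?)(\/?)>/

# Walks the string tag by tag and yields each start tag name as it is seen.
# Returns nil if no problem was found, or { offset:, tag: } for the first bad tag.
def scan_tags(xml)
  scanner = StringScanner.new(xml)
  open_tags = []
  until scanner.eos?
    scanner.skip(/[^<]+/)   # skip character data between tags
    break if scanner.eos?
    offset = scanner.pos    # byte offset where this tag starts
    unless scanner.scan(TAG_PATTERN)
      # Malformed tag: report whatever tag-like text is there.
      return { offset: offset, tag: scanner.rest[/\A<[^<>]*>?/] }
    end
    closing, name, self_closing = scanner[1], scanner[2], scanner[4]
    if closing == "/"
      # A closing tag must match the most recently opened element.
      return { offset: offset, tag: scanner.matched } unless open_tags.pop == name
    else
      yield name
      open_tags << name unless self_closing == "/"
    end
  end
  nil
end

xml = "<doc>\n<a/>\n<title>Foo</txitle>\n</doc>"
bad = scan_tags(xml) { |name| puts "processing <#{name}>" }
puts "bad tag #{bad[:tag]} at byte offset #{bad[:offset]}" if bad
```

Because the scanner tracks its own position, the offset of the offending tag can be stored directly, which is the piece the parser error messages do not provide.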

-- Mark.

···

On Sep 30, 2:09 pm, Eric Will <rak...@malkier.net> wrote:

My situation requires SAX, unfortunately.

I need to parse and react to each tag as it comes in. If there's a
broken one, all tags up to the broken one must be processed, and the
broken one must be stored. I cannot do this in DOM, because if there's
an error, DOM will not process anything.