Parse XML that isn't well formed

Cliveharber · 19 September 2007 10:30

Hi

Looking at the XML you posted, the document has no root element, which is why the parsers reject it: a well-formed XML document must have exactly one root element. The closing </xml> tag at the end also has no matching opening tag. You could try something like this:

<?xml version="1.0" encoding="UTF-8"?>
<server>
         <server_url>http://myserver.edu/data/</server_url>
         <server_name>myserver.edu</server_name>
         <uploads>
                  <result>
                         <dir>/storage/data/results/</dir>
                         <result_name>hadcm3l_00012_00000118_0</result_name>
                         <file_info>
                                <name>hadcm3l_00012_00000118_0_6.zip</name>
                                <nbytes>5154055</nbytes>
                                <md5_checksum>485600296bb601ab4a3d1d49a9fb1c86</md5_checksum>
                         </file_info>
                         <file_info>
                                <name>hadcm3l_00012_00000118_0_7.zip</name>
                                <nbytes>5153055</nbytes>
                                <md5_checksum>36a600296cb60229a3d1d49a9fb1a10</md5_checksum>
                         </file_info>
                  </result>
         </uploads>
</server>

If all the files you are trying to parse are structured like the example you posted, the missing root element is the likely cause. Once that is fixed, you should be able to use whichever XML parser you want.
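
Since some of the files are up to 2GB, you probably don't want to load them into memory just to repair them. Here is a rough sketch of one way to add the root element while streaming line by line. The method name wrap_in_root, the root name "server" and the file names in the usage comment are my own choices for illustration, and it assumes the stray </xml> tag sits on a line of its own, as it does in your sample:

```ruby
# Hypothetical helper: copies +input+ to +output+ line by line, opening a
# root element before the first element and dropping the unmatched </xml>.
# Assumes the stray </xml> tag is alone on its line (as in the sample data).
def wrap_in_root(input, output, root = 'server')
  root_opened = false
  input.each_line do |line|
    stripped = line.strip
    next if stripped == '</xml>' # drop the closing tag that has no opener
    if !root_opened && !stripped.empty? && !stripped.start_with?('<?xml')
      output.puts("<#{root}>")   # open the root before the first real element
      root_opened = true
    end
    output.write(line)
  end
  output.puts("<#{root}>") unless root_opened # input had no elements at all
  output.puts("</#{root}>")
end

# Example usage (file names are placeholders):
#   File.open('uploads.xml') do |src|
#     File.open('uploads_fixed.xml', 'w') { |dst| wrap_in_root(src, dst) }
#   end
```

Only one line is held in memory at a time, so the size of the file should not matter much. The repaired copy can then be given to any of the parsers you mentioned; for files this large a stream or SAX-style interface is probably a better fit than building a full tree.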

HTH

Clive
---- Milo Thurston <knirirr@gmail.com> wrote:

I have some XML looking like the following, other than being very much
larger (some files are up to 2GB):

<?xml version="1.0" encoding="UTF-8"?>
        <server_url>http://myserver.edu/data/</server_url>
        <server_name>myserver.edu</server_name>
        <uploads>
                <result>
                        <dir>/storage/data/results/</dir>
                        <result_name>hadcm3l_00012_00000118_0</result_name>
                        <file_info>
                        <name>hadcm3l_00012_00000118_0_6.zip</name>
                        <nbytes>5154055</nbytes>
                        <md5_checksum>485600296bb601ab4a3d1d49a9fb1c86</md5_checksum>
                        </file_info>
                        <file_info>
                        <name>hadcm3l_00012_00000118_0_7.zip</name>
                        <nbytes>5153055</nbytes>
                        <md5_checksum>36a600296cb60229a3d1d49a9fb1a10</md5_checksum>
                        </file_info>
                </result>
        </uploads>
</xml>

I've tried a few xml parsers such as xml-simple, libxml and quixml, but
all reject this data as badly formed. One answer would, of course, be
for the data to be re-generated using properly formed xml. Meanwhile, is
there anything that could be done with the existing files? Is it a case
of having to write regexps to parse this sort of thing?
--
Posted via http://www.ruby-forum.com/.
