REXML help - Insert newlines into large XML file

Hello, I have a large xml file that does not have any newlines in it. Can someone please provide some code to use REXML to simply read in an xml, insert newlines after the xml sections or elements, then spit it out to stdout. This way I'd at least be able to open the xml file in an editor so I can read what kind of format it has. I don't know anything about REXML so it needs to be a somewhat complete script. thank you.

You could use HTML tidy for this I think.
Mikel

···

On Dec 11, 2007 12:54 PM, Sean Nakasone <seannakasone@yahoo.com> wrote:

Hello, I have a large xml file that does not have any newlines in it. Can
someone please provide some code to use REXML to simply read in an xml,
insert newlines after the xml sections or elements, then spit it out to
stdout. This way I'd at least be able to open the xml file in an editor
so I can read what kind of format it has. I don't know anything about
REXML so it needs to be a somewhat complete script. thank you.

irb(main):001:0> require 'rexml/document'
irb(main):002:0> doc = REXML::Document.new( "<root><child><grandkid/></child></root>" )
irb(main):003:0> doc.write $stdout
<root><child><grandkid/></child></root>=> [<?xml ... ?>, <root> ... </>]

irb(main):004:0> doc.write $stdout, 0
<root>
  <child>
    <grandkid/>
  </child>
</root>

You can use IO.read("somefile.xml") to read the contents into a single
string.
You can pass a file to the REXML::Document#write method instead of
$stdout, e.g.

File.open( "with_newlines.xml", "w" ){ |file|
  doc.write( file, 0 )
}
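
Putting those pieces together, a minimal sketch of a complete script might look like the following. The method name pretty_xml, the script name pretty.rb, and the indent of 2 are assumptions made for illustration, not something from the posts above; how a given indent value is rendered has differed between REXML versions, so the exact spacing may vary.

```ruby
require 'rexml/document'

# Hypothetical helper: parse an XML string and return it re-serialized
# with elements on separate lines. The default indent of 2 is an
# arbitrary choice for this sketch.
def pretty_xml(source, indent = 2)
  doc = REXML::Document.new(source)
  out = +''               # unfrozen String; REXML appends to it with <<
  doc.write(out, indent)  # an indent of 0 or more asks for pretty-printing
  out
end

# Usage (pretty.rb is a hypothetical file name):
#   ruby pretty.rb somefile.xml > with_newlines.xml
puts pretty_xml(File.read(ARGV[0])) if __FILE__ == $PROGRAM_NAME && ARGV[0]
```

Because REXML builds the whole tree in memory, this is only practical while the file fits comfortably in RAM.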

···

On Dec 10, 6:49 pm, Sean Nakasone <seannakas...@yahoo.com> wrote:

For more information on REXML, see the official tutorial. It covers
this question directly and plainly, as well as a whole host of others.

http://www.germane-software.com/software/rexml/docs/tutorial.html

···

Step 1) Install tidy
Step 2) tidy -xml -i yourfile.xml
Step 3) tidy --help

REXML is very bad for handling such things.

^ manveru

···

For the record, would you care to clarify and justify that statement?

···

On Dec 10, 8:35 pm, Michael Fellinger <m.fellin...@gmail.com> wrote:

REXML is very bad for handling such things.

Not at all. If you're using an older version of the standard library, you prettify the XML using doc.write(output, 0), as in Phrogz' example.

For newer versions of REXML, use the classes under REXML::Formatters (for example REXML::Formatters::Pretty) instead, which give you much more control over the prettifier.

Best regards,

Jari Williamsson

Sure. I've tried for quite some time to get REXML to a point where it
really pretty-prints any document, but apart from implementing a whole
stream listener that keeps track of indentation and width, there doesn't
seem to be any way to do it. The new REXML works a bit better but inserts
lots of whitespace in the wrong places.
Unfortunately tidy has a memory leak, so I cannot recommend the
bindings if your process is running over a longer period. Of course
you could start it in another process, but then the CLI tool is good
enough already.

REXML::VERSION
# "3.1.6"

http://pastie.caboo.se/126905

^ manveru
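
For reference, the stream-listener approach mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the names IndentingListener and indent_stream are hypothetical, it tracks indentation but not line width, it ignores comments, processing instructions and the XML declaration, and it strips whitespace around text, which changes mixed content.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: writes one tag or text run per line,
# indented according to the current nesting depth.
class IndentingListener
  include REXML::StreamListener

  def initialize(out, step = 2)
    @out = out
    @step = step
    @depth = 0
  end

  def tag_start(name, attrs)
    # Attribute values arrive unescaped, so escape them again on output.
    attr_text = attrs.map { |key, value| " #{key}=\"#{REXML::Text.normalize(value)}\"" }.join
    emit("<#{name}#{attr_text}>")
    @depth += 1
  end

  def tag_end(name)
    @depth -= 1
    emit("</#{name}>")
  end

  def text(text)
    stripped = text.strip
    emit(REXML::Text.normalize(stripped)) unless stripped.empty?
  end

  private

  def emit(line)
    @out << (' ' * (@depth * @step)) << line << "\n"
  end
end

# Hypothetical wrapper: stream-parse source and return the indented text.
def indent_stream(source, out = +'')
  REXML::Document.parse_stream(source, IndentingListener.new(out))
  out
end
```

Since nothing is kept in memory beyond the current depth, passing an open File as the source and $stdout as the output should suit large inputs better than building a full document tree. Empty elements such as <grandkid/> come out as a separate start and end tag in this sketch.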

···

I don't understand what you mean by "new REXML", since you seem to be using an old one. I'm on 3.1.7.1, and there you can do, for example:

formatter = REXML::Formatters::Pretty.new( 3 )
formatter.compact = true
formatter.write( doc, $stdout)

Best regards,

Jari Williamsson
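
A self-contained version of the formatter snippet above might look like this. It is a sketch that assumes a REXML version which ships REXML::Formatters::Pretty (3.1.7 or later, going by this thread); the method name format_with_pretty is hypothetical, and the output is collected in a String rather than written to $stdout so it can be inspected.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Hypothetical helper wrapping the formatter calls shown above.
def format_with_pretty(source, indent = 3)
  doc = REXML::Document.new(source)
  formatter = REXML::Formatters::Pretty.new(indent)
  formatter.compact = true   # try to keep short text on one line with its tags
  out = +''
  formatter.write(doc, out)  # any object that responds to << works as output
  out
end

puts format_with_pretty("<root><child><grandkid/></child></root>")
```

Passing $stdout or an open File instead of the String writes the result directly, as in the original three lines.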

Michael Fellinger wrote:

By "new" I mean 3.1.7, which has formatters. But the one that ships
with Ruby is still 3.1.6. If I have to require a dependency anyway,
then I can just use tidy instead, no?
