Rexml: generating tree from source

Hello out there,

I'm trying to parse some XML text with REXML:

--------------------------------------------------
#!/usr/bin/env ruby

require 'rexml/document'
include REXML

str=<<EOS
<a><b>text with <illegal characters> </b></a>
EOS

# d=Document.new(str) # barks because < and >
s=Source.new(str)
# I thought that this would give me some output
puts Element.new(s) #-> </>
--------------------------------------------------

What I'd like to have is a representation of

element: a
   element: b
   text: "text with &lt;illegal characters&gt; "

Is there any way to do this?

Thanks,

Patrick

Patrick Gundlach wrote:

[...]

This may be applicable:

<Quoting>
http://www.germane-software.com/software/rexml/docs/tutorial.html

[Creating XML documents]
[...]

"Please be aware that all text nodes in REXML are UTF-8 encoded,
and all of your code must reflect this. You may input and output
other encodings (UTF-8, UTF-16, ISO-8859-1, and UNILE are all
supported, input and output), but within your program, you must
pass REXML UTF-8 strings."

"I can't emphasize this enough, because people do have problems
with this. REXML can't possibly always guess correctly how your
text is encoded, so it always assumes the text is UTF-8."
</>

daz

[...]

What I'd like to have is a representation of

element: a
   element: b
   text: "text with &lt;illegal characters&gt; "

I withdraw my request, because handling invalid XML files is not
what one should ask for....

Sorry for the noise,

Patrick

Hello daz,

This may be applicable:

[rexml, utf-8]

I don't think so, since my input _is_ UTF-8 compliant (and ASCII,
ISO-Latin-1, etc.).

The only problematic characters in my input are '<' and '>'.

I use the code exactly as shown.

Patrick
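
For anyone who finds this thread later: a minimal sketch of how the requested outline could be produced once the input is well-formed, that is, once the literal '<' and '>' inside the text are written as &lt; and &gt;. The helper name `outline` is made up for this illustration and is not part of REXML. The text line is nested one level deeper than in the layout asked for above, since the text node is a child of <b>.

```ruby
require 'rexml/document'

# Same document as in the question, but with the '<' and '>' in the text
# written as entity references, so the input is well-formed XML.
str = <<EOS
<a><b>text with &lt;illegal characters&gt; </b></a>
EOS

# Hypothetical helper (not a REXML method): walk the tree and collect one
# indented line per element and per non-blank text node.
def outline(node, depth = 0, lines = [])
  indent = '   ' * depth
  case node
  when REXML::Element
    lines << "#{indent}element: #{node.name}"
    node.each_child { |child| outline(child, depth + 1, lines) }
  when REXML::Text
    # Text#to_s gives the escaped form; Text#value would give the
    # unescaped form with literal '<' and '>'.
    lines << "#{indent}text: #{node.to_s.inspect}" unless node.to_s.strip.empty?
  end
  lines
end

doc  = REXML::Document.new(str)
tree = outline(doc.root)
puts tree
```

Wrapping the text in a CDATA section would be another way to keep the literal characters in the source while still handing REXML well-formed input.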
