REXML escaping seems broken

Part of the point of using a library to build up a DOM and output it
is that all the messy escaping is supposed to be handled under the
covers.

But for this test case:

    require 'test/unit'
    require 'rexml/text'
    include REXML

    class TestRexmlEscapes < Test::Unit::TestCase
      def test_escape_ampersand
        assert_equal "&amp;amp;", Text.new("&amp;").to_s
      end
    end

Results:

      1) Failure:
    test_escape_ampersand(TestRexmlEscapes) [test_rexml_escapes.rb:7]:
    <"&amp;amp;"> expected but was
    <"&amp;">.

In other words, it seems to arbitrarily ignore the "&" which really
needs to be escaped. I didn't ask for :raw at all, and IMO it should
escape everything unless you tell it not to.

It seems like the only workaround for this is to manually escape all
strings correctly and then pass as raw into REXML. But if I'm going to
go to that much trouble, there's really no point in using the library
in the first place as it becomes just as easy to concatenate some
strings together.

Do people consider this to be a real bug?

TX

Perhaps the docs for REXML::Text::new will illuminate the situation:

http://ruby-doc.org/stdlib/libdoc/rexml/rdoc/classes/REXML/Text.html#M002886

You may also be interested in Sam Ruby's xchar.rb experiment:

http://www.intertwingly.net/blog/2005/09/28/XML-Cleansing

Cheers,
/Nick

On 10/20/05, Trejkaz <trejkaz@gmail.com> wrote:

> It seems like the only workaround for this is to manually escape all
> strings correctly and then pass as raw into REXML. But if I'm going to
> go to that much trouble, there's really no point in using the library
> in the first place as it becomes just as easy to concatenate some
> strings together.
>
> Do people consider this to be a real bug?

Yeah, I know that the API docs say that it doesn't escape known
entities, but I was fishing to see whether anybody recognised that this
is a real problem.

Basically, because of this, *every single time* I have to set some text,
I have to escape it myself and then pass it in as raw.
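
For anyone who wants to see what that workaround looks like, here is a minimal sketch. The names XML_ESCAPES, escape_xml and raw_text are hypothetical helpers made up for this example, not part of REXML. The only REXML call used is Text.new with its fourth argument (raw) set to true. How non-raw text is escaped may differ between REXML versions, so the sketch deliberately does not rely on it.

```ruby
require 'rexml/document'

# Hypothetical lookup table (not part of REXML): the five predefined
# XML entities and their escaped forms.
XML_ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&apos;'
}.freeze

# Hypothetical helper: escapes every special character, including an
# ampersand that already looks like the start of an entity.
def escape_xml(str)
  str.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: builds a Text node from the pre-escaped string.
# Arguments: respect_whitespace=true, parent=nil, raw=true, so REXML
# is asked to leave the string exactly as given.
def raw_text(str)
  REXML::Text.new(escape_xml(str), true, nil, true)
end

puts raw_text("&amp;").to_s      # prints &amp;amp;
puts raw_text("a < b & c").to_s  # prints a &lt; b &amp; c
```

Wrapping the two steps in one helper at least keeps the escaping in a single place, though it does not change the underlying complaint that every piece of text has to go through it.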

TX