[BUG] REXML 2.7.1 External Entity Parsing

Hi Everyone,

There appears to be a bug in REXML 2.7.1 external entity parsing. The
following code throws an error in Ruby 1.8.0/REXML 2.7.1, but not in
Ruby 1.6.8/REXML 2.3.5:

···

#!/usr/bin/env ruby

require 'rexml/document'

XP = '//channel/title'

# dump versions
puts 'Ruby %s, REXML %s' % [RUBY_VERSION, REXML::Version]

# check both examples
%w{working.rss broken.rss}.each do |path|
  File.open(path) do |file|
    doc = REXML::Document.new file.readlines.join('')

    puts 'File: ' << path

    # check to make sure everything is kosher
    puts 'doc.root.class = ' << doc.root.class.to_s
    puts 'doc.root.elements.class = ' << doc.root.elements.class.to_s

    # get the title of the feed
    puts (e = doc.root.elements[XP]) ? e.class.to_s : "Couldn't find #{XP}."

  end
end

2.3.5 Output

Ruby 1.6.8, REXML 2.3.5
File: working.rss
doc.root.class = REXML::Element
doc.root.elements.class = REXML::Elements
Paul Duncan
File: broken.rss
doc.root.class = REXML::Element
doc.root.elements.class = REXML::Elements
O'Reilly Network Articles

2.7.1 Output

Ruby 1.8.0, REXML 2.7.1
File: working.rss
doc.root.class = REXML::Element
doc.root.elements.class = REXML::Elements
REXML::Element
File: broken.rss
doc.root.class = REXML::Element
doc.root.elements.class = REXML::Elements
/usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:83:in `internal_parse': undefined method `node_type' for #<REXML::Entity:0x4027d9d0> (NoMethodError)
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:81:in `delete_if'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:81:in `internal_parse'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:60:in `match'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:315:in `d_o_s'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:313:in `each_index'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:313:in `d_o_s'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:317:in `d_o_s'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:313:in `each_index'
         ... 8 levels...
        from ./rexml_test.rb:12:in `open'
        from ./rexml_test.rb:12
        from ./rexml_test.rb:11:in `each'
        from ./rexml_test.rb:11

The files in question and additional information are available at
http://www.raggle.org/files/rexml-external_entity_bug/ . We’re
stripping external entity declarations before parsing feeds in Raggle as
an interim solution.
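
Another stopgap for anyone hitting the same error is to rescue the exception per feed, so that one bad feed does not abort the whole run. The following is only a rough sketch: the helper name safe_feed_title and the constant TITLE_XPATH are made up for illustration and are not part of REXML or Raggle, and rescuing NoMethodError this broadly is deliberately blunt.

```ruby
require 'rexml/document'

TITLE_XPATH = '//channel/title'

# Hypothetical helper (not part of REXML or Raggle): returns the text of the
# feed title, or nil if there is no title or the lookup fails inside REXML.
def safe_feed_title(xml)
  doc = REXML::Document.new(xml)
  element = doc.root && doc.root.elements[TITLE_XPATH]
  element && element.text
rescue NoMethodError, REXML::ParseException => err
  # Report the failure on stderr and carry on with the next feed.
  warn "skipping feed: #{err.class}: #{err.message.lines.first.to_s.strip}"
  nil
end

puts safe_feed_title('<rss><channel><title>Example</title></channel></rss>').inspect
puts safe_feed_title('<rss><channel/></rss>').inspect
```

This only hides the failure; a feed that trips the bug still yields no title, which is why stripping the declarations is the more useful workaround.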

PS. I attempted to use the REXML bug report page on the Germane
Software site, but it gave me the following error:

The system encountered a fatal error
failed to chroot(/home/jitterbug/rexml)
The last error code was: Operation not permitted
uid/gid=81/81 


Paul Duncan pabs@pablotron.org OpenPGP Key ID: 0x82C29562
http://www.pablotron.org/ http://www.paulduncan.org/

Thanks. I’m on it.

BTW, I’ll be in the UK from the 12th-27th, and won’t have internet access.
Any bugs reported during that time will be dealt with upon my return.

— SER

···

On Friday 05 September 2003 07:40, Paul Duncan wrote:

> There appears to be a bug in REXML 2.7.1 external entity parsing. The
> following code throws an error in Ruby 1.8.0/REXML 2.7.1, but not in
> Ruby 1.6.8/REXML 2.3.5:

Thanks. I'm on it.

Cool. I appreciate the quick response :).

Like I said, I'm just stripping external entities in Raggle as an
interim solution.

If anyone else is parsing RSS in REXML 2.7.1, they should consider
doing the same. Here's the code I'm using:

  if $config['strip_external_entities'] && content =~ /<!ENTITY %.*?>/m
    content.gsub!(/<!ENTITY %.*?>/m, '')
  end

(it's not perfect, but it's good enough for now).
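
For anyone who wants to try the workaround outside of Raggle, here is a rough, self-contained sketch of the same idea. The helper name strip_parameter_entities, the constant name, and the sample feed are made up for illustration and are not part of Raggle or REXML. The pattern is the one above, loosened to accept any whitespace after <!ENTITY (also an assumption on my part). Like the original it is naive: it would cut a declaration short at a > inside a quoted literal, and it leaves any %name; references in the DTD untouched.

```ruby
# Hypothetical constant and helper, for illustration only.
# Matches parameter-entity declarations such as <!ENTITY % name SYSTEM "...">.
PARAMETER_ENTITY_DECL = /<!ENTITY\s+%.*?>/m

# Returns a copy of content with parameter-entity declarations removed.
def strip_parameter_entities(content)
  content.gsub(PARAMETER_ENTITY_DECL, '')
end

# Made-up sample feed with one external parameter-entity declaration.
feed = <<~XML
  <?xml version="1.0"?>
  <!DOCTYPE rss [
    <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
  ]>
  <rss><channel><title>Example</title></channel></rss>
XML

cleaned = strip_parameter_entities(feed)
puts cleaned
```

The sketch uses gsub rather than gsub! so the caller's string is left alone; the cleaned copy is what would be handed to REXML::Document.new.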

···

* Sean Russell (ser@germane-software.com) wrote:

> BTW, I'll be in the UK from the 12th-27th, and won't have internet
> access. Any bugs reported during that time will be dealt with upon my
> return.
>
> --- SER

--
Paul Duncan <pabs@pablotron.org> pabs in #gah (OPN IRC)
http://www.pablotron.org/ OpenPGP Key ID: 0x82C29562