Hi Everyone,
There appears to be a bug in REXML 2.7.1 external entity parsing. The
following code throws an error in Ruby 1.8.0/REXML 2.7.1, but not in
Ruby 1.6.8/REXML 2.3.5:
#!/usr/bin/env ruby

require 'rexml/document'

XP = '//channel/title'

# dump versions
puts 'Ruby %s, REXML %s' % [RUBY_VERSION, REXML::Version]

# check both examples
%w{working.rss broken.rss}.each do |path|
  File.open(path) do |file|
    doc = REXML::Document.new file.readlines.join('')
    puts 'File: ' << path

    # check to make sure everything is kosher
    puts 'doc.root.class = ' << doc.root.class.to_s
    puts 'doc.root.elements.class = ' << doc.root.elements.class.to_s

    # get the title of the feed
    puts((e = doc.root.elements[XP]) ? e.class.to_s : "Couldn't find #{XP}.")
  end
end
2.3.5 Output
Ruby 1.6.8, REXML 2.3.5
File: working.rss
doc.root.class = REXML::Element
doc.root.elements.class = REXML::Elements
2.7.1 Output
Ruby 1.8.0, REXML 2.7.1
File: working.rss
doc.root.class = REXML::Element
doc.root.elements.class = REXML::Elements
REXML::Element
File: broken.rss
doc.root.class = REXML::Element
doc.root.elements.class = REXML::Elements
/usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:83:in `internal_parse': undefined method `node_type' for #<REXML::Entity:0x4027d9d0> (NoMethodError)
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:81:in `delete_if'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:81:in `internal_parse'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:60:in `match'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:315:in `d_o_s'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:313:in `each_index'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:313:in `d_o_s'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:317:in `d_o_s'
        from /usr/local/lib/site_ruby/1.8/rexml/xpath_parser.rb:313:in `each_index'
         ... 8 levels...
        from ./rexml_test.rb:12:in `open'
        from ./rexml_test.rb:12
        from ./rexml_test.rb:11:in `each'
        from ./rexml_test.rb:11
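Judging from the trace, the exception is raised inside `d_o_s`, presumably the descendant-or-self step triggered by the leading `//` in `XP`, when it reaches a `REXML::Entity` that has no `node_type` method. Assuming that reading is right, and assuming an RSS 0.9x/2.0 layout where `<channel>` is a direct child of the root element, a path relative to `doc.root` might sidestep the problem without touching the feed. This is an untested sketch; `feed_title` is a hypothetical helper, not part of REXML.

```ruby
require 'rexml/document'

# Hypothetical workaround, untested against REXML 2.7.1: look the title up
# with a path relative to the root element instead of '//channel/title'.
# A relative path only walks the root's child elements, so the search should
# never descend into the DOCTYPE and its REXML::Entity children.
# Assumes <channel> is a direct child of the root (RSS 0.9x/2.0 layout).
def feed_title(xml)
  doc = REXML::Document.new(xml)
  title = doc.root.elements['channel/title']
  title ? title.text : nil
end

puts feed_title('<rss version="0.91"><channel><title>Example</title></channel></rss>')
# prints: Example
```

This trades the generality of `//` for a fixed path, so feeds with a different structure would need their own path.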
The files in question and additional information are available at
http://www.raggle.org/files/rexml-external_entity_bug/ . As an interim
workaround, we're stripping external entity declarations from feeds in
Raggle before parsing them.
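The stripping workaround could look roughly like the following minimal Ruby sketch. It assumes that an "external entity declaration" is an `<!ENTITY ...>` whose value is given by a `SYSTEM` or `PUBLIC` identifier, and that those quoted identifiers never contain a `>` character. `EXTERNAL_ENTITY_DECL` and `strip_external_entities` are hypothetical names, not taken from Raggle's source.

```ruby
# Hypothetical helper, not Raggle's actual code: drop external entity
# declarations (<!ENTITY name SYSTEM ...> and <!ENTITY % name PUBLIC ...>)
# from raw feed XML before handing it to REXML. Internal entities such as
# <!ENTITY copy "&#169;"> are left untouched.
#
# Assumption: quoted system/public identifiers never contain '>'.
# [^>]* also matches newlines, so multi-line declarations are handled.
EXTERNAL_ENTITY_DECL = /<!ENTITY\s+(?:%\s+)?[^\s>]+\s+(?:SYSTEM|PUBLIC)\b[^>]*>/

def strip_external_entities(xml)
  xml.gsub(EXTERNAL_ENTITY_DECL, '')
end

sample = '<!DOCTYPE rss [<!ENTITY foo SYSTEM "foo.ent"><!ENTITY copy "&#169;">]><rss/>'
puts strip_external_entities(sample)
# prints: <!DOCTYPE rss [<!ENTITY copy "&#169;">]><rss/>
```

The regular expression is only a pragmatic filter, not a DTD parser. If the internal subset also references a stripped parameter entity (for example a hypothetical `%lat1;`), that reference is left in place and may still need handling.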
P.S. I tried to report this through the REXML bug report page on the
Germane Software site, but it returned the following error:
The system encountered a fatal error
failed to chroot(/home/jitterbug/rexml)
The last error code was: Operation not permitted
uid/gid=81/81
--
Paul Duncan pabs@pablotron.org OpenPGP Key ID: 0x82C29562
http://www.pablotron.org/ http://www.paulduncan.org/