I'm new to Ruby and can't figure out why REXML isn't returning the elements
in the order they appear in the document. Here's my code and the document.
Any help appreciated.
Thanks,
Ted
#==============================
# ruby
#==============================
require 'rexml/document'

xml = REXML::Document.new(File.read("test.html"))
xml.elements.each("//span[@class='c5']") do |element|
  puts element
end
> I'm new to Ruby and can't figure out why REXML isn't returning the elements
> in the order they appear in the document. Here's my code and the document.
I confirm the problem. Looks like a bug. If I remove some of the anchors, it works.
(Off-topic - no need to use empty named anchors in your page - just use IDs on existing elements instead.)
Thanks, Gavin. Unfortunately I can't remove the anchors. The HTML is just a
sample of the documents (not my docs) that I'm given to parse. Someone on
IRC mentioned that XPath 1.0 doesn't guarantee the order of elements.
"Gavin Kistner" <gavin@refinery.com> wrote in message
news:6A73B666-6668-430A-8C58-96DC8294A970@refinery.com...
> On Sep 3, 2005, at 3:16 PM, ted wrote:
>> I'm new to Ruby and can't figure out why REXML isn't returning the
>> elements in the order they appear in the document. Here's my code and
>> the document.
>
> I confirm the problem. Looks like a bug. If I remove some of the
> anchors, it works.
>
> (Off-topic - no need to use empty named anchors in your page - just
> use IDs on existing elements instead.)
> I'm new to Ruby and can't figure out why REXML isn't returning the
> elements
> in the order they appear in the document. Here's my code and the
> document.
to e.g. "C:\Ruby\TEMP", then change the lookup path at the top of your script:
$:.unshift('C:/Ruby/TEMP') # for rexml fixes
require 'rexml/document'

# DATA reads whatever follows __END__ in this script file
xml = REXML::Document.new(DATA)
xml.elements.each("//span[@class='c5']") do |element|
  puts element
end
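For readers unsure what the `$:.unshift` line does, here is a minimal sketch of the mechanism, assuming a current Ruby. The library name `load_path_demo_lib` and the constant it defines are hypothetical stand-ins for a patched REXML copy, and a temporary directory plays the role of "C:/Ruby/TEMP".

```ruby
require 'tmpdir'

# Hypothetical library name, used only for this demonstration.
lib_name = 'load_path_demo_lib'
loaded_from = nil

Dir.mktmpdir do |dir|
  # Pretend this directory holds the patched copy of a library.
  File.write(File.join(dir, "#{lib_name}.rb"),
             "LOAD_PATH_DEMO_SOURCE = 'patched copy'\n")

  # $: is an alias for $LOAD_PATH; entries at the front are searched first.
  $:.unshift(dir)
  begin
    require lib_name
  ensure
    $:.delete(dir)
  end

  # $LOADED_FEATURES records the file that `require` actually loaded.
  loaded_from = $LOADED_FEATURES.grep(/#{lib_name}\.rb\z/).first
end

puts "loaded from: #{loaded_from}"
puts "constant:    #{LOAD_PATH_DEMO_SOURCE}"
```

Because the directory sits at the front of the load path, a `require 'rexml/document'` issued after the `unshift` would pick up a copy found there before the one shipped with Ruby.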
> Thanks, Gavin. Unfortunately I can't remove the anchors. The HTML is just a
> sample of the documents (not my docs) that I'm given to parse. Someone on
> IRC mentioned that XPath 1.0 doesn't guarantee the order of elements.
I would be astonished if Sean Russell had combed through the 1.0 spec
to find some loophole that made it plausible to have an iteration not
follow document order. I could be wrong but I think it's more likely
a REXML bug.
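One way to check a given REXML install is to compare the XPath result against a manual depth-first walk, which visits elements in document order by construction. This is only a sketch: the markup in `SAMPLE` and the helper `walk` are made up for illustration and are not the document from this thread, so it may not trigger the bug.

```ruby
require 'rexml/document'

# Hypothetical sample markup; not the document from the thread.
SAMPLE = <<XML
<html><body>
  <a name="one"/><span class="c5">first</span>
  <div><a name="two"/><span class="c5">second</span></div>
  <span class="c1">skipped</span>
  <span class="c5">third</span>
</body></html>
XML

# Pre-order, depth-first walk over child arrays (no XPath involved),
# so elements are yielded in document order.
def walk(element, &block)
  yield element
  element.children.each do |child|
    walk(child, &block) if child.is_a?(REXML::Element)
  end
end

doc = REXML::Document.new(SAMPLE)

expected = []
walk(doc.root) do |el|
  expected << el.text if el.name == 'span' && el.attributes['class'] == 'c5'
end

actual = REXML::XPath.match(doc, "//span[@class='c5']").map(&:text)

puts "manual walk: #{expected.inspect}"
puts "XPath:       #{actual.inspect}"
puts(actual == expected ? "XPath results are in document order" : "XPath results are out of order")
```

If the two lists differ, the manual walk can also serve as a stopgap for collecting the spans in order until the library is updated.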
"daz" <dooby@d10.karoo.co.uk> wrote in message
news:YTWdnRhBUv9S0ofeSa8jmw@karoo.co.uk...
> Gavin Kistner wrote:
> On Sep 3, 2005, at 3:16 PM, ted wrote:
>> I'm new to Ruby and can't figure out why REXML isn't returning the
>> elements in the order they appear in the document. Here's my code and
>> the document.
>
> to e.g. "C:\Ruby\TEMP" then change the lookup path at the top of your
> script.
>
>   $:.unshift('C:/Ruby/TEMP') # for rexml fixes
>   require 'rexml/document'
>   xml = REXML::Document.new(DATA)
>   xml.elements.each("//span[@class='c5']") do |element|
>     puts element
>   end
I just wanted to mention that I encountered the same bug and that the
new version of the library fixed it for me. Thank you very much for
the clear instructions. If only for-pay products had support that was
this good...
Daz, there's a bug in the CVS version of REXML. The following code
produces the error below, but works perfectly with the default 1.8.2
REXML (i.e., when I comment out the first line).
ruby rexmlbug.rb
C:/Dan/dev/rexml/xpath_parser.rb:157:in `expr': undefined method `delete_if' for nil:NilClass (NoMethodError)
    from C:/Dan/dev/rexml/xpath_parser.rb:481:in `d_o_s'
    from C:/Dan/dev/rexml/xpath_parser.rb:478:in `each_index'
    from C:/Dan/dev/rexml/xpath_parser.rb:478:in `d_o_s'
    from C:/Dan/dev/rexml/xpath_parser.rb:469:in `descendant_or_self'
    from C:/Dan/dev/rexml/xpath_parser.rb:314:in `expr'
    from C:/Dan/dev/rexml/xpath_parser.rb:125:in `match'
    from C:/Dan/dev/rexml/xpath_parser.rb:56:in `parse'
    from C:/Dan/dev/rexml/xpath.rb:53:in `each'
    from rexmlbug.rb:28
Exit code: 1
$:.unshift('C:/Dan/dev') # for rexml fixes
require "rexml/document"
include REXML
string = <<EOF
<html>
<td class="t4"><a href="javascript:lu('OZ')">OZ</a>
0204 F Class
<a href="/cgi/get?apt:uMl8TIcSlHI*itn/airports/ICN,itn/air/mp">
ICN</a> to <a
href="/cgi/get?apt:uMl8TIcSlHI*itn/airports/LAX,itn/air/mp">
LAX</a></td>
<tr>
<td class="t4"><font color="white">UNITED</font></td>
<td colspan="4" align="right">
<strong>48,164</strong></td>
</tr>
<tr>
<td class="t4"><font color="white">Star
Alliance</font></td>
<td colspan="4" align="right">
<strong>49,072</strong></td>
</tr>
</html>
EOF
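When switching between the bundled REXML and a copy on a custom path, it can help to confirm which one a script actually loaded. The sketch below assumes a current Ruby, where `$LOADED_FEATURES` holds full paths, and a REXML release that defines `REXML::VERSION`; older releases such as the one shipped with 1.8.2 may differ on both points.

```ruby
require 'rexml/document'

# Files recorded by `require`; the path shows which copy was picked up.
rexml_files = $LOADED_FEATURES.grep(%r{rexml/document\.rb\z})

puts "REXML version: #{REXML::VERSION}"
puts "loaded from:   #{rexml_files.first}"
```

Running this with and without the `$:.unshift(...)` line at the top of the script shows whether the path change took effect.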