Libhxml Node#remove! kills each loop and extension to #<<

hi,

While using #each to loop through the children of a Node, if I remove a
node the loop breaks on its own.

  <root>
    <a id="a"></a>
    <b id="b"></b>
    <c id="c"></c>
  </root>

  root.each { |node|
    if XML::Node === node
      node.content = "yep"
    node.remove! if node['id'] == "b"
    end
  }

The result is

  <root>
    <a id="a">yep</a>
    <c id="c"></c>
  </root>

Would that be a bug? Or something that simply can't be avoided?

Also, I found this extension to #<< to be useful:

  class XML::Node
    alias_method :append, :<<
    def <<( node )
      if Array === node
        node.each { |n| self.append n }
      else
        super
      end
    end
  end

Thanks,
T.

transfire@gmail.com wrote:

Also, I found this extension to #<< to be useful:

  class XML::Node
    alias_method :append, :<<
    def <<( node )
      if Array === node
        node.each { |n| self.append n }
      else
        super
      end
    end
  end

s/super/append(node)/
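
To illustrate why the substitution matters, here is a minimal sketch that uses a hypothetical stand-in class, `Box`, rather than XML::Node (so it runs without libxml). When a class is reopened and a method is redefined, `super` looks in the ancestors, not at the previous definition, so the aliased name is the only way back to the original #<<. The sketch also assumes the original #<< returns the receiver, which may not hold for the real bindings.

```ruby
# Box is a hypothetical stand-in for XML::Node; it is not part of libxml.
class Box
  attr_reader :items

  def initialize
    @items = []
  end

  # The "original" #<<: append one item and return self so calls chain.
  def <<(item)
    @items << item
    self
  end
end

# Reopen the class, as the extension in this thread does.
class Box
  alias_method :append, :<<   # keep a handle on the original #<<

  def <<(item)
    if Array === item
      item.each { |i| append(i) }
      self                    # return the receiver, not the array
    else
      append(item)            # `super` here would raise NoMethodError,
                              # because Object does not define #<<
    end
  end
end

box = Box.new
box << [1, 2] << 3
p box.items
```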

T.

I'm not aware of (and couldn't find) any libhxml - if you meant ruby-libxml (which seems likely given the problem), here's what I figured out.

At first, I thought it could be a bug caused by modification to the structure you're iterating over, similar to this:

root = ['a','b','c']
root.each { |node| root.delete(node) if node == "b" }

which will skip over 'c' due to the deletion.
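
A small sketch of that Array behaviour, using plain Ruby only (the method names are made up for illustration). It records which elements the block actually sees, first when deleting from the array being walked and then when walking a copy.

```ruby
# Deleting from an Array while #each walks it shifts later elements left,
# so the element right after the deleted one is never yielded.
def visit_while_deleting(items)
  visited = []
  items.each do |item|
    visited << item
    items.delete(item) if item == "b"
  end
  visited
end

# Iterating over a copy leaves the traversal unaffected by the deletion.
def visit_copy_while_deleting(items)
  visited = []
  items.dup.each do |item|
    visited << item
    items.delete(item) if item == "b"
  end
  visited
end

p visit_while_deleting(%w[a b c])       # "c" is missing from the visited list
p visit_copy_while_deleting(%w[a b c])  # all three elements are visited
```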

But while I was trying to confirm this in libxml, I found behaviour that makes me think there's some more fundamental bug. Redefining a variable seemed to have some very odd effects, which I managed to reduce to this case:

irb(main):001:0> require 'rubygems' # => true
irb(main):002:0> require 'xml/libxml' # => true
irb(main):003:0> root = XML::Node.new("root") # => <root/>
irb(main):004:0> a = XML::Node.new("a") # => <a/>
irb(main):005:0> b = XML::Node.new("b") # => <b/>
irb(main):006:0> root # => <root/>
irb(main):007:0> root << a # => <a/>
irb(main):008:0> root
# everything
=> <root>
   <a/>
</root>
irb(main):009:0> root << b # => <b/>
irb(main):010:0> root
=> <root>
   <a/>
   <b/>
</root>
irb(main):011:0> root = XML::Node.new("root") # => <root/>
irb(main):012:0> root # => <root/>
irb(main):013:0> root << a # => <a/>
irb(main):014:0> root
=> <root>
   <a/>
   <b/> # where did *this* come from?
</root>

(That's the existing definition of #<<, not your extension)

Exiting from the irb session results in a segmentation fault, and running the same code outside of irb yields the same apparent results (inclusion of 'b' where it shouldn't be) and ends in a bus error. I have a hunch that the C extension isn't managing memory properly, which is confirmed by one of the errors submitted on the project page. Maybe this is just my setup (1.8.4 on OS X), but it seems to me that the library has enough problems that it's not quite ready for use.

matthew smillie.

···

On Jul 8, 2006, at 2:37, transfire@gmail.com wrote:

hi,

While using #each to loop through the children of a Node, if I remove a
node the loop breaks on its own.

  <root>
    <a id="a"></a>
    <b id="b"></b>
    <c id="c"></c>
  </root>

  root.each { |node|
    if XML::Node === node
      node.content = "yep"
    node.remove! if node['id'] == "b"
    end
  }

The result is

  <root>
    <a id="a">yep</a>
    <c id="c"></c>
  </root>

Would that be a bug? Or something that simply can't be avoided?

It's usually a problem to change a container while iterating through
it. This can generate all sorts of weird effects. It's generally
better to assume this is *not* possible unless explicitly stated
(e.g. most of Java's iterators implement remove(), which safely removes
an element while iterating).

In your case I'd do one of three things: first remove the one you want
to get rid of, iterate using an index (if that's possible), or remember
the objects to remove in some kind of container and do the removal after
the iteration (probably the most efficient solution).
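
The last option, remembering the objects and removing them afterwards, could look roughly like the sketch below. `Item` and `Parent` are hypothetical stand-ins for XML nodes so the example runs without libxml; the real bindings may expose different method names.

```ruby
# Item and Parent are hypothetical stand-ins for XML nodes, not libxml classes.
Item = Struct.new(:id, :content)

class Parent
  include Enumerable

  def initialize(children)
    @children = children
  end

  def each(&block)
    @children.each(&block)
  end

  def remove(child)
    @children.delete(child)
  end
end

def mark_and_prune(parent)
  doomed = []
  parent.each do |node|
    node.content = "yep"
    doomed << node if node.id == "b"          # remember, do not remove yet
  end
  doomed.each { |node| parent.remove(node) }  # safe: the iteration is over
  parent
end

root = Parent.new([Item.new("a"), Item.new("b"), Item.new("c")])
mark_and_prune(root)
p root.map { |n| [n.id, n.content] }
```

Because nothing is removed until the loop has finished, every child gets its content set, including the one after the removed node.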

Kind regards

robert


PS: Here's another alternative that might work: use delete_if to
iterate and delete those elements you want to get rid of.

root.delete_if do |node|
   if XML::Node === node
     node.content = "yep"
     node['id'] == "b"
   else
     false
   end
end
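
Whether XML::Node responds to delete_if is not confirmed anywhere in this thread. On a plain Array the method does exist and behaves as sketched below; `Item` is a hypothetical stand-in for an XML node.

```ruby
# Item is a hypothetical stand-in for an XML node.
Item = Struct.new(:id, :content)

def mark_and_delete(nodes)
  nodes.delete_if do |node|
    node.content = "yep"   # runs for every element, including the deleted one
    node.id == "b"         # a truthy block result deletes the element
  end
end

nodes = [Item.new("a"), Item.new("b"), Item.new("c")]
mark_and_delete(nodes)
p nodes.map { |n| [n.id, n.content] }
```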

Cheers

robert

Matthew Smillie wrote:

I'm not aware of (and couldn't find) any libhxml - if you meant
ruby-libxml (which seems likely given the problem), here's what I
figured out.

:) Yes, the libxml bindings are indeed what I was referring to (the h was a typo).

At first, I thought it could be a bug caused by modification to the
structure you're iterating over, similar to this:

root = ['a','b','c']
root.each { |node| root.delete(node) if node == "b" }

which will skip over 'c' due to the deletion.

But while I was trying to confirm this in libxml, I found behaviour
that makes me think there's some more fundamental bug. Redefining a
variable seemed to have some very odd effects, which I managed to
reduce to this case:

[snip]

=> <root>
   <a/>
   <b/> # where did *this* come from?
</root>

(That's the existing definition of #<<, not your extension)

Exiting from the irb session results in a segmentation fault, and
running the same code outside of irb yields the same apparent results
(inclusion of 'b' where it shouldn't be) and ends in a bus
error. I have a hunch that the C extension isn't managing memory
properly, which is confirmed by one of the errors submitted on the
project page. Maybe this is just my setup (1.8.4 on OS X), but it
seems to me that the library has enough problems that it's not quite
ready for use.

Thanks matthew. Very enlightening. I decided to write an XML wrapper
and create a common interface for both REXML and libxml. That way I
can use REXML for now and easily switch over when the libxml bindings are
fully operational.

T.