[ANNOUNCE] ruby-libxml (2002-07-09)

http://www.rubynet.org/modules/xml/ruby-libxml/ruby-libxml-20020709.tar.gz

It’s functional, it’s young, it validates, it handles compressed
documents, it’s C, it’s quick (at least as far as XML parsers go), and
it’s feature rich (or will be: I haven’t turned on XInclude, XPointer,
Schemas, DocBook, or a handful of others that are available). The
two biggest things that libxml is lacking right now are documentation
and a more complete API: features are coming and there likely won’t be
any shortage of them. Here’s a sample script that shows some of the
API:

xd = XML::Document.file('some_xml_file.xml')
xd.xpath_find('/my/xpath/query').each do |node|
  puts "Filename: #{node.child('filename')}"
  puts "Mode: #{node.child('mode')}"
  puts "Content: #{node.child('content')}"
end

And the corresponding test doc:

uga ble sweet uga ble sweet

Simple for the most part.

If someone gets curious and runs the "document_self.rb"
script, they’ll notice a whole ton of methods. This API doesn’t hide
any of the internals of XML from the user, which is good and bad. For
example, in the xml bit “bar”, there are technically two
nodes there. The tag is one, and its contents are another. The
other interesting thing about this is that in the following XML block,
there are five nodes.

bar
asd

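  1. [the first element]
  2. bar
  3. [white space between the first element's closing tag and the
     second element's opening tag]
  4. [the second element]
  5. asd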

While this is technically the correct behavior, it’s unintuitive to
most people and isn’t what they want. As a result, node.to_s is the
exact same as node.child.content. In the API, I’m making things nice
and friendly though so no one will notice (or so the theory goes).
Anyway, point being that there’s a lot of power at the moment for
those who want to have full access to the XML document.
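
The five-node count described above can be checked with a small sketch.
It does not use ruby-libxml: it uses REXML from Ruby's standard
distribution as a stand-in, and the tag names (root, foo, baz), the
FRAGMENT constant, and the flatten_nodes helper are assumptions made up
for illustration, since the original tags did not survive in this
archive.

```ruby
require 'rexml/document'

# Hypothetical fragment: two sibling elements separated by a newline,
# wrapped in a root element so that it parses as a document.
FRAGMENT = "<root><foo>bar</foo>\n<baz>asd</baz></root>"

# Collect every node below +parent+ in document order,
# descending into elements but not into text nodes.
def flatten_nodes(parent)
  parent.children.flat_map do |node|
    if node.is_a?(REXML::Element)
      [node] + flatten_nodes(node)
    else
      [node]
    end
  end
end

doc = REXML::Document.new(FRAGMENT)
nodes = flatten_nodes(doc.root)

# Describe each node as either an element or a text node.
labels = nodes.map do |node|
  if node.is_a?(REXML::Element)
    "element <#{node.name}>"
  else
    "text #{node.value.inspect}"
  end
end

puts labels
```

With REXML's default settings this should list five entries: the two
elements, their two text children, and the whitespace-only text node
that sits between them.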

Anyway, that’s my bit for now. I’m looking for new users/suggestions:
both are very welcome. -sc

Sean Chittenden

Sean Chittenden wrote:

[libxml]

great

DocBook

what’s to be implemented for DocBook?

xd = XML::Document.file('some_xml_file.xml')
xd.xpath_find('/my/xpath/query').each do |node|
  puts "Filename: #{node.child('filename')}"
  puts "Mode: #{node.child('mode')}"
  puts "Content: #{node.child('content')}"
end

what do the puts put?

Tobi

http://www.pinkjuice.com/

Sean Chittenden wrote:

[libxml]

great

>DocBook

what's to be implemented for DocBook?

Honestly, I'm not 100% sure yet. libxml's huge and there's lots
there. I'm still uncovering features and nifty ways of doing things.
My best guess is that it's basically a Schema/DTD for the DocBook
markup (ie, won't let you write a bad/invalid docbook document).

>xd = XML::Document.file('some_xml_file.xml')
>xd.xpath_find('/my/xpath/query').each do |node|
> puts "Filename: #{node.child('filename')}"
> puts "Mode: #{node.child('mode')}"
> puts "Content: #{node.child('content')}"
>end

what do the puts put?

In this case, the contents of the node's child. With the XML below
(and the amended example: sorry, was doing a little too much
copy/pasting from the rubynet test examples), it would produce:

Filename: uga
Mode: ble
Content: sweet

# Amended example
xd = XML::Document.file('some_xml_file.xml')
xd.xpath_find('/my/xpath/query').each do |node|
  puts "Filename: #{node.child('foo')}"
  puts "Mode: #{node.child('bar')}"
  puts "Content: #{node.child('baz')}"
end

# XML snippet
<my>
  <xpath>
    <query>
      <foo>uga</foo>
      <bar>ble</bar>
      <baz>sweet</baz>
    </query>
  </xpath>
</my>

node.to_s == node.child.content
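
For readers without ruby-libxml at hand, here is a rough equivalent of
the amended example, sketched with REXML from Ruby's standard
distribution rather than the API announced in this thread. The
XML_SNIPPET constant and the describe_queries helper are hypothetical
names, and the snippet assumes the <bar> element is closed with </bar>.

```ruby
require 'rexml/document'

# The test document from the message above, with <bar> closed as </bar>.
XML_SNIPPET = <<~XML
  <my>
    <xpath>
      <query>
        <foo>uga</foo>
        <bar>ble</bar>
        <baz>sweet</baz>
      </query>
    </xpath>
  </my>
XML

# Return one hash per element matched by the XPath query, holding the
# text of its <foo>, <bar>, and <baz> children.
def describe_queries(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '/my/xpath/query').map do |node|
    {
      filename: node.elements['foo'].text,
      mode:     node.elements['bar'].text,
      content:  node.elements['baz'].text
    }
  end
end

describe_queries(XML_SNIPPET).each do |entry|
  puts "Filename: #{entry[:filename]}"
  puts "Mode: #{entry[:mode]}"
  puts "Content: #{entry[:content]}"
end
```

Here elements['foo'].text plays the role that node.child('foo') plays in
the amended example: it reads the text content of the named child, so
the three puts lines should match the output listed above.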

-sc

--
Sean Chittenden

Tobias Reif <tobiasreif@pinkjuice.com> writes:

Sean Chittenden wrote:

[libxml]

great

DocBook

what’s to be implemented for DocBook?

IIRC, libxml includes a limited “SGML” parser which is designed
specifically to parse DocBook/SGML documents, and no other kind of
SGML.

Pierre-Charles David (pcdavid tiscali fr)
Computer Science PhD Student, École des Mines de Nantes, France
Homepage: pcdavid.net

Sean Chittenden wrote:

what’s to be implemented for DocBook?

Honestly, I’m not 100% sure yet. libxml’s huge and there’s lots
there. I’m still uncovering features and nifty ways of doing things.
My best guess is that it’s basically a Schema/DTD for the DocBook
markup (ie, won’t let you write a bad/invalid docbook document).

Well, there are various DTDs for DocBook; authors want to choose one,
and switch it independently from the kit. I don’t think it should be an
integrated part of an XML parser or toolkit, if this doesn’t offer any
advantages over simply using a DTD which didn’t come with the XML kit.

Tobi

http://www.pinkjuice.com/

Pierre-Charles David wrote:

IIRC, libxml includes a limited “SGML” parser which is designed
specifically to parse DocBook/SGML documents, and no other kind of
SGML.

Ah OK. I use the XML DTDs, so I can use any XML tool; actually, any SGML
tool should be able to deal with it as well.

Tobi

http://www.pinkjuice.com/