Parsing and manipulating XML is such a wide subject. There is more than
one bookshelf full of books about it. Doing it with Ruby is no
exception.
Besides the two libraries mentioned, there is also Hpricot
(http://code.whytheluckystiff.net/hpricot/), and you should try it too.
When dealing with XML you should consider the following questions:
Who will run the code, and on what OS?
How big is the XML document?
Is speed a decisive factor?
What’s the magnitude of manipulation required?
Answers to these questions could help you pick the optimum library but
you should be familiar with all of them.
Do some research, play a little, and pick the one that appeals to you most.
···
On Jul 19, 10:42 pm, Cédric H. <cedric.hernalste...@gmail.com> wrote:
Hi guys,
I'm looking for some information about the xml libraries available in
Ruby.
I've read a few blog posts about the pros and cons of REXML and
libxml, but I still have some questions:
- as I understand it REXML is part of ruby standard library and so is
included in ruby distribution ?
- libxml is a wrapper for gnome libxml and must be installed and
compiled with gem ?
- is libxml really a full validating and compliant parser ?
- how do you use xslt in Ruby ? do you use http://raa.ruby-lang.org/project/ruby-xslt/
or http://rubyforge.org/projects/libxsl/ (if I'm right the second one
is part of libxml ? )
As you see I'm lost and I would really appreciate your help or some
comprehensive post about xml processing in ruby .
I'm looking for some information about the xml libraries available in
Ruby.
I've read a few blog posts about the pros and cons of REXML and
libxml, but I still have some questions:
- as I understand it REXML is part of ruby standard library and so is
included in ruby distribution ?
Yes. It's also widely acknowledged as very slow. The name stands for Ruby Electric XML, but the parser leans heavily on regular expressions, which are only fast when used carefully. Basing an entire parser on them tends to abuse them.
This blog post shows how to spot-check compliance issues in the three leading Ruby XML parsers:
- libxml is a wrapper for gnome libxml and must be installed and
compiled with gem ?
Ordinarily, that process would be mostly harmless. You may already have libxml2-dev if you are on a GNU platform such as Ubuntu or Cygwin.
However, the current libxml-ruby has a nasty bug. First, it sprays lots of
No definition for ruby_xml_parser_context_options_get
into the console. Then it refuses to install the libxml_so.so file that it just created. I don't know this bug's status, but because my assert_xpath works best with libxml, I must overcome it whenever we build a new workstation at work! Sometimes I must manually copy its executables into Ruby's paths...
(Our production code does not use libxml - only the test code.)
I just tried to install while writing this post, and 0.8.1 might have worked on Ubuntu.
- is libxml really a full validating and compliant parser ?
I suspect it's the reference implementation for XML. It certainly takes every DOCTYPE and schema very seriously!
Better, it actually forgives some errors and keeps working, unlike REXML.
As you see I'm lost and I would really appreciate your help or some
comprehensive post about xml processing in ruby .
Sorry! I was knocking 'em down, and you lost me at XSLT.
In a pinch, I would pipe text thru xsltproc, and not worry about deep language integration. XSLT is nothing but a big filter, so I thought you could use it without making an object out of it.
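As a rough sketch of that "big filter" idea, the snippet below shells out to xsltproc from Ruby and feeds it the XML on standard input. It assumes the xsltproc binary is installed and on the PATH; the helper names xsltproc_command and xslt_transform, and the file names in the usage comment, are made up for illustration.

```ruby
require 'open3'

# Build the argument list for xsltproc.
# The trailing "-" asks xsltproc to read the XML document from stdin.
# (Helper name is hypothetical, not part of any library.)
def xsltproc_command(stylesheet_path)
  ['xsltproc', stylesheet_path, '-']
end

# Pipe an XML string through xsltproc and return the transformed text.
# Assumes xsltproc is installed; raises if the process exits with an error.
def xslt_transform(xml, stylesheet_path)
  output, error, status =
    Open3.capture3(*xsltproc_command(stylesheet_path), stdin_data: xml)
  raise "xsltproc failed: #{error}" unless status.success?
  output
end

# Usage (file names are placeholders):
#   html = xslt_transform(File.read('report.xml'), 'report.xsl')
```

Nothing here wraps XSLT in an object model; the stylesheet is just a file and the transform is just a child process, which is about as shallow as the integration can get.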
Besides the two libraries mentioned, there is also Hpricot
(http://code.whytheluckystiff.net/hpricot/), and you should try it too.
Hpricot is a jack-of-all-trades-master-of-some-of-them. Don't look to it for schema validation, XSLT, or true XPath.
When dealing with XML you should consider the following questions:
Who will run the code, and on what OS?
How big is the XML document?
Is speed a decisive factor?
What’s the magnitude of manipulation required?
The two XML parser models are DOM and SAX.
DOM converts every tag into an Object (hence Document Object Model), and lets you traverse the objects. The conversion is slow, and puts the entire document into memory, simultaneously.
SAX lets you register callbacks to call when an XML reader encounters certain tags. It treats the input XML as a stream, hence zipping past nodes you don't need is very fast.
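To make the DOM side concrete, here is a minimal sketch using REXML, assuming the rexml library is available in your Ruby install. The sample document and its element names are invented for the example. Note that the whole tree is built in memory before any traversal happens.

```ruby
require 'rexml/document'

# A small sample document (made up for this example).
xml = <<~XML
  <library>
    <book id="1"><title>Programming Ruby</title></book>
    <book id="2"><title>XML in a Nutshell</title></book>
  </library>
XML

# DOM style: parse everything into a tree of objects first.
doc = REXML::Document.new(xml)

# Then walk the tree; each <book> is already a REXML::Element.
titles = []
doc.elements.each('library/book') do |book|
  titles << "#{book.attributes['id']}: #{book.elements['title'].text}"
end

puts titles
```

This is convenient for small documents that need a lot of poking around, and it is exactly the pattern that gets expensive when the input grows.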
FYI, there is still some final fine-tuning going on, so don't expect everything
to be all roses quite yet. But we are close, and might actually
get to a 1.0.0 release soon.
T.
···
On Jul 19, 9:03 pm, Phillip Oertel <m...@phillipoertel.com> wrote:
REXML supports a "SAX Like" stream listening interface as well as DOM. See the REXML tutorial at http://www.germane-software.com/software/rexml/docs/tutorial.html, scroll down until you see the section headed with "Stream Parsing". The upshot is you write a class that has callback methods (see http://www.germane-software.com/software/rexml/doc/classes/REXML/StreamListener.html for a complete list of callbacks) and pass an instance of the class to REXML's parse_stream method. REXML also supports a SAX2 API, but I have never used it. Look for the heading "SAX2 Stream Parsing" in the tutorial link above.
I recently converted a poor DOM-based parsing solution to a stream-listener-based solution (not SAX2) and saw an order-of-magnitude improvement in performance.
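A minimal sketch of that stream-listener approach, assuming the rexml library is available: the TitleCollector class and the sample document are invented for the example, while REXML::StreamListener and REXML::Document.parse_stream are the pieces named above.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the text of every <title> element as the parser streams past.
# (Class name and sample data are made up for this example.)
class TitleCollector
  include REXML::StreamListener # supplies no-op versions of all callbacks

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  # Called for each opening tag.
  def tag_start(name, _attributes)
    return unless name == 'title'
    @in_title = true
    @titles << String.new
  end

  # Called for character data; ignored unless we are inside <title>.
  def text(data)
    @titles.last << data if @in_title
  end

  # Called for each closing tag.
  def tag_end(name)
    @in_title = false if name == 'title'
  end
end

xml = <<~XML
  <library>
    <book id="1"><title>Programming Ruby</title></book>
    <book id="2"><title>XML in a Nutshell</title></book>
  </library>
XML

listener = TitleCollector.new
REXML::Document.parse_stream(xml, listener)

puts listener.titles
```

No document tree is ever built here; the listener keeps only the strings it cares about, which is where the speed and memory savings come from.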
FYI, there is still some final fine-tuning going on, so don't expect everything
to be all roses quite yet. But we are close, and might actually
get to a 1.0.0 release soon.
And to use it with assert_xpath you just gotta put invoke_libxml in your setup...