XML in Ruby

What do you usually use to work with XML in Ruby? REXML seems to be
too slow and libxml too buggy. Any other option I should try?

Thanks,

Pedro.

Pedro Côrte-Real wrote:

> What do you usually use to work with XML in Ruby? REXML seems to be
> too slow and libxml too buggy. Any other option I should try?

Unless you've got a specific reason not to (as in, what is it you're actually trying to achieve?), check out REXML's StreamParser. It's more than fast enough for my needs, but I'm probably not a representative sample.

Then again, who is?
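Here is a minimal sketch of the StreamParser approach. It assumes a current Ruby with the bundled rexml library. The ElementCounter class and the sample feed are hypothetical. The listener only counts opening tags, which shows how REXML drives the callbacks without building a document tree:

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical listener: REXML calls these methods as it scans the
# input, so no document tree is built in memory.
class ElementCounter
  include REXML::StreamListener # empty defaults for events we ignore

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Called once for every opening tag (self-closing tags included).
  def tag_start(name, _attrs)
    @counts[name] += 1
  end
end

xml = '<feed><entry/><entry/><entry/></feed>'
counter = ElementCounter.new
REXML::Parsers::StreamParser.new(xml, counter).parse
puts "entries: #{counter.counts['entry']}" # prints "entries: 3"
```

Only the callbacks you define do anything. REXML::StreamListener supplies empty implementations for the rest (text, tag_end, comment, and so on).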


--
Alex

I am using the stream parser, and although it seems a little slow,
it's OK. What I find too slow is the XPath support. If I
can't get libxml to work, I guess I'll end up writing a small
Ruby extension just to wrap libxml's C XPath support. My needs are
very simple, so this should be easy, but I was hoping there was another
way.

Pedro.


On 6/29/06, Alex Young <alex@blackkettle.org> wrote:

> Pedro Côrte-Real wrote:
>> What do you usually use to work with XML in Ruby? REXML seems to be
>> too slow and libxml too buggy. Any other option I should try?
> Unless you've got a specific reason not to (as in, what is it you're
> actually trying to achieve?), check out REXML's StreamParser. More than
> fast enough for my needs, but I'm probably not a representative sample.

Pedro Côrte-Real wrote:
> On 6/29/06, Alex Young <alex@blackkettle.org> wrote:
>> Pedro Côrte-Real wrote:
>>> What do you usually use to work with XML in Ruby? REXML seems to be
>>> too slow and libxml too buggy. Any other option I should try?
>> Unless you've got a specific reason not to (as in, what is it you're
>> actually trying to achieve?), check out REXML's StreamParser. More than
>> fast enough for my needs, but I'm probably not a representative sample.
>
> I am using the stream parser, and although it seems a little slow,
> it's OK. What I find too slow is the XPath support. If I
> can't get libxml to work, I guess I'll end up writing a small
> Ruby extension just to wrap libxml's C XPath support. My needs are
> very simple, so this should be easy, but I was hoping there was another
> way.

Oh, I see... Sorry, I can't help you there...

--
Alex

If your needs are that simple, you should be able to handle this
with the stream parser. You'll likely only have to keep a stack of
the currently open elements (probably along with their attributes,
depending on the criteria you need to apply) and then decide what to
do with the current node.
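Here is a rough sketch of the stack idea. It assumes current REXML and handles only plain absolute paths like /library/book/title, so it is not an XPath engine. The PathCollector class, its optional filter block, and the sample XML are all hypothetical. Each open element is pushed together with its attributes. A text event can then be matched against the path and, if needed, against an ancestor's attributes:

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical listener that tracks open elements on a stack.
class PathCollector
  include REXML::StreamListener

  attr_reader :matches

  # path:   simple absolute path such as "/library/book/title"
  # filter: optional block given the stack of [name, attrs] pairs;
  #         return true to keep the current text
  def initialize(path, &filter)
    @target  = path.split('/').reject(&:empty?)
    @filter  = filter
    @stack   = []
    @matches = []
  end

  def tag_start(name, attrs)
    @stack.push([name, attrs.to_h]) # remember element and its attributes
  end

  def tag_end(_name)
    @stack.pop
  end

  # Text belongs to the innermost open element; keep it only when the
  # names on the stack spell out the target path (and the filter agrees).
  def text(data)
    return unless @stack.map(&:first) == @target
    return if @filter && !@filter.call(@stack)

    @matches << data
  end
end

xml = '<library><book isbn="111"><title>Ruby</title></book>' \
      '<book isbn="222"><title>XML</title></book></library>'

all = PathCollector.new('/library/book/title')
REXML::Parsers::StreamParser.new(xml, all).parse
p all.matches # prints ["Ruby", "XML"]

# Keep only titles whose enclosing <book> has isbn="222".
picky = PathCollector.new('/library/book/title') { |stack| stack[-2][1]['isbn'] == '222' }
REXML::Parsers::StreamParser.new(xml, picky).parse
p picky.matches # prints ["XML"]
```

The filter block receives the whole stack. Attribute criteria like the ones mentioned above would go there.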

HTH

robert


2006/6/29, Pedro Côrte-Real <pedrocr@gmail.com>:

> I am using the stream parser, and although it seems a little slow,
> it's OK. What I find too slow is the XPath support. [...] My needs are
> very simple, so this should be easy, but I was hoping there was another
> way.

--
Have a look: http://www.flickr.com/photos/fussel-foto/

My co-workers and I recently converted a bunch of REXML code to libxml. The
speed increase was dramatic (100-1000 times faster). We have not run into
any stability issues. We use libxml to read, search, delete/change nodes and
values, and write out new files, all with no issues. What kind of issues are
you hitting with libxml?

Mark


On 6/29/06, Alex Young <alex@blackkettle.org> wrote:

Pedro Côrte-Real wrote:
> On 6/29/06, Alex Young <alex@blackkettle.org> wrote:
>
>> Pedro Côrte-Real wrote:
>> > What do you usually use to work with XML in Ruby? REXML seems to be
>> > too slow and libxml too buggy. Any other option I should try?
>> Unless you've got a specific reason not to (as in, what is it you're
>> actually trying to achieve?), check out REXML's StreamParser. More than
>> fast enough for my needs, but I'm probably not a representative sample.
>
> I am using the stream parser, and although it seems a little slow,
> it's OK. What I find too slow is the XPath support. If I
> can't get libxml to work, I guess I'll end up writing a small
> Ruby extension just to wrap libxml's C XPath support. My needs are
> very simple, so this should be easy, but I was hoping there was another
> way.
Oh, I see... Sorry, I can't help you there...

--
Alex

--
Mark Van Holstyn
mvette13@gmail.com
http://lotswholetime.com

> I am using the stream parser, and although it seems a little slow,
> it's OK. What I find too slow is the XPath support. If I
> can't get libxml to work, I guess I'll end up writing a small
> Ruby extension just to wrap libxml's C XPath support. My needs are
> very simple, so this should be easy, but I was hoping there was another
> way.

You might take a look at http://teius.rubyforge.org. I've been very happy with it.

From the teius wiki homepage:

    Teius is really a tiny Ruby wrapper around the LibXML C library.

Not really, because I want to support arbitrary XPaths and I'm not
going to implement an XPath engine myself. The XPaths aren't hard-coded
in the program. They're defined in a config file to extract data from the
XML, so I need full XPath support.
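For comparison, here is a minimal sketch of config-driven extraction using REXML's own XPath engine. This is the slow path Pedro wants to avoid, but it is convenient for checking what a configured expression should return. The YAML layout, field names, and sample document are hypothetical, and a real setup would load the file with YAML.load_file. REXML::XPath evaluates against a full in-memory tree:

```ruby
require 'rexml/document'
require 'yaml'

# Hypothetical config: each field name maps to an XPath expression.
# A real application would read this from disk with YAML.load_file.
config = YAML.safe_load(<<~CONFIG)
  titles: "//book/title"
  authors: "//book/author"
  isbns: "//book/@isbn"
CONFIG

# Hypothetical sample document.
xml = <<~XML
  <library>
    <book isbn="111"><title>Ruby</title><author>Alice</author></book>
    <book isbn="222"><title>XML</title><author>Bob</author></book>
  </library>
XML

# REXML's XPath engine needs the whole document parsed into a tree.
doc = REXML::Document.new(xml)

# Evaluate every configured expression. Element matches yield their
# text; attribute matches (such as @isbn) yield their value.
results = config.transform_values do |xpath|
  REXML::XPath.match(doc, xpath).map do |node|
    node.is_a?(REXML::Attribute) ? node.value : node.text
  end
end

results.each { |field, values| puts "#{field}: #{values.join(', ')}" }
```

Because only the block passed to transform_values touches the XPath engine, a faster backend could be swapped in there without changing how the config is read.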

Pedro.


On 6/29/06, Robert Klemme <shortcutter@googlemail.com> wrote:

> 2006/6/29, Pedro Côrte-Real <pedrocr@gmail.com>:
>> I am using the stream parser, and although it seems a little slow,
>> it's OK. What I find too slow is the XPath support. [...] My needs are
>> very simple, so this should be easy, but I was hoping there was another
>> way.
>
> If your needs are that simple, you should be able to handle this
> with the stream parser. You'll likely only have to keep a stack of
> the currently open elements (probably along with their attributes,
> depending on the criteria you need to apply) and then decide what to
> do with the current node.

I did something as simple as:

parser = XML::Parser.new
parser.string = mydocstring
doc = parser.parse

That last line blew up with a segfault. The same code works in irb,
although it crashed once when exiting irb. It seems to be a
race condition of some sort.

A 100-1000x speedup seems great. If it worked well, I'd convert
xmlcodec over to it.

Pedro.


On 6/29/06, Mark Van Holstyn <mvette13@gmail.com> wrote:

> My co-workers and I recently converted a bunch of REXML code to libxml. The
> speed increase was dramatic (100-1000 times faster). We have not run into
> any stability issues. We use libxml to read, search, delete/change nodes and
> values, and write out new files, all with no issues. What kind of issues are
> you hitting with libxml?

It seems great, but I couldn't install it. The gem threw a bunch of
errors when compiling, and it seems to support reading only
from a file, not from an IO or a string. I'll have to look at the code.

Pedro.


On 6/29/06, Gordon Thiesfeld <gthiesfeld@sbcglobal.net> wrote:

>> I am using the stream parser, and although it seems a little slow,
>> it's OK. What I find too slow is the XPath support. If I
>> can't get libxml to work, I guess I'll end up writing a small
>> Ruby extension just to wrap libxml's C XPath support. My needs are
>> very simple, so this should be easy, but I was hoping there was another
>> way.
>
> You might take a look at http://teius.rubyforge.org. I've been very happy with it.
>
> From the teius wiki homepage:
>
>     Teius is really a tiny Ruby wrapper around the LibXML C library.

It does support reading from a string with #parse_string. I got it to
install by changing extconf.rb, which was looking in /usr/include/libxml
instead of /usr/include/libxml2. It's throwing some signedness warnings,
but it works fine otherwise.

Pedro.


On 6/29/06, Pedro Côrte-Real <pedrocr@gmail.com> wrote:

> It seems great, but I couldn't install it. The gem threw a bunch of
> errors when compiling, and it seems to support reading only
> from a file, not from an IO or a string. I'll have to look at the code.

I was able to use libxml for a while, but it seems to suffer from race
conditions. I got this today:

./imports/../config/../app/models/xmldoc.rb:20: [BUG] rb_gc_mark():
unknown data type 0x38(0x8bbb4c8) non object
ruby 1.8.4 (2005-12-24) [i486-linux]

The mailing list seems dead as well. I'm going to try to use teius.

Pedro.


On 6/29/06, Pedro Côrte-Real <pedrocr@gmail.com> wrote:

> On 6/29/06, Mark Van Holstyn <mvette13@gmail.com> wrote:
>> My co-workers and I recently converted a bunch of REXML code to libxml. The
>> speed increase was dramatic (100-1000 times faster). We have not run into
>> any stability issues. We use libxml to read, search, delete/change nodes and
>> values, and write out new files, all with no issues. What kind of issues are
>> you hitting with libxml?
>
> I did something as simple as:
>
> parser = XML::Parser.new
> parser.string = mydocstring
> doc = parser.parse