Xml + ruby


James Britt jamesUNDERBARb@seemyemail.com
I suspect it’s the context. Many (most?) ECMAscripters are doing web
development and are perhaps mainly focused on the data and markup;
scripts are just another way to manipulate pointy-bracketed data, not
manage complete business processes. So adding markup-centric syntax
sort of makes sense.
-----------<<

My interest in it is for web stuff, for sure, and maybe
that has led to the disconnect here. Good point.

not manage complete business processes<<

ecmascript is a class oriented general purpose
scripting language just like Ruby and it is used (a
lot!) supporting the entire business cycle - just like
you would with java or language x or ruby. It is just
not compiled. Witness the use of ECMAScript (as
opposed to vbscript) with ASP/IIS. We used it in a
large corporation for moderately large applications
with great success because of its simplicity and
class orientation, which allowed us to protect our
code. It is ‘missing’ cool things like interpolation
and iterators - but it is quite sound, imo, and not
some ‘scripty’ thing like you suggest. Much of m$'s
site is written with this - some of it even works ;).
Ecmascript is my favorite language on tues,thurs.
Knock it if you want ;/

As for the conflict with characters in ruby names,
unfortunately I can suggest no obvious solution -
maybe use xpath where the easy way is not acceptable.
Also someone mentions ‘just getting to the element he
wants’ without using a long path. Again, xpath, would
be the direct/best approach perhaps.

:pv

···


I think it’s more than that. There’s a deep division in the XML camp
itself as to whether things should be expressed as attributes or
child elements:

<foo bar="baz"/>
<foo><bar>baz</bar></foo>

The moment you have to deal with an attribute, the x.y notation
becomes nearly useless. How do you deal with attributes in that
form[1]? The e4x stuff as you’ve presented it here is also very
DOM-oriented, which is fine for smaller XML documents, but assuredly
not fine for many other documents – unless you plan on doing lazy
loading.
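For comparison, here is a minimal REXML sketch of the two styles. The element and attribute names (foo, bar, baz) are assumed for illustration; the point is only that an attribute and a child element need different accessors, which is what a plain x.y notation would have to paper over.

```ruby
require 'rexml/document'

# Two assumed documents carrying the same value in different places.
attr_doc  = REXML::Document.new('<foo bar="baz"/>')
child_doc = REXML::Document.new('<foo><bar>baz</bar></foo>')

# Attribute style: go through the element's attribute set.
attr_value = attr_doc.root.attributes['bar']

# Child-element style: find the child element, then read its text.
child_value = child_doc.root.elements['bar'].text

puts attr_value
puts child_value
```

Both lookups return the same string, but a dotted accessor such as foo.bar would have to pick one of the two meanings.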

There’s further confusion if you had:

<list><item>1</item><item>2</item><item>3</item></list>

How would I refer to each individual item: list.item[0] through
list.item[2]? If the <list> tag also allowed for CDATA values,
there’s nothing in the XML spec that would prevent:

<list>cdata<item>1</item>cdata<item>2</item>cdata</list>

What you’ve suggested offers no easy way around this. You’re far
better off not treating XML as pseudo-objects. It’s a poor fit,
ultimately, and the moment you get anything reasonably complex (like
RSS), or need namespace support, or need a larger document that
won’t fit into memory entirely as DOM requires, it breaks and you
need the more complex API again.
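A small REXML sketch of the interleaving problem, assuming a list element whose item children are mixed with plain text (the document itself is made up for illustration). The child nodes come back as alternating text and element nodes, so a list.item[n] accessor sees only part of the content.

```ruby
require 'rexml/document'

# Assumed mixed-content document: text interleaved with <item> children.
doc  = REXML::Document.new('<list>cdata<item>1</item>cdata<item>2</item>cdata</list>')
list = doc.root

# Only the element children; the surrounding text is invisible here.
items = list.elements.to_a('item').map(&:text)

# Only the text children.
texts = list.texts.map(&:value)

# Every child in document order, by node class.
kinds = list.children.map { |node| node.class.name.split('::').last }

p items
p texts
p kinds
```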

-austin
[1] I have not looked at the e4x stuff, and I don’t intend to; I
have too much else on my plate right now. I’m raising practical
problems regarding the suggestion made here.

···

On Fri, 3 Oct 2003 16:11:46 +0900, paul vudmaska wrote:

As for the conflict with characters in ruby names, unfortunately I
can suggest no obvious solution - maybe use xpath where the easy
way is not acceptable. Also someone mentions ‘just getting to the
element he wants’ without using a long path. Again, xpath, would
be the direct/best approach perhaps.


austin ziegler * austin@halostatue.ca * Toronto, ON, Canada
software designer * pragmatic programmer * 2003.10.03
* 08.50.00

paul vudmaska wrote:


James Britt jamesUNDERBARb@seemyemail.com

not manage complete business processes<<

ecmascript is a class oriented general purpose
scripting language just like Ruby and it is used (a
lot!) supporting the entire business cycle - just like
you would with java or language x or ruby. It is just
not compiled. Witness the use of ECMAScript (as
opposed to vbscript) with ASP/IIS. We used it in a
large corporation for moderately large applications
with great success because of its simplicity and
class orientation, which allowed us to protect our
code. It is ‘missing’ cool things like interpolation
and iterators - but it is quite sound, imo, and not
some ‘scripty’ thing like you suggest. Much of m$'s
site is written with this - some of it even works ;).
Ecmascript is my favorite language on tues,thurs.
Knock it if you want ;/

I’m not knocking it; I’m aware of its features, have used it quite a
bit, and I like it a lot.

http://www.jamesbritt.com/index.rb/2003/Sept/23#The_Kewlness_of_JavaScript

But my impression is that, despite Ecmascript’s qualities as a language,
it has been supplanted by JSP/Servlets/Java and marginalized (however
unfairly) as a “mere” Web scripting language. I can’t remember the
last time I saw a job ad mentioning server-side javascript.

As with Ruby and other languages, once some people have relegated a
language to the scripting ghetto it is hard to get them to see it any
differently.

James Britt

I think it’s more than that. There’s a deep division in the XML camp
itself as to whether things should be expressed as attributes or
child elements:

<foo bar="baz"/>
<foo><bar>baz</bar></foo>

The moment you have to deal with an attribute, the x.y notation
becomes nearly useless. How do you deal with attributes in that
form[1]?

You deal with it by accessing foo.attributes["bar"], which will equal baz.
You shouldn’t access childNodes by saying “foo.bar”. This can lead to
ambiguity in many cases. Instead each XML object should support something
like:

If you want to write <foo><bar>baz</bar></foo> then write something like:

foo.childNodes[0] = bar xml object
foo.childNodes[0].nodeValue = "baz"

There’s further confusion if you had:

<list><item>1</item><item>2</item><item>3</item></list>

How would I refer to each individual item: list.item[0] through
list.item[2]?

The structure for the XML object should look something like:

list.childNodes[0].nodeValue = "1"
list.childNodes[1].nodeValue = "2"
list.childNodes[2].nodeValue = "3"
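For what it is worth, REXML already offers something close to this. The sketch below assumes the three-item list discussed above; note that REXML numbers child elements from 1 (following XPath), not from 0 as the childNodes pseudocode does.

```ruby
require 'rexml/document'

# Assumed document: the three-item list from the discussion.
doc  = REXML::Document.new('<list><item>1</item><item>2</item><item>3</item></list>')
list = doc.root

# Element indexes start at 1 in REXML, as they do in XPath.
first = list.elements[1].text

# All item values in document order.
all = list.elements.to_a.map(&:text)

# Assigning through the same accessor replaces the element's text.
list.elements[3].text = 'three'
last = list.elements[3].text

puts first
p all
puts last
```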

For Ruby to have a well structured and defined (and accessible)
XML object is not that far out there. I have a feeling that
this XML object may benefit from a little bit of XP
style thinking. We (or whoever ends up writing it) should start
with a basic XML object that can be expanded upon to add
stylesheets, CDATA tags, etc. People only need to use
the equivalent of what they have now and a little bit more. And
since the XML support we have now lacks a lot, including a lot of the
ease, it shouldn’t be this hard to come out with a new or revised XML object.

-Zach

···

From: Austin Ziegler Wrote:

Ruby ↔ XML seems to be a poor fit, it seems XML will quickly
become too verbose with all the exceptions and having to resort to
XPath (I’ve seen several posts that say “oh, we’ll have to use XPath for
that”, and that wouldn’t make me very happy). Ruby ↔ RDF would
be a better fit, especially if used as a serialization format.

···

Austin Ziegler (austin@halostatue.ca) wrote:

There’s further confusion if you had:

<list><item>1</item><item>2</item><item>3</item></list>

How would I refer to each individual item: list.item[0] through
list.item[2]? If the <list> tag also allowed for CDATA values,
there’s nothing in the XML spec that would prevent:

<list>cdata<item>1</item>cdata<item>2</item>cdata</list>


Eric Hodel - drbrain@segment7.net - http://segment7.net
All messages signed with fingerprint:
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Zach Dennis wrote:

For Ruby to have a well structured and defined( and accessible )
XML object is not that far out there.

True enough; in fact, you need only do this:

require 'rexml/document'

James

I’m not sure I understand what you’re saying, especially as RDF is an XML
format.

I’ve had good results with the use of REXML for XML processing. I don’t do a
lot with XML, but I can’t fault the API for that. REXML can, of course, be
used in either “DOM” or stream/SAX mode, and if the DOM mode is too verbose
for the purposes of processing, then the SAX mode can help – I think.
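As a rough illustration of the stream mode mentioned here, the sketch below uses REXML::Document.parse_stream with a REXML::StreamListener. The ItemCollector class and the sample document are made up for this example; nothing is held in memory as a tree, and the listener only keeps what it is interested in.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that collects the text of every <item> it sees.
class ItemCollector
  include REXML::StreamListener

  attr_reader :items

  def initialize
    @items = []
    @in_item = false
  end

  # Called for each opening tag.
  def tag_start(name, _attributes)
    @in_item = true if name == 'item'
  end

  # Called for character data; kept only while inside an <item>.
  def text(data)
    @items << data if @in_item
  end

  # Called for each closing tag.
  def tag_end(name)
    @in_item = false if name == 'item'
  end
end

collector = ItemCollector.new
xml = '<list><item>1</item><item>2</item><item>3</item></list>'
REXML::Document.parse_stream(xml, collector)

p collector.items
```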

-austin

···

On Sat, 4 Oct 2003 00:17:01 +0900, Eric Hodel wrote:

Austin Ziegler (austin@halostatue.ca) wrote:

There’s further confusion if you had:
<list><item>1</item><item>2</item><item>3</item></list>

How would I refer to each individual item: list.item[0] through
list.item[2]? If the <list> tag also allowed for CDATA values, there’s
nothing in the XML spec that would prevent:

<list>cdata<item>1</item>cdata<item>2</item>cdata</list>
Ruby ↔ XML seems to be a poor fit, it seems XML will quickly become
too verbose with all the exceptions and having to resort to XPath (I’ve
seen several posts that say “oh, we’ll have to use XPath for that”, and
that wouldn’t make me very happy). Ruby ↔ RDF would be a better fit,
especially if used as a serialization format.


austin ziegler * austin@halostatue.ca * Toronto, ON, Canada
software designer * pragmatic programmer * 2003.10.03
* 15.07.29

James Britt wrote:

Zach Dennis wrote:

For Ruby to have a well structured and defined( and accessible )
XML object is not that far out there.

True enough; in fact, you need only do this:

require 'rexml/document'

If REXML doesn’t float your boat, this article may offer food for
thought (though I’m thinking this whole thread might be better off on
the rexml mailing list …)

http://www.sys-con.com/xml/article.cfm?id=727

James

James,

Are the people on this thread suffering from a case
of programmatic idealism, underuse or misuse of rexml
or is rexml lacking some of the things mentioned on
this thread?

I’ve been humming a pretty tune, but in all honesty
I have hardly used rexml as of yet, since this is
my first week in Ruby. So please point
me and others in the right direction, then maybe this
thread can be redirected toward what rexml may be
lacking and how to improve on it.

-Zach

···

-----Original Message-----
From: James Britt [mailto:jamesUNDERBARb@seemyemail.com]
Sent: Friday, October 03, 2003 1:45 PM
To: ruby-talk ML
Subject: Re: xml + ruby

Zach Dennis wrote:

For Ruby to have a well structured and defined( and accessible )
XML object is not that far out there.

True enough; in fact, you need only do this:

require 'rexml/document'

James

Ruby ↔ XML seems to be a poor fit, it seems XML will quickly become
too verbose with all the exceptions and having to resort to XPath (I’ve
seen several posts that say “oh, we’ll have to use XPath for that”, and
that wouldn’t make me very happy). Ruby ↔ RDF would be a better fit,
especially if used as a serialization format.

I’m not sure I understand what you’re saying, especially as RDF is an XML
format.

The biggest problem I’ve seen above is that XML documents are just being
made up without any schemas or relationships to other documents. The
best thing about XML is the namespaces, because you can write <clothing:tie/>
and <rope:tie/> and know that the two are different contexts. (Even
though nobody seems to do that, which is a real shame.)
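A short REXML sketch of that distinction. The wardrobe root element and the two urn:example namespace URIs are invented for illustration; only the clothing and rope prefixes come from the post.

```ruby
require 'rexml/document'

# Assumed document: the namespace URIs and root element are made up.
doc = REXML::Document.new(<<~XML)
  <wardrobe xmlns:clothing="urn:example:clothing" xmlns:rope="urn:example:rope">
    <clothing:tie/>
    <rope:tie/>
  </wardrobe>
XML

ties = doc.root.elements.to_a

# Same local name...
names = ties.map(&:name)

# ...but different prefixes and namespace URIs.
prefixes = ties.map(&:prefix)
uris     = ties.map(&:namespace)

p names
p prefixes
p uris
```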

Unfortunately, there’s no way to tell how two different namespaces
are related to each other, and this is where RDF comes in.

RDF has an XML serialization format, but you can store RDF in any format
you want. RDF’s strength is its focus on relationships, and what you
can infer from the ones given. Here’s some RDF in subject, predicate,
object triples:

urn:person:Sandra → name → Sandra
urn:person:Sandra → gender → female
urn:person:Sandra → sibling → urn:person:Kevin
urn:person:Kevin → name → Kevin
urn:person:Kevin → gender → male
urn:person:Kevin → parent → urn:person:Sarah
urn:person:Sarah → name → Sarah
urn:person:Sarah → gender → female

With this data, you can easily infer that:

Sandra has a brother.
Sandra’s mother’s name is Sarah.
Sarah has at least two children.
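Those inferences can be sketched in plain Ruby with the triples held as arrays. This is not an RDF library, just a toy; the helper names (objects_of, subjects_with, gender?) are made up, and the second and third inferences rest on the assumption, not stated in the data, that siblings share their parents.

```ruby
# Triples copied from the list above: [subject, predicate, object].
TRIPLES = [
  ['urn:person:Sandra', 'name',    'Sandra'],
  ['urn:person:Sandra', 'gender',  'female'],
  ['urn:person:Sandra', 'sibling', 'urn:person:Kevin'],
  ['urn:person:Kevin',  'name',    'Kevin'],
  ['urn:person:Kevin',  'gender',  'male'],
  ['urn:person:Kevin',  'parent',  'urn:person:Sarah'],
  ['urn:person:Sarah',  'name',    'Sarah'],
  ['urn:person:Sarah',  'gender',  'female']
].freeze

# All objects for a given subject and predicate.
def objects_of(subject, predicate)
  TRIPLES.select { |s, p, _o| s == subject && p == predicate }.map(&:last)
end

# All subjects that have the given predicate and object.
def subjects_with(predicate, object)
  TRIPLES.select { |_s, p, o| p == predicate && o == object }.map(&:first)
end

def gender?(person, gender)
  objects_of(person, 'gender').include?(gender)
end

sandra = 'urn:person:Sandra'
sarah  = 'urn:person:Sarah'

# "Sandra has a brother": a sibling whose gender is male.
siblings = objects_of(sandra, 'sibling')
brothers = siblings.select { |s| gender?(s, 'male') }

# "Sandra's mother's name is Sarah": assumes siblings share parents.
parents      = siblings.flat_map { |s| objects_of(s, 'parent') }.uniq
mother_names = parents.select { |p| gender?(p, 'female') }
                      .flat_map { |m| objects_of(m, 'name') }

# "Sarah has at least two children": direct children plus their siblings.
direct   = subjects_with('parent', sarah)
children = (direct + direct.flat_map { |c|
  subjects_with('sibling', c) + objects_of(c, 'sibling')
}).uniq

p brothers
p mother_names
p children.size
```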

And since everything can be given a namespace that maps back to the
basic RDF types, you can easily pass this information in and out of
other systems, and everybody will know what you are talking about.

Specifically, serializing Ruby objects in XML is a waste if only Ruby
can understand the generated XML. Nobody else can understand the
relationships and easily use them in other systems. Using RDF, the
relationships are easily parseable for use in other systems, making
them interoperable without pain.

I’ve had good results with the use of REXML for XML processing. I don’t do a
lot with XML, but I can’t fault the API for that. REXML can, of course, be
used in either “DOM” or stream/SAX mode, and if the DOM mode is too verbose
for the purposes of processing, then the SAX mode can help – I think.

Yes, I agree; everything I’ve done with REXML has been straightforward
and simple. I’m just pointing out that throwing XML into the midst of
things isn’t going to necessarily give you any direct gains without a
bit of thought.

···

Austin Ziegler (austin@halostatue.ca) wrote:

On Sat, 4 Oct 2003 00:17:01 +0900, Eric Hodel wrote:


Eric Hodel - drbrain@segment7.net - http://segment7.net
All messages signed with fingerprint:
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Zach Dennis wrote:

James,

Are the people on this thread suffering from a case
of programmatic idealism, underuse or misuse of rexml
or is rexml lacking some of the things mentioned on
this thread?

I have no idea from what anybody might be suffering, idealistic or
otherwise. I believe, though, that many of the topics discussed in this
thread have been discussed before, either on ruby-talk or the rexml list
or on the RubyGarden XmlInRuby page.

This by no means precludes raising them again, but any participants
should at least be familiar with earlier threads.

I’ve been humming a pretty tune, but in all honesty
I have hardly used rexml as of yet, since this is
my first weeks arrival into Ruby. So please point
me and others in the right direction, then maybe this
thread can be redirected to point to what maybe
rexml is lacking and how to better improve on it.

What, if anything, rexml lacks is a matter of opinion. And that opinion
will be colored by one’s needs and how much time one has spent with
Ruby, because Rexml might be best described as a Ruby Way XML API.

Faulting Rexml may be misplaced as well. There well may be a need for
other XML APIs in Ruby, but adding them to either Rexml or the core
language is a whole other matter.

My impression is that a majority of folks on this list will use XML
under certain circumstances, but view it as a means to an end. They do
not care much for the W3C XML DOM, nor view XPath as an intrinsically
appealing data-access API. There is a preference to be able to
manipulate XML source data using a Ruby interface. For most people
Rexml fits the bill quite well.

Whether this is true for you or anyone else requires you spend a bit of
time hacking around with it. If you approach it with a W3C DOM mindset
you may be disappointed (though, then again, you may be pleasantly
surprised). If you approach it as just another Ruby API you may find it
quite natural.

If you then decide that Rexml is still lacking in some way, join the
Rexml mailing list and offer some constructive criticism. It’s been kind
of slow there, so new blood is quite welcome. On the other hand, if,
after working with Rexml and Ruby for a bit, you still think the core
language lacks something, bring it up here.

Incidentally, Rexml is not the only XML lib for Ruby. While I’ve sadly let
the site decay, www.rubyxml.com still has a decent list of Ruby XML
information. (And all this might be enough to motivate me to extend the
day to 26 hours and fix up the site.)

James

···

-Zach

My response was based solely on an API suggested by Paul Vudmaska in a
previous thread. I have used REXML in “DOM-mode” and been pleasantly
surprised with how easy it is to use; I haven’t yet wrapped my head around
SAX (stream) mode, so I can’t say how easy that is to use.

REXML pretty much does what you suggested it should, and I think that it’s a
very good API. What was being asked for is the automatic transformation of
an XML document to a dotted-notation “object” hierarchy.

Feasible, but it only works for DOM documents that don’t contain attributes
or the possibility of interleaving text nodes.

-austin

···

On Sat, 4 Oct 2003 03:03:55 +0900, Zach Dennis wrote:

Are the people on this thread suffering from a case of programmatic
idealism, underuse or misuse of rexml or is rexml lacking some of the
things mentioned on this thread?


austin ziegler * austin@halostatue.ca * Toronto, ON, Canada
software designer * pragmatic programmer * 2003.10.03
* 15.03.32

…<<

Hi All,

This is my final comment here on rexml. Thanks for
your interesting comments.

I’d probably fall into the ‘programmatic idealistic’
side. I won’t boast more than a month’s or so
experience with Ruby so i probably should not have
started the thread.

I like REXML a lot, it feels as close to the language
as any xml api i’ve used.(i’ve used xml a lot since
2000 - and when i started i had my reservations) Then
i ran across e4x and thot, the fellas here would be
interested and i believe ruby has a better chance of
implementing it elegantly. Period. I’m really not
experienced, educated enough to comment persuasively
on the implementation, or with any real insight, as
you’ve so adroitly pointed out. I should have prefaced
my original comment with that, looking back.

things i know.

  1. xml is increasingly becoming fundamental (might be
    thot of as a common type in my mind).
  2. it is very important for my development - from the
    design(requirements) to the implementation(config
    files - cached state). And many others(cocoon,xsp…)
  3. REXML is the best xml API i’ve found
  4. REXML works just fine as it is
  5. My comments were not meant to be derisive,
    condescending or contentious.

things i think i know

  1. e4x won’t be the only language that attempts to fold
    xml into it natively. It is in the evolutionary path
    of any general purpose language, imo
  2. tho this thread has been futile (mostly my own
    doing), there are knowledgeable folks considering the
    implementation very seriously
  3. when it is fluidly embedded in the language you
    will wonder what you did without it
  4. Ruby and REXML will be better at that time

bye,paul vudmaska

···

— Zach Dennis zdennis@mktec.com wrote:

James,

Are the people on this thread suffering from a case
of programmatic idealism, underuse or misuse of
rexml
or is rexml lacking some of the things mentioned on
this thread?



I’d probably fall into the ‘programmatic idealistic’
side. I won’t boast more than a month’s or so
experience with Ruby so i probably should not have
started the thread.

Heya, thanks for the thread. I think your willingness to jump into
conversation on the list despite your newness to Ruby is really cool. Sure,
there’s dissenting opinions. You have your own vision for how you’d like to
use the language and what the potential future for the language could be.
I’m sure continued thought will yield good things.

Don’t regret the discussion. Some say there’s too much banter and volume on
the list, but at the same time we measure Ruby’s success by the volume on the
list.

  1. e4x won’t be the only language that attempts to fold
    xml into it natively. It is in the evolutionary path
    of any general purpose language, imo

I think ideas like this could be experimented with outside of core. Meaning:
someone with the can-do spirit checks out Ruby from CVS and hacks away. An
idea like this could be more convincing if available as a set of patches or
alternative interpreter (such as Stackless Python). Sounds similar to e4x
already, eh?

It’d be great if everyone in our community accepted everyone and every idea
that was presented (a wealth of endless backslapping that began to take its
toll on our shoulder blades), but I think chipping away at an idea will
enhance it. Anyways, the idea has merit and I’d love to see some working
ideas that ensure the implosion of my brain.

_why

···

On Friday 03 October 2003 02:20 pm, paul vudmaska wrote:

I’d probably fall into the ‘programmatic idealistic’ side. I won’t
boast more than a month’s or so experience with Ruby so i
probably should not have started the thread.

Heya, thanks for the thread. I think your willingness to jump into
conversation on the list despite your newness to Ruby is really
cool. Sure, there’s dissenting opinions. You have your own vision
for how you’d like to use the language and what the potential
future for the language could be. I’m sure continued thought will
yield good things.

I concur. I know that one of my first posts (if not my first post)
to ruby-talk was also a discussion about language features
[ruby-talk:42410], specifically returning the RHS of a test because
everything except false and nil is true. There are some times when I
still think that it’s a good idea (nothing is equal to nil,
including nil), but given that others (and myself, sometimes) want
to see #nil_or_empty?, it’s not something I’d push for at this
point.

Don’t regret the discussion. Some say there’s too much banter and
volume on the list, but at the same time we measure Ruby’s success
by the volume on the list.

Volume? What volume? (Okay, so it’s around 3,000 messages monthly.)

  1. e4x won’t be the only language that attempts to fold xml into
    it natively. It is in the evolutionary path of any general
    purpose language, imo

I think ideas like this could be experimented with outside of core.
Meaning: someone with the can-do spirit checks out Ruby from CVS and
hacks away. An idea like this could be more convincing if available as a
set of patches or alternative interpreter (such as Stackless Python).
Sounds similar to e4x already, eh?

I think that the confusion, here, is what is meant by “fold XML into
it natively.” That’s really why I posted my longer analysis of where
something like e4x is going to turn problematic. I mean … XSLT
is XML and it’s got problems with the complexity of possible XML
documents. There are good arguments for permitting attributes, but
if the XML committee had gotten rid of attributes, that would have
significantly simplified what can be programmed around them, because
then XML documents would be purely hierarchical in nature and it
would be trivial to extend languages to support this sort of
behaviour.

It’d be great if everyone in our community accepted everyone and
every idea that was presented (a wealth of endless backslapping
that began to take its toll on our shoulder blades), but I think
chipping away at an idea will enhance it. Anyways, the idea has
merit and I’d love to see some working ideas that ensure the
implosion of my brain.

I’m not sure that Paul’s suggestion is something that will actually
be useful as a “core feature” in Ruby, in the end, because of the
complexity of XML, but it may help refine the REXML interface (or
provide yet another), and I would agree with James Britt that that
might be a better place to discuss this.

-austin

···

On Sat, 4 Oct 2003 06:09:23 +0900, why the lucky stiff wrote:

On Friday 03 October 2003 02:20 pm, paul vudmaska wrote:

austin ziegler * austin@halostatue.ca * Toronto, ON, Canada
software designer * pragmatic programmer * 2003.10.03
* 17.47.49

[snip]

In the spirit of chipping away at an idea, I can’t stand XML. However,
I’m glad Ruby has REXML so that one can write Ruby programs to deal
with external resources that insist on using XML. It’s my belief that
XML has wasted and continues to waste the time of way too many talented
people whose efforts could have been more productively used in other
endeavors. That’s my semi-contrarian $0.02 (it would be truly
contrarian if many others hadn’t already said it).

Regards,

Mark

···

On Friday, October 3, 2003, at 05:09 PM, why the lucky stiff wrote:

[snip]

It’d be great if everyone in our community accepted everyone and every
idea that was presented (a wealth of endless backslapping that began to
take its toll on our shoulder blades), but I think chipping away at an
idea will enhance it.

One of (IMO) the classic rants on the subject:

http://groups.google.com/groups?selm=<3250033069468718%40naggum.no>

martin

···

Mark Wilson mwilson13@cox.net wrote:

In the spirit of chipping away at an idea, I can’t stand XML. However,
I’m glad Ruby has REXML so that one can write Ruby programs to deal
with external resources that insist on using XML. It’s my belief that
XML has wasted and continues to waste the time of way too many talented
people whose efforts could have been more productively used in other
endeavors. That’s my semi-contrarian $0.02 (it would be truly
contrarian if many others hadn’t already said it).

Martin DeMello wrote:

One of (IMO) the classic rants on the subject:

http://groups.google.com/groups?selm=<3250033069468718%40naggum.no>

Yow. “Rant” is quite the understatement.

But I know Sean’s looking for a good flame war on the REXML list …

:)

James

Sam Ruby[0] has some brief notes about Ruby and REXML at
http://www.intertwingly.net/blog/1604.html
and
http://www.intertwingly.net/blog/1605.html

with assorted reader comments

James

[0] He who prompts me to add " -Sam" when Googling for Ruby

Hey.

Thought I’d drop in on this discussion. There are several threads in
the newsgroup on the topic of the least intrusive API for XML in Ruby.

What I’m understanding is that there are people who want to hide the
details of XML whilst in Ruby. This sounds a lot to me like
serialization, and that’s a layer above what XML packages provide. A
serialization package, with minimal intrusion, could provide some
support for namespaces and attributes, and would look a lot like what
the minimalists (as in minimally intrusive) are asking for.

Users of XML can generally be divided into two broad camps. There are
those who have some data, and they more or less want or need it to be
XML at some point. On the other side are people who are dealing with
the XML without being too concerned with the content. For those in
the first camp, serialization is a great solution. Those in the
second camp need more control over the data, and a specialized API is
more appropriate. If you’ve contemplated using YAML instead of XML,
you’re probably in the first camp. A common reason for being in the
second camp is that you’re getting your data from somewhere else.

In my experience, an XML API can be abstracted only so much before you
begin to lose control over the finer details. High level APIs are
great for simple documents, but begin to break down when one
introduces comments, processing instructions, entities, and mixed
content. I’d go a step further and suggest that any sufficiently
abstracted API that entirely hides the XML details of an XML document
will be insufficient to handle all possible legal XML documents.

All that means, though, is that an API that high-level is insufficient
as the only API available for dealing with XML. What that means to me
is that the high-level API should sit on top of another API that
provides finer control. It doesn’t mean that the high level API isn’t
useful or shouldn’t be written.

By the way, I did try to write a transparent API for REXML a couple of
months ago; it looked something like this:

a = Node.new
a << "B"                # => <a>B</a>
a.b                     # => <a>B<b/></a>
a.b[1]                  # => <a>B<b/><b/></a>
a.b[1]["x"] = "y"       # => <a>B<b/><b x="y"/></a>
a.b[0].c                # => <a>B<b><c/></b><b x="y"/></a>
a.b.c << "D"            # => <a>B<b><c>D</c></b><b x="y"/></a>

I didn’t get very far with it; it seemed like terrible hacks were
needed to implement it, and I’m not sure I want to maintain that code,
but if there’s enough interest, I might revive it.
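To give a feel for the kind of hack involved, here is a much smaller, read-only sketch built on method_missing over REXML. It is not the code described above: the Dotted class is hypothetical, it only reads existing documents rather than creating nodes on demand, and it cannot reach element names that are not valid Ruby method names (anything with a colon or hyphen, for instance).

```ruby
require 'rexml/document'

# Hypothetical read-only wrapper; not part of REXML.
class Dotted
  def initialize(elements)
    # Kernel#Array is avoided on purpose: REXML::Element#to_a returns children.
    @elements = elements.is_a?(::Array) ? elements : [elements]
  end

  # node.b -> wrapper around every <b> child of the wrapped elements.
  def method_missing(name, *args)
    return super unless args.empty? && !name.to_s.start_with?('to_')

    Dotted.new(@elements.flat_map { |e| e.elements.to_a(name.to_s) })
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?('to_') || super
  end

  # Integer index narrows to one element; any other key reads an attribute.
  def [](key)
    if key.is_a?(Integer)
      Dotted.new([@elements[key]].compact)
    else
      first = @elements.first
      first && first.attributes[key]
    end
  end

  # Text of the first wrapped element, or nil if there is none.
  def text
    first = @elements.first
    first && first.text
  end
end

doc = REXML::Document.new('<a>B<b><c>D</c></b><b x="y"/></a>')
a   = Dotted.new(doc.root)

puts a.text
puts a.b[1]['x']
puts a.b.c.text
```

Even this small version shows the trade-off: interleaved text beyond the first text node is not reachable, and attributes need a separate accessor.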

At the opposite end of the spectrum is an API that is heavily tied to
XML technology, like XPath:

a = Tree.new( "/a/b[2][ @x = 'y' ]" )  # <a><b/><b x="y"/></a>
a[ "/a/text()" ] = "B"                 # <a>B<b/><b x="y"/></a>
c = a[ "/a/b/c" ]             # <a>B<b><c/></b><b x="y"/></a>
c[ "text()" ] = "D"           # <a>B<b><c>D</c></b><b x="y"/></a>

Not so nice for constructing documents, but almost peerless for
accessing nodes.
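The access half of that already works with REXML's own XPath module. A minimal sketch, assuming the finished document from the examples above as input; the Tree class shown earlier is not used here, and the paths are simplified to a single predicate.

```ruby
require 'rexml/document'

# Assumed input: the finished document from the examples above.
doc = REXML::Document.new('<a>B<b><c>D</c></b><b x="y"/></a>')

# First node matching a path.
c_text = REXML::XPath.first(doc, '/a/b/c').text

# Select by attribute value instead of by position.
marked = REXML::XPath.first(doc, "/a/b[@x='y']")

# All nodes matching a path.
b_count = REXML::XPath.match(doc, '/a/b').size

# The node returned by a query can be modified in place.
REXML::XPath.first(doc, '/a/b/c').text = 'E'
new_text = REXML::XPath.first(doc, '/a/b/c').text

puts c_text
puts marked.attributes['x']
puts b_count
puts new_text
```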

Of course, there are other, more pressing, issues that don’t let me
play too much with this stuff; things like validation, bug fixes,
optimizations, XPath support in the streaming APIs, a good lightweight
API… any number of things.

Anyway, diversity is good.