REXML namespace support

In my case I’m given a string which is a namespace prefix and I want to
determine the namespace it is associated with. I don’t have an Element
object so I can’t use the namespace method.

···

-----Original Message-----
From: Kouhei Sutou [mailto:kou@cneti.net]
Sent: Friday, September 27, 2002 9:33 AM
To: ruby-talk@ruby-lang.org
Subject: Re: REXML namespace support

From: “Volkmann, Mark” Mark.Volkmann@AGEDWARDS.com
Subject: REXML namespace support
Date: Fri, 27 Sep 2002 22:45:45 +0900
Message-ID:
89539780CB9BD51182270002A5897DF602C6D670@hqempn04.agedwards.com

Is there an easy way in REXML to get the namespace that is associated with a
given prefix?

You can use REXML::Element#namespace if you are using the DOM-style API.

----------Example-----------------
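require 'rexml/document'

# The element markup is missing from this copy of the message; the
# document below is reconstructed from the paths and the output shown.
xml = <<XML
<?xml version='1.0'?>
<a xmlns:fuga="foo">
  <b>
    <c xmlns:fuga="hoge"/>
  </b>
  <d xmlns:fuga="bar"/>
</a>
XML

doc = REXML::Document.new(xml)

p doc.root.namespace("fuga")
p doc.elements["/a/b"].namespace("fuga")
p doc.elements["/a/b/c"].namespace("fuga")
p doc.elements["/a/d"].namespace("fuga")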
----------Example-----------------

----------Output------------------
"foo"
"foo"
"hoge"
"bar"
----------Output------------------




But you have a Document instance, don’t you?

I was thinking you might be able to do something with XPath, e.g.

XPath.match(doc,"namespace-uri(//#{prefix}:*[1])")

That doesn’t work, though. I’m not sure if namespace-uri() is
implemented in REXML or not. And I think in order to use a namespace
prefix in XPath, you may have to define it in the context where it is
being evaluated. I know you have to do that in XSLT; I’m not sure what
the XPath spec requires for stand-alone XPath processing. Oh, well, so
much for that idea. You may have to just walk the document tree until
you find a match.
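
A minimal sketch of the "walk the document tree" fallback, assuming a first match is good enough. The helper name first_namespace_for and the sample document are made up for illustration; only REXML::XPath.match and Element#namespace come from REXML itself.

```ruby
require 'rexml/document'

# Hypothetical helper: return the URI bound to +prefix+ on the first
# element (in document order) where that prefix is in scope, or nil.
def first_namespace_for(doc, prefix)
  REXML::XPath.match(doc, '//*').each do |element|
    uri = element.namespace(prefix)
    return uri if uri
  end
  nil
end

# Made-up document: the prefix is declared on <b>, not on the root.
doc = REXML::Document.new(<<XML)
<a><b xmlns:fuga="foo"><c/></b></a>
XML

p first_namespace_for(doc, 'fuga')   # the binding declared on <b>
p first_namespace_for(doc, 'nope')   # nil, the prefix is never declared
```

Because a prefix can be rebound further down the tree, this only reports the first binding it meets.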

···

On Sat, Sep 28, 2002 at 12:18:28AM +0900, Volkmann, Mark wrote:

In my case I’m given a string which is a namespace prefix and I want to
determine the namespace it is associated with. I don’t have an Element
object so I can’t use the namespace method.


Matt Gushee
Englewood, Colorado, USA
mgushee@havenrock.com

Try this:

(REXML::XPath.match(doc, '//*|//@*').detect { |node|
  node.prefixes.include? 'foo' }).namespace('foo')

···


Matt Gushee
Englewood, Colorado, USA
mgushee@havenrock.com
http://www.havenrock.com/

XML namespaces are defined as xml is processed. The prefix may be reused.
For example:

is legal (though a questionable practice, and ignoring opinions w.r.t.
Namespaces being URLs). In this example there are two namespaces but only
one prefix is used. In general, the prefix mapping to namespace is scoped.
This makes it impossible to answer the question you want to ask unless you
can identify a spot in the document.
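
The original example markup did not survive in this copy, so the following is only a sketch of the kind of document being described. The element names and URIs are invented; the point is that one prefix resolves to two different namespaces depending on where you ask.

```ruby
require 'rexml/document'

# Hypothetical document: the prefix "p" is rebound on the inner element.
doc = REXML::Document.new(<<XML)
<p:element1 xmlns:p="http://example.com/namespace1">
  <p:element2 xmlns:p="http://example.com/namespace2"/>
</p:element1>
XML

outer = doc.root
inner = doc.root.elements[1]   # first child element

p outer.namespace('p')   # binding in scope at element1
p inner.namespace('p')   # binding in scope at element2
```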

I know that the parser I wrote for ruby will supply you all in-scope
prefix-namespace mappings as it parses an XML file. I assume that REXML will
do the same, though I don’t know for sure.

It occurs to me that you may have misunderstood the requirement, which
somebody else mentioned, of having an element: any element at all will
do (or processing instruction, or comment) when parsing, regardless of
its namespace or prefix, and all in-scope prefixes and namespaces will be
available to you.
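
With REXML's tree API, a rough illustration of "any element will do" might look like this. The document is invented; Element#prefixes and Element#namespaces are the REXML calls that report what is in scope at a given element.

```ruby
require 'rexml/document'

# Made-up document: two prefixes declared on the root, one rebound on <c>.
doc = REXML::Document.new(<<XML)
<a xmlns:fuga="foo" xmlns:piyo="baz">
  <b><c xmlns:fuga="hoge"/></b>
</a>
XML

c = doc.root.elements[1].elements[1]   # <a> -> <b> -> <c>

p c.prefixes     # every prefix in scope at <c>, inherited ones included
p c.namespaces   # prefix => URI, with the innermost declaration winning
```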

Cheers,
Bob

···

On 9/27/02 11:18 AM, “Volkmann, Mark” Mark.Volkmann@AGEDWARDS.com wrote:

In my case I’m given a string which is a namespace prefix and I want to
determine the namespace it is associated with. I don’t have an Element object
so I can’t use the namespace method.

Message-ID: 20020927173141.GD1574@swordfish

Try this:

(REXML::XPath.match(doc, '//*|//@*').detect { |node|
  node.prefixes.include? 'foo' }).namespace('foo')

Try this too:

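require 'rexml/parsers/sax2parser'

# In current REXML releases the class lives under REXML::Parsers.
parser = REXML::Parsers::SAX2Parser.new(xml)
parser.listen(:start_prefix_mapping) do |prefix, uri|
  p uri if prefix == "foo"
end
parser.parse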
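
A small variation on the same idea, collecting every mapping instead of printing one. This is a sketch against the SAX2 interface as shipped in current REXML (REXML::Parsers::SAX2Parser); the sample document is an assumption.

```ruby
require 'rexml/parsers/sax2parser'

# Made-up document in which the prefix "fuga" is declared three times.
xml = <<XML
<a xmlns:fuga="foo">
  <b><c xmlns:fuga="hoge"/></b>
  <d xmlns:fuga="bar"/>
</a>
XML

mappings = []
parser = REXML::Parsers::SAX2Parser.new(xml)
# The block is called once per namespace declaration, in document order.
parser.listen(:start_prefix_mapping) do |prefix, uri|
  mappings << [prefix, uri]
end
parser.parse

p mappings
```

Seeing several entries for the same prefix is the signal that a single "prefix to namespace" answer does not exist for the document as a whole.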

···

From: Matt Gushee mgushee@havenrock.com
Subject: Re: REXML namespace support
Date: Sat, 28 Sep 2002 02:41:44 +0900

======================
Kouhei Sutou
kou@cneti.net

In my case I’m given a string which is a namespace prefix and I want to
determine the namespace it is associated with. I don’t have an Element object
so I can’t use the namespace method.

XML namespaces are defined as xml is processed. The prefix may be reused.

You mean ‘prefix-to-namespace mappings are defined …’, don’t you? Sorry
to be picky, but XML namespaces are confusing enough without imprecise
explanations.

is legal (though a questionable practice, and ignoring opinions w.r.t.
Namespaces being URLs). In this example there are two namespaces but only
one prefix is used. In general, the prefix mapping to namespace is scoped.
This makes it impossible to answer the question you want to ask unless you
can identify a spot in the document.

I would say ‘unreliable’ rather than ‘impossible’. You’re quite right
that it’s legal to reuse prefixes, but it would be very counterintuitive
to do so, and I doubt that it’s likely to happen in practice; I’ve
certainly never seen it (except in examples contrived to show that it
can be done). Have you?

Which means that, if a certain degree of doubt is acceptable (e.g. doing
a first-pass analysis of a large collection of XML documents, with
thorough testing to take place later), there shouldn’t be a problem with
just pulling out the first available mapping (though I don’t think I’d
choose that approach myself unless I were under extreme time pressure).

It might be interesting to hear more about the problem the original
poster was trying to solve. But he seems to have either given up, or
been satisfied with the answers he got and wandered off without
bothering to say “thanks.” Oh, well. That’s mailing-list life.

I know that the parser I wrote for ruby will supply you all in-scope
prefix-namespace mappings as it parses an XML file. I assume that REXML will
do the same, though I don’t know for sure.

Yes, it will.

···

On Mon, Sep 30, 2002 at 11:08:15PM +0900, Bob Hutchison wrote:

On 9/27/02 11:18 AM, “Volkmann, Mark” Mark.Volkmann@AGEDWARDS.com wrote:



In my case I’m given a string which is a namespace prefix and I want to
determine the namespace it is associated with. I don’t have an Element
object so I can’t use the namespace method.

XML namespaces are defined as xml is processed. The prefix may be reused.

You mean ‘prefix-to-namespace mappings are defined …’, don’t you? Sorry
to be picky, but XML namespaces are confusing enough without imprecise
explanations.

I guess I meant both.

I should also have said ‘namespace2’ in element2 in my example.

You are right, it is confusing enough (and it doesn’t matter how precise you
are, it is still confusing)

is legal (though a questionable practice, and ignoring opinions w.r.t.
Namespaces being URLs). In this example there are two namespaces but only
one prefix is used. In general, the prefix mapping to namespace is scoped.
This makes it impossible to answer the question you want to ask unless you
can identify a spot in the document.

I would say ‘unreliable’ rather than ‘impossible’. You’re quite right
that it’s legal to reuse prefixes, but it would be very counterintuitive
to do so, and I doubt that it’s likely to happen in practice; I’ve
certainly never seen it (except in examples contrived to show that it
can be done). Have you?

Yes, actually, I have. I certainly don’t recommend the practice, but I’ve
seen it. If you are writing tools for XML you have to be prepared for it.

Which means that, if a certain degree of doubt is acceptable (e.g. doing
a first-pass analysis of a large collection of XML documents, with
thorough testing to take place later), there shouldn’t be a problem with
just pulling out the first available mapping (though I don’t think I’d
choose that approach myself unless I were under extreme time pressure).

I wouldn’t recommend it.

···

On 9/30/02 1:20 PM, “Matt Gushee” mgushee@havenrock.com wrote:

On Mon, Sep 30, 2002 at 11:08:15PM +0900, Bob Hutchison wrote:

On 9/27/02 11:18 AM, “Volkmann, Mark” Mark.Volkmann@AGEDWARDS.com wrote:

You mean ‘prefix-to-namespace mappings are defined …’, don’t you? Sorry
to be picky, but XML namespaces are confusing enough without imprecise
explanations.

I guess I meant both.

I should also have said ‘namespace2’ in element2 in my example.

You are right, it is confusing enough (and it doesn’t matter how precise you
are, it is still confusing)

True. I used to teach an intro-to-XML course. The students were by no
means dumb people, but somehow the Namespaces section always managed to
go overtime.

is legal (though a questionable practice, and ignoring opinions w.r.t.
Namespaces being URLs). In this example there are two namespaces but only
one prefix is used. In general, the prefix mapping to namespace is scoped.
This makes it impossible to answer the question you want to ask unless you
can identify a spot in the document.

I would say ‘unreliable’ rather than ‘impossible’. You’re quite right
that it’s legal to reuse prefixes, but it would be very counterintuitive
to do so, and I doubt that it’s likely to happen in practice; I’ve
certainly never seen it (except in examples contrived to show that it
can be done). Have you?

Yes, actually, I have. I certainly don’t recommend the practice, but I’ve
seen it. If you are writing tools for XML you have to be prepared for it.

Wow. Okay. Just out of curiosity, are you talking about documents
written by human authors, or generated XML, or both? It’s hard to
understand why anyone would deliberately do that.

Though, now that I think about it, it would be pretty easy to get reused
prefixes in a composite document, if you didn’t specifically provide a
means to uniquify the prefixes when assembling the components.

Which means that, if a certain degree of doubt is acceptable (e.g. doing
a first-pass analysis of a large collection of XML documents, with
thorough testing to take place later), there shouldn’t be a problem with
just pulling out the first available mapping (though I don’t think I’d
choose that approach myself unless I were under extreme time pressure).

I wouldn’t recommend it.

Nor would I. I guess I’m just resigned to living in a world full of
kludges.

···

On Tue, Oct 01, 2002 at 04:28:43AM +0900, Bob Hutchison wrote:



[… discussion about XML documents with the same prefix referring to
different namespaces (in different parts of the document) …]

Yes, actually, I have. I certainly don’t recommend the practice, but I’ve
seen it. If you are writing tools for XML you have to be prepared for it.

Wow. Okay. Just out of curiosity, are you talking about documents
written by human authors, or generated XML, or both? It’s hard to
understand why anyone would deliberately do that.

I think it must have been cut and paste (or confusion??) that caused the
human authored ones. I’ve seen documents generated by composition (as you
mentioned) end up with it too. (Composition was a big thing where I was
working a year or so ago).

There are different ways of doing this. One where a prefix is re-defined
(like in my example) and another where the prefix is used in different parts
of the document.

Though, now that I think about it, it would be pretty easy to get reused
prefixes in a composite document, if you didn’t specifically provide a
means to uniquify the prefixes when assembling the components.

Which means that, if a certain degree of doubt is acceptable (e.g. doing
a first-pass analysis of a large collection of XML documents, with
thorough testing to take place later), there shouldn’t be a problem with
just pulling out the first available mapping (though I don’t think I’d
choose that approach myself unless I were under extreme time pressure).

I wouldn’t recommend it.

Nor would I. I guess I’m just resigned to living in a world full of
kludges.

:-)

···

On 9/30/02 3:47 PM, “Matt Gushee” mgushee@havenrock.com wrote:

On Tue, Oct 01, 2002 at 04:28:43AM +0900, Bob Hutchison wrote:

“Matt Gushee” mgushee@havenrock.com wrote in message
news:20020930193709.GA1749@swordfish…

is legal (though a questionable practice, and ignoring opinions w.r.t.
Namespaces being URLs). In this example there are two namespaces but only
one prefix is used. In general, the prefix mapping to namespace is scoped.
This makes it impossible to answer the question you want to ask unless you
can identify a spot in the document.

I would say ‘unreliable’ rather than ‘impossible’. You’re quite right

What about scoped access:
ns_of_fullname(“N”) yields prefix for outermost scope
ns_of_fullname(“N.N”) yields namespace string of nested scope.
ns_of_name(“N”) yields best matching, preferring least nested scope in case
of a conflict.

Mikkel

Maybe I’m dense, but I don’t follow this at all. Are you saying these
methods exist somewhere? I haven’t seen them in the docs for Ruby or
REXML, or in the XML Namespaces spec*. Or are you just asking what-if?
Are you sure you’re talking about XML Namespaces as defined by the W3C?
If so, you’re using terminology that I’ve never heard in that context.

* And contrary to what I said earlier, I probably have read the whole thing;
  I just looked at it and was reminded that it’s quite short. I’ve
  become used to W3C specs being enormous and convoluted.
···

On Tue, Oct 01, 2002 at 07:38:24AM +0900, MikkelFJ wrote:

“Matt Gushee” mgushee@havenrock.com wrote in message
news:20020930193709.GA1749@swordfish…

is legal (though a questionable practice, and ignoring opinions w.r.t.
Namespaces being URLs). In this example there are two namespaces but only
one prefix is used. In general, the prefix mapping to namespace is scoped.
This makes it impossible to answer the question you want to ask unless you
can identify a spot in the document.

I would say ‘unreliable’ rather than ‘impossible’. You’re quite right

What about scoped access:
ns_of_fullname(“N”) yields prefix for outermost scope
ns_of_fullname(“N.N”) yields namespace string of nested scope.
ns_of_name(“N”) yields best matching, preferring least nested scope in case
of a conflict.



What about scoped access:
ns_of_fullname(“N”) yields prefix for outermost scope
ns_of_fullname(“N.N”) yields namespace string of nested scope.
ns_of_name(“N”) yields best matching, preferring least nested scope in case
of a conflict.

Maybe I’m dense, but I don’t follow this at all. Are you saying these
methods exist somewhere? I haven’t seen them in the docs for Ruby or
REXML, or in the XML Namespaces spec*. Or are you just asking what-if?

I’m saying what-if.

Are you sure you’re talking about XML Namespaces as defined by the W3C?
If so, you’re using terminology that I’ve never heard in that context.

I’m working with hierarchical data in other contexts so I recognized the
problem, but I’m not deep into XML DOM or REXML naming conventions. I believe
the terms ‘name’ and ‘fullname’ are used fairly widely. ‘ns’ just seemed more
convenient than ‘namespace’.

BTW: I’ve always found it difficult to accept that XML namespaces are flat -
one namespace cannot belong to another, although the URLs provide some sort
of hierarchy. Prefixes are scoped, but that is different.

* And contrary to what I said earlier, I probably have read the whole thing;
  I just looked at it and was reminded that it’s quite short. I’ve
  become used to W3C specs being enormous and convoluted.

Yes - I’ve always wondered how they managed :-)

Mikkel

What about scoped access:
ns_of_fullname(“N”) yields prefix for outermost scope
ns_of_fullname(“N.N”) yields namespace string of nested scope.
ns_of_name(“N”) yields best matching, preferring least nested scope in case
of a conflict.

Maybe I’m dense, but I don’t follow this at all. Are you saying these
methods exist somewhere? I haven’t seen them in the docs for Ruby or
REXML, or in the XML Namespaces spec*. Or are you just asking what-if?

I’m saying what-if.

Okay, it starts to make a bit more sense then. Unfortunately I don’t
think your idea is quite compatible with how XML namespaces work. The
problem is that at any given scope (i.e. within any given element in the
document hierarchy), there may be several namespaces in scope, or none.
And an unqualified name may either have no namespace or be part of a
default namespace (i.e. one defined with no prefix). So if I have a
document like this:

Blablabla Yadda yadda yadda

the first element has no namespace, but the second belongs to the
“http://www.sillyxml.com/” namespace. So,

ns_of_fullname(“N”) yields prefix for outermost scope

Nope, not if you apply it to the second element.
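
The markup for that example is missing from this copy, so here is a sketch of the situation with invented element names (doc, para, section); only the text content and the namespace URI come from the message above. It shows an unqualified name resolving differently depending on whether a default namespace is in scope.

```ruby
require 'rexml/document'

# Hypothetical element names; the default namespace is declared on <section>.
doc = REXML::Document.new(<<XML)
<doc>
  <para>Blablabla</para>
  <section xmlns="http://www.sillyxml.com/">
    <para>Yadda yadda yadda</para>
  </section>
</doc>
XML

first  = doc.root.elements[1]               # <para> outside the declaration
second = doc.root.elements[2].elements[1]   # <para> inside <section>

p first.namespace    # "" (no namespace)
p second.namespace   # the default namespace inherited from <section>
```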

Are you sure you’re talking about XML Namespaces as defined by the W3C?
If so, you’re using terminology that I’ve never heard in that context.

I’m working with hierarchical data in other contexts so I recognized the
problem, but I’m not deep into XML DOM or REXML naming conventions. I believe
the terms ‘name’ and ‘fullname’ are used fairly widely. ‘ns’ just seemed more
convenient than ‘namespace’.

I guess I was thinking mainly of ‘fullname’ and ‘namespace string’,
which are perfectly understandable but not commonly used in XML circles.

BTW: I’ve always found it difficult to accept that XML namespaces are flat -
one namespace cannot belong to another, although the URLs provide some sort
of hierarchy. Prefixes are scoped, but that is different.

I’m not sure how much of a problem that really is. You can think of the
elements themselves as constituting a kind of namespace hierarchy, and
tools like XPath allow you to easily distinguish between the <address>
elements in the two following examples:

Abraham Lincoln Gettysburg Address ...

and

A Graham Pinken 3939 A St. ...

So you can process the <address> differently based on whether it’s a
‘/leaders/*/address’ or a ‘customer/address’ … or to take a more
likely example, distinguish a ‘book/title’ from a ‘section/title’ or a
‘section/section/section/title’. And, unlike DTDs, XML Schema and RELAX
NG allow you to create context-dependent element definitions.
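
As a rough illustration of telling the two address elements apart by context: the markup below is a guess at the two examples (the wrapper elements president and name are assumptions), and the XPath expressions are the ones mentioned above.

```ruby
require 'rexml/document'

# Hypothetical reconstruction of the first example.
leaders = REXML::Document.new(<<XML)
<leaders>
  <president>
    <name>Abraham Lincoln</name>
    <address>Gettysburg Address ...</address>
  </president>
</leaders>
XML

# Hypothetical reconstruction of the second example.
customer = REXML::Document.new(<<XML)
<customer>
  <name>A Graham Pinken</name>
  <address>3939 A St. ...</address>
</customer>
XML

speech = REXML::XPath.first(leaders, '/leaders/*/address')
street = REXML::XPath.first(customer, '/customer/address')

p speech.text   # the speech
p street.text   # the postal address
p REXML::XPath.first(customer, '/leaders/*/address')   # nil, wrong context
```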

There are certainly difficult issues with XML Namespaces, especially
related to what, if any, semantics the namespace URI should have, and
whether unqualified attributes should have no namespace while
unqualified elements may belong to a default namespace. But if you
know how to work with the element hierarchy, I don’t see what great
advantage there would be in having hierarchical namespaces. I imagine
you’re talking about something like:

so that the b:bar element could be referred to as ‘a:b:bar’, and its
namespace URI would resolve to “http://www.example.com/a/b”? Is there
something you could do under this scheme that is difficult or impossible
with flat namespaces? Or is it more a matter of convenience?

Just curious, really. I have been very steeped in the SGML/XML
worldview, and I sense that you are more of a programmer, which is of
course fine, but it entails a slightly different way of thinking.

···

On Tue, Oct 01, 2002 at 08:58:34AM +0900, MikkelFJ wrote:



“Matt Gushee” mgushee@havenrock.com wrote in message

And an unqualified name may either have no namespace or be part of a
default namespace (i.e. one defined with no prefix). So if I have a
document like this:

Blablabla Yadda yadda yadda

the first element has no namespace, but the second belongs to the
“http://www.sillyxml.com/” namespace. So,

ns_of_fullname(“N”) yields prefix for outermost scope

Nope, not if you apply it to the second element.

I was referring to returning the namespace given a namespace prefix, not the
namespace of an element - but we can extend the discussion to that as well.
Furthermore, the functions operate at a document level, so you are at no
particular place (element) in the DOM when applying the functions - hence
the N.N or X.Y.Z notion.

However, I’d like to add a function similar to ns_name:
ns_of_local_name(element, prefix), which returns the “namespace string” of
the innermost applicable prefix seen from the element. This is the actual
namespace active for that prefix.
We can furthermore include the empty prefix to mean the default and thus
keep things general.
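
For what it is worth, REXML's Element#namespace already behaves much like this what-if function. A sketch, assuming ns_of_local_name is simply a thin wrapper (the function name is the proposal above, not a REXML method, and the document is invented):

```ruby
require 'rexml/document'

# Hypothetical ns_of_local_name: resolve +prefix+ as seen from +element+.
# An empty prefix stands for the default namespace.
def ns_of_local_name(element, prefix)
  element.namespace(prefix)
end

# Made-up document: "n" is bound on <a> and rebound on <b>.
doc = REXML::Document.new(<<XML)
<a xmlns="urn:default" xmlns:n="urn:outer">
  <b xmlns:n="urn:inner"><c/></b>
</a>
XML

a = doc.root
c = a.elements[1].elements[1]

p ns_of_local_name(a, 'n')   # binding visible at <a>
p ns_of_local_name(c, 'n')   # innermost binding visible at <c>
p ns_of_local_name(c, '')    # default namespace visible at <c>
```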

I guess I was thinking mainly of ‘fullname’ and ‘namespace string’,
which are perfectly understandable but not commonly used in XML circles.

Fine - as long as the message gets through - sometimes it helps to be
unconventional when inventing (or sometimes rediscovering) new things.

BTW: I’ve always found it difficult to accept that XML namespaces are flat -
one namespace cannot belong to another, although the URLs provide some sort
of hierarchy. Prefixes are scoped, but that is different.

I’m not sure how much of a problem that really is.

Me neither. I’ve just always assumed they were hierarchical, so when reading
through the standard (quite a while back), I kept searching for how this was
handled, since it wasn’t stated explicitly that the namespaces were flat. In
programming languages scopes are usually nested, and many things in XML are
also nested.

You can think of the
elements themselves as constituting a kind of namespace hierarchy, and
tools like XPath allow you to easily distinguish between the <address>
elements in the two following examples:

Abraham Lincoln Gettysburg Address ...

and

A Graham Pinken 3939 A St. ...

So you can process the <address> differently based on whether it’s a
‘/leaders/*/address’ or a ‘customer/address’ …

This is true, but it is also mixing two different concepts, so you have to
model the two possibly independent kinds of data carefully to avoid
conflicts.
I have my organization’s global namespace in a URL. I can use this URL to
create subnamespaces, and indeed this may be sufficient. However, it would
be nice to say that this piece of XML follows a standard convention, e.g.
docbook, and use the docbook namespace. However, I’d still like to keep
everything completely contained within my organization and possibly within a
version namespace. It would be more flexible with nested namespaces, but
certainly you can get away without. This could be a solution in search of a
problem. But when using third party software working on long term live XML
database data, it could become quite relevant.

or to take a more
likely example, distinguish a ‘book/title’ from a ‘section/title’ or a
‘section/section/section/title’. And, unlike DTDs, XML Schema and RELAX
NG allow you to create context-dependent element definitions.

These are aspects I’ll have to look further into. However, XML Schema could
never go beyond what’s possible in XML, but it might ease some
inter-dependencies during document creation.

There are certainly difficult issues with XML Namespaces, especially
related to what, if any, semantics the namespace URI should have, and
whether unqualified attributes should have no namespace while
unqualified elements may belong to a default namespace. But if you
know how to work with the element hierarchy, I don’t see what great
advantage there would be in having hierarchical namespaces. I imagine
you’re talking about something like:

See argumentation above.

so that the b:bar element could be referred to as ‘a:b:bar’, and its
namespace URI would resolve to “http://www.example.com/a/b”?

Not exactly. I want to decouple the namespace string or URI from the
hierarchy. This is actually an important point. However, I never gave it that
much theoretical brainpower, at least not in relation to XML.

Like in Ruby, if you have nested classes, or better modules, each module
name would have its own URI. Now you can use a W3 standard URI inside your own
project and have it be different from the same elements in a different
project (although, as you argue, the elements themselves present a scoping).

Is there
something you could do under this scheme that is difficult or impossible
with flat namespaces? Or is it more a matter of convenience?
Just curious, really.

In the end everything is about convenience, but this translates into reduced
risk and increased productivity. Anyway, I have had no problems with it - it
just puzzled me.

I’ve been experimenting with implementing globally unique identifiers via
nested namespaces in an application outside of the XML framework, but that is
a different story.

You can also look at the ASN.1 object identifier tree which I believe is
based on a hierarchical model.

I have been very steeped in the SGML/XML
worldview, and I sense that you are more of a programmer, which is of
course fine, but it entails a slightly different way of thinking.

Certainly - to me XML is just one way (a good way) to view a more general
problem, and in several ways I feel XML has invaded my space of structured
data models. For example, back in 1995 I used the RIFF file format for a
communication infrastructure and for interpreted language logic before XML
became all the hype (although SGML was around somewhere). The RIFF format
(known from .WAV files) has the essential features of XML - it’s a binary
hierarchical tagged format using 4 byte tags and a 4 byte length field
instead of an endmarker. (And then the RIFF I/O API is better than the
normal C / Unix file API (fopen and friends) as it has optimized buffering
due to multimedia applications).
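
To make the comparison concrete, here is a small sketch of reading that tag-plus-length layout in Ruby. The chunk tags and payloads are made up, and read_chunk is a hypothetical helper; the little-endian length field and the pad byte after odd-sized chunks follow the RIFF layout as used in .WAV files.

```ruby
require 'stringio'

# Hypothetical helper: read one chunk, i.e. a 4-byte tag, a 4-byte
# little-endian length, then that many bytes of payload.
def read_chunk(io)
  header = io.read(8)
  return nil if header.nil? || header.bytesize < 8
  tag, length = header.unpack('a4V')
  data = io.read(length)
  io.read(1) if length.odd?   # chunks are padded to an even size
  [tag, data]
end

# Build a tiny nested structure: a RIFF container holding two chunks.
inner = ['fmt ', 4, 'abcd'].pack('a4Va4') +
        ['data', 3, 'xyz'].pack('a4Va3') + "\0"
riff  = ['RIFF', 4 + inner.bytesize, 'DEMO'].pack('a4Va4') + inner

tag, body = read_chunk(StringIO.new(riff))
form_type = body[0, 4]              # container type, like a root element name
sub = StringIO.new(body[4..-1])     # nested chunks, like child elements

chunks = []
while (chunk = read_chunk(sub))
  chunks << chunk
end

p tag, form_type, chunks
```

The length prefix plays the role of the end tag: a reader can skip a chunk it does not understand without parsing its contents.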

Mikkel