Ann: rexml 2.3.5 && 2.2.3


(SER) #1

<posted & mailed>

Hello,

These releases contain primarily bug fixes, although there is some new
functionality.

2.3.5 Changelog:

  • Fixed a bug that caused text containing > to be split into two text nodes.
    This incurred a speed penalty, but I’ll try to improve that later.
  • Added a bug tracking system.
  • Fixed a comment parsing bug.
  • Mike Stok fixed Functions#translate and cleaned up some cruft that slipped
    through in Functions#substring.
  • Fixed a bug in Element#prefixes, and fixed Attributes#prefixes to use
    DOCTYPE declared namespaces. Added DocType#attributes_of(Element).
  • Fixed a bug in writing Attlist declarations.
  • Added AttlistDecl#each; AttlistDecl now includes Enumerable.
  • Fixed Functions#name and Functions#local_name; fixed unit test.
  • Fixed a bug involving functions within XPath predicates.
  • Fixes for Child#parent=().
  • Fixes and speed improvement for creating Text nodes.
  • SAX2Parser bug fixes.
  • Added dist.xml and an Ant build file.
  • Tom sent a new version of his pretty printer (in contribs/).
  • Kouhei has a new version of his Japanese API documentation translation
    online.
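The AttlistDecl change relies on a standard Ruby idiom: a class that defines #each and includes Enumerable gets map, select, find, count and friends for free. Here is a minimal sketch of that idiom. AttlistSketch and its fields are hypothetical illustrations, not REXML's actual AttlistDecl.

```ruby
# Hypothetical stand-in for an attribute-list declaration (not REXML's class).
# Defining #each and including Enumerable provides map, select, find, count...
class AttlistSketch
  include Enumerable

  def initialize(element_name, attributes)
    @element_name = element_name
    @attributes = attributes # attribute name => declared default
  end

  # Yields each [name, default] pair; returns an Enumerator without a block.
  def each(&block)
    return enum_for(:each) unless block
    @attributes.each(&block)
    self
  end
end

attlist = AttlistSketch.new("book", { "id" => "#REQUIRED", "lang" => "en" })
p attlist.map { |name, _default| name }            # prints ["id", "lang"]
p attlist.find { |name, _default| name == "lang" } # prints ["lang", "en"]
```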

2.2.3 Changelog:

  • Fixed a bug in entity handling.
  • Backported a bugfix WRT function calls in predicates, and the fixes to
    Functions#name and Functions#local_name.

It looks like the new XPath is a flop. I might be able to use some
concepts, but what it boils down to is that there just isn’t any way (that
I can see) to simplify predicate handling or axes. This isn’t to say that
I can’t speed up XPath, because it can be optimized. For instance, a lot
of the code that is currently evaluated with Procs could be inlined. This
would give a significant speed increase, and I’ll get around to it
eventually. It’ll make the code more difficult to maintain and bugfix, so
I’m delaying as long as possible.
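A toy predicate can illustrate the inlining trade-off. This is not REXML's XPath code. The Node struct and the two sub-tests are hypothetical. Both versions select the same nodes. The Proc-based one pays an extra Proc#call per sub-test per node. The inlined one avoids that cost but is harder to rearrange or reuse.

```ruby
# Hypothetical node type for illustration only.
Node = Struct.new(:name, :attrs)

nodes = [
  Node.new("item", { "id" => "1" }),
  Node.new("item", {}),
  Node.new("other", { "id" => "2" })
]

# Proc-based: each sub-test is a Proc, and a third Proc combines them.
# Easy to assemble at run time, but every test costs a Proc#call.
name_is_item = ->(node) { node.name == "item" }
has_id       = ->(node) { node.attrs.key?("id") }
predicate    = ->(node) { name_is_item.call(node) && has_id.call(node) }
via_procs = nodes.select { |node| predicate.call(node) }

# Inlined: the same test written directly, with no intermediate Procs.
inlined = nodes.select { |node| node.name == "item" && node.attrs.key?("id") }

p via_procs == inlined # prints true
p via_procs.size       # prints 1
```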

I’ve also decided to drop the idea of implementing a full DTD parser, for
the moment. DTD is just a monster to parse. I’ll continue to implement
bits and pieces of DTD, but when REXML gets to be validating, it will
probably only be with XMLSchema.

If a fix for your bug didn’t make it into this release, don’t worry – I’ll
get to it soon. I’ve been sitting on 2.3.5 for a while, and wanted to
get it out.

Thanks!

… A computer without Windows is like a fish without a bicycle
<|>
/|\
/|


(Tobi Reif) #2

Sean Russell wrote:

I’ve also decided to drop the idea of implementing a full DTD parser,
for the moment. DTD is just a monster to parse.

Agreed. Especially complex modular ones like DocBook or XHTML 1.1.

I’ll continue to implement bits and pieces of DTD, but when REXML
gets to be validating, it will probably only be with XMLSchema.

… which is a monster of even larger dimensions ;)
(Perhaps Relax NG would be an option)

From my POV, validation is the lowest priority; I can just call
Xerces-C, and it tells me if the doc is valid or not in no time.

I think it’s very important to continue working on the perfection of the
parser itself, the tree API, XPath, namespaces (in the API, in XPath,
and in the serializer), speed, and the myriad nitty-gritty details
already hidden in there that need to be solved.

Tobi

http://www.pinkjuice.com/


(Pierre Baillet) #3

Hi All and Sean,

On Mon, Jun 10, 2002, Sean Russell wrote:

  • Fixed a bug that caused text containing > to be split into two text nodes.
    This incurred a speed penalty, but I’ll try to improve that later.

I know this is probably naive of me, but I thought I was told that “<”, “>”
and “&” were the 3 characters that were never to be found in a text
node in XML in their normal form. They should be replaced by the
entities “&lt;”, “&gt;” and “&amp;”. How can this bug be a bug then?


Pierre Baillet
Linux is user friendly. Linux is not idiot friendly
If you don’t understand that, use Windows


(SER) #4

Pierre Baillet wrote:

I know this is probably naive of me, but I thought I was told that “<”, “>”
and “&” were the 3 characters that were never to be found in a text
node in XML in their normal form. They should be replaced by the
entities “&lt;”, “&gt;” and “&amp;”. How can this bug be a bug then?

‘>’ is a legal character – it is allowed, unquoted, in XML. The only ASCII
characters that are required to be quoted in Text nodes are ‘<’ and ‘&’.
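Because '>' is legal in character data, a tokenizer should end a run of text only at '<'. That is what the 2.3.5 fix is about. Here is a deliberately tiny, hypothetical sketch. It ignores comments, CDATA sections, processing instructions and '>' inside attribute values, so it is not how REXML's parser is written.

```ruby
# Hypothetical toy tokenizer: a tag runs from '<' to the next '>', and a text
# run continues until the next '<'. Because '>' is legal in character data,
# it must not end a text run, so "text>text" stays one text token.
def tokenize(xml)
  xml.scan(/<[^>]*>|[^<]+/).map do |token|
    token.start_with?("<") ? [:tag, token] : [:text, token]
  end
end

p tokenize("<a>text>text</a>")
# prints [[:tag, "<a>"], [:text, "text>text"], [:tag, "</a>"]]
```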

http://www.w3.org/TR/2000/REC-xml-20001006#syntax
"The ampersand character (&) and the left angle bracket (<) may appear in
their literal form only when used as markup delimiters, or within a
comment, a processing instruction, or a CDATA section. If they are needed
elsewhere, they must be escaped using either numeric character references
or the strings “&amp;” and “&lt;” respectively. The right angle bracket (>)
may be represented using the string “&gt;”, and must, for compatibility, be
escaped using “&gt;” or a character reference when it appears in the string
"]]>" in content, when that string is not marking the end of a CDATA
section."
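The quoted rules translate into a short escaping routine for character data. The sketch below is a hypothetical helper, not REXML's API. It always escapes '&' and '<', and escapes '>' only where the spec requires it, as part of ']]>'. Escaping '&' first matters. Otherwise the '&' inside freshly produced references would be escaped again.

```ruby
# Hypothetical helper (not REXML's API): escape text for use as XML
# character data, following the XML 1.0 rules quoted above.
def escape_text(str)
  out = str.gsub("&", "&amp;") # '&' must always be escaped; do it first
  out = out.gsub("<", "&lt;")  # '<' must always be escaped
  out.gsub("]]>", "]]&gt;")    # '>' only has to be escaped inside ']]>'
end

puts escape_text("a > b")     # prints a > b
puts escape_text("x < y & z") # prints x &lt; y &amp; z
puts escape_text("foo]]>bar") # prints foo]]&gt;bar
```

Escaping every '>' for readability is also legal. It amounts to replacing the last step with out.gsub(">", "&gt;").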

… “The Best way to accelerate a Macintosh is at 9.8m/sec/sec”
<|> – anon
/|\
/|


(James) #5

I know this is probably naive of me, but I thought I was told that “<”, “>”
and “&” were the 3 characters that were never to be found in a text
node in XML in their normal form. They should be replaced by the
entities “&lt;”, “&gt;” and “&amp;”. How can this bug be a bug then?

“>” is OK, though some (me) find it easier to read if all such markup is
escaped (rather than just the “<”).

James



(SER) #6

<posted & mailed>

Tobias Reif wrote:

I’ll continue to implement bits and pieces of DTD, but when REXML
gets to be validating, it will probably only be with XMLSchema.

… which is a monster of even larger dimensions ;)
(Perhaps Relax NG would be an option)

Well, XMLSchema may be troublesome to interpret, but it isn’t difficult to
parse, since it is pure XML. It is also probably going to be the standard
schema mechanism for XML. I’ve mentioned before how I think they went down
the wrong path with XMLSchema – XMLSchema has kitchen-sink-itis, just like
SGML did. Unfortunately, I’m not the person who sets the standards :)

I don’t know much about Relax NG. Is there any support for it among common XML
tools? A standard isn’t of any use if nobody uses it. (Is anybody here
aware that the “standard” US Government document standard is an SGML
format? Only, it isn’t commonly used.)

Hmmm… Relax NG… if James Clark chairs it, it’s probably pretty good. I
admit, I’ve been dreading having to deal with XMLSchema (although not as
much as I resent having to deal with DTD). I’ll check it out.

… “Governments are the kinds of organizations which, although they do
<|> small things badly, do large things badly as well.”
/|\
/|


(James) #7

Well, XMLSchema may be troublesome to interpret, but it isn’t
difficult to parse, since it is pure XML. It is also probably going to
be the standard schema mechanism for XML.

Depends on what is meant by “standard”

While the W3C seems committed to their XML Schema spec, they technically
don’t set standards, they just make recommendations. Of course, many act
as if they were a standards body, so perhaps the effective result is the same.

Still, there are those who don’t see the W3C as the One True Source, and
recent debates on xml-dev suggest a good chunk of developers will move
towards RELAX-NG (for at least some portion of their work), which is
currently an ISO draft standard.
http://www.xmlhack.com/read.php?item=1667

Unlike the W3C, ISO is an actual standards organization.

James Clark made some interesting comments about the W3C XML Schema and
RELAX-NG here:
http://www.imc.org/ietf-xml-use/mail-archive/msg00217.html

James


(Tobi Reif) #8

james@rubyxml.com wrote:

I know this is probably naive of me, but I thought I was told that “<”, “>”
and “&” were the 3 characters that were never to be found in a text
node in XML in their normal form. They should be replaced by the
entities “&lt;”, “&gt;” and “&amp;”. How can this bug be a bug then?

More precisely:
“The ampersand character (&) and the left angle bracket (<) may appear
in their literal form only when used as markup delimiters, or within a
comment, a processing instruction, or a CDATA section. If they are
needed elsewhere, they must be escaped using either numeric character
references or the strings “&amp;” and “&lt;” respectively.”

“>” is OK, though some (me) find it easier to read if all such markup is
escaped (rather than just the “<”).


The right angle bracket (>) may be represented using the string “&gt;”,
and must, for compatibility, be escaped using “&gt;”

In attributes it’s either "'", '"', or "To allow attribute values to
contain both single and double quotes, the apostrophe or single-quote
character (') may be represented as "&apos;", and the double-quote
character (") as "&quot;"."

http://www.w3.org/TR/REC-xml.html#syntax
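The note on attributes can be turned into a matching helper for attribute values. In the sketch below, escape_attr is hypothetical, not REXML's API. It escapes '&' and '<', which the spec never allows literally in an attribute value. It also escapes whichever quote character delimits the value and leaves the other quote alone.

```ruby
# Hypothetical helper: escape a value for an attribute delimited by `quote`.
# '&' and '<' may never appear literally in an attribute value; the only
# other character that needs escaping is the delimiting quote itself.
def escape_attr(value, quote = '"')
  escaped = value.gsub("&", "&amp;").gsub("<", "&lt;")
  if quote == '"'
    escaped.gsub('"', "&quot;")
  else
    escaped.gsub("'", "&apos;")
  end
end

value = %q{He said "it's fine"}
puts %(title="#{escape_attr(value)}")
# prints title="He said &quot;it's fine&quot;"
puts %(title='#{escape_attr(value, "'")}')
# prints title='He said "it&apos;s fine"'
```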

Tobi


http://www.pinkjuice.com/


#9

In article 3d039d03@news.mhogaming.com,
Sean Russell ser@germane-software.com wrote:

<posted & mailed>

Tobias Reif wrote:

(Perhaps Relax NG would be an option)

I don’t know much about Relax NG. Is there any support for it among common XML
tools? A standard isn’t of any use if nobody uses it. (Is anybody here
aware that the “standard” US Government document standard is an SGML
format? Only, it isn’t commonly used.)

Hmm… Relax NG… if James Clark chairs it, it’s probably pretty good. I
admit, I’ve been dreading having to deal with XMLSchema (although not as
much as I resent having to deal with DTD). I’ll check it out.

    • There aren’t as many tools as for XMLSchema, but there are
      tools at least for Java. It’s an Oasis standard; there are a
      bunch of good links on the Oasis web site.

http://www.oasis-open.org/committees/relax-ng/

From what I can see, Relax NG is probably the only real
alternative to XMLSchema, but XMLSchema is definitely
the “market leader” at this point.

– Booker C. Bense


(James) #10

“>” is OK, though some (me) find it easier to read if all such
markup is escaped (rather than just the “<”).


The right angle bracket (>) may be represented using the string “&gt;”,
and must, for compatibility, be escaped using “&gt;”

The complete sentence is

The right angle bracket (>) may be represented using the string “&gt;”, and
must, for compatibility, be escaped using “&gt;” or a character reference
when it appears in the string “]]>” in content, when that string is not
marking the end of a CDATA section.

Bad:
“>” is OK. But “]]>” will cause trouble

Good:
“>” is OK. So is “]]&gt;”

And “for compatibility” is defined as “Marks a sentence describing a
feature of XML included solely to ensure that XML remains compatible with
SGML.”

James


(SER) #11

james@rubyxml.com wrote:

Depends on what is meant by “standard”

In this case, I mean what is likely to be used. The W3C does tend to act
like a standards body, but you can’t really blame them. They try pretty
hard to let people know that they aren’t a standards body.

WARNING: The following is an internal debate, externalized. I’m not
knocking anything, I’m just trying to make a decision. Well, that’s not
true. I’ll often knock Microsoft, but that’s neither here nor there.

RelaxNG is going to be of limited use if software doesn’t support it. For
example, I use four dominant pieces of software for my XML work; two of the
four are incorporating XMLSchema support, and the remaining two have no
clearly stated plans for any sort of validating support. I haven’t
seen any XML software claiming to support RelaxNG, except for
RelaxNG-specific software. This doesn’t mean that it doesn’t exist; it
just means that I’m not peripherally aware of it. I don’t use validation
much – I do know it is important for a lot of applications and I value it,
I just don’t have much use for it myself and therefore don’t follow it
much.

My point being, that RelaxNG can be a wonderful, elegant, simple, and
efficient spec – and still be entirely useless. I’m not terribly
interested in bulking up REXML with features that very few people will use,
at the expense of not providing them with features that many need. If I
were just writing REXML for myself, I wouldn’t care; I’d use whatever were
easiest, be it Tom’s XMLProof, RelaxNG, or whatever took my fancy. By this
point, however, I feel a sort of obligation to the people who use REXML.

I think this is one of those cases where I’m just going to have to make a
leap of faith. After looking over RelaxNG – superficially – I prefer it
to XMLSchema. I certainly hope it is more successful than XMLSchema.
XMLSchema looks to me like something Microsoft got their hands on –
bloated and overly complex.

My big question is: do I invest the time and effort in providing XML
validation via RelaxNG, via XMLSchema, or do I wait and see which comes out
"on top"?

… “Where’s the garlic? It was just here. It’s like it disappeared in
<|> a puff of smoke. But not even that, because then at least I could
/|\ have said: ‘Poof! There goes the garlic.’” – Monika McDole
/|


(SER) #12

Tobias Reif wrote:

“>” is OK, though some (me) find it easier to read if all such markup is
escaped (rather than just the “<”).

“The right angle bracket (>) may be represented using the string “&gt;”,
and must, for compatibility, be escaped using “&gt;””

You quoted this out of context, Tobi. The spec states that the right angle
bracket must be escaped if it appears in a CDATA section as part of “]]>”,
not if it appears elsewhere in the document. Therefore:

    <a>text>text</a>

is valid. Specifically, the section you quoted in its entirety says:

“The right angle bracket (>) may be represented using the string “&gt;”, and
must, for compatibility, be escaped using “&gt;” or a character reference
when it appears in the string “]]>” in content, when that string is not
marking the end of a CDATA section.”

That part about the CDATA section is rather important :)
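The rules debated here can be condensed into a check on raw character data between tags. This is a hypothetical, simplified sketch. It accepts only ASCII names in entity references and ignores the spec's restrictions on which characters may appear at all. It does show that a lone '>' is fine while ']]>' is not.

```ruby
# '&' that does not start an entity reference (&name;) or a character
# reference (&#123; or &#x7B;). Names are limited to ASCII here for brevity.
BARE_AMPERSAND = /&(?!(?:[A-Za-z_:][A-Za-z0-9_.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/

# Hypothetical helper: is this raw text legal as character data between tags?
def valid_char_data?(raw)
  return false if raw.include?("<")          # '<' may only open markup
  return false if raw.include?("]]>")        # ']]>' must be written ']]&gt;'
  return false if raw.match?(BARE_AMPERSAND) # '&' must start a reference
  true
end

p valid_char_data?("text>text")  # prints true: a lone '>' is fine
p valid_char_data?("a ]]> b")    # prints false
p valid_char_data?("a ]]&gt; b") # prints true
p valid_char_data?("AT&T")       # prints false
p valid_char_data?("AT&amp;T")   # prints true
```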

···

… “If the fundamentalists don’t hate you, you have the wrong
<|> lifestyle.”
/|\ – James Nicoll
/|


(James) #13

My big question is: do I invest the time and effort in providing XML
validation via RelaxNG, via XMLSchema, or do I wait and see
which comes out “on top”?

None of the above. REXML should not do validation. It should, perhaps,
have a means for hooking one or more external validation processors, be it
via DTD, W3C XML Schema, Schematron, XML-Prover, RELAX-NG, regular
expressions, what have you. Maybe.

Maybe it should pass back the results of validation, and perhaps have a
means for updating the REXML document based on post-validation processing
(e.g., default attributes & values provided by the PSVI).

Validation and document manipulation are two different tasks that should be
handled separately, with an option to chain the tasks, or pipe the results
of one into the other.
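Keeping validation outside the document model, with pluggable validators that can be chained, might look something like the sketch below. Everything here is hypothetical. ValidatorChain is not part of REXML, the "document" is a plain Hash, and the two lambdas stand in for real DTD, W3C XML Schema or RELAX NG engines.

```ruby
# Hypothetical: any object responding to #call(doc) and returning an array of
# error messages (empty when the document passes) can act as a validator.
class ValidatorChain
  def initialize(*validators)
    @validators = validators
  end

  # Appends another validator; returns self so calls can be chained.
  def add(validator)
    @validators << validator
    self
  end

  # Runs every validator in order and collects all of their messages.
  def errors_for(doc)
    @validators.flat_map { |validator| validator.call(doc) }
  end

  def valid?(doc)
    errors_for(doc).empty?
  end
end

# Stand-ins for real schema engines; the "document" is a Hash for one element.
root_is_book = ->(doc) { doc[:name] == "book" ? [] : ["root must be <book>"] }
has_title    = ->(doc) { doc[:attrs].key?("title") ? [] : ["missing title attribute"] }

chain = ValidatorChain.new(root_is_book).add(has_title)
doc = { name: "book", attrs: {} }
p chain.errors_for(doc) # prints ["missing title attribute"]
p chain.valid?(doc)     # prints false
```

Validation only reads the document. Its result can be piped into a later processing step, or ignored, without the parser knowing anything about it.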

On a side note, prior discussion of Ruby & XML seemed to emphasize the Ruby
Way, not (necessarily) the W3C way, to process XML. What’s the Ruby Way of
validating XML? Why worry about W3C XML schemas if there’s no W3C XML DOM?

James


(Tobi Reif) #14

james@rubyxml.com wrote:

The complete sentence is

The right angle bracket (>) may be represented using the string “&gt;”, and
must, for compatibility, be escaped using “&gt;” or a character reference
when it appears in the string “]]>” in content, when that string is not
marking the end of a CDATA section.

Yes.

Bad:
“>” is OK. But “]]>” will cause trouble

Good:
“>” is OK. So is “]]&gt;”

Huh? I didn’t doubt any of these.
Well, perhaps I should have added:
“For more details, and complete discussion of XML’s syntax, please see
the XML spec.”

And “for compatibility” is defined as “Marks a sentence describing a
feature of XML included solely to ensure that XML remains compatible with
SGML.”

Yes. But it’s still a feature of XML:

“The right angle bracket (>) may be represented using the string “&gt;”,
and must, for compatibility, be escaped using “&gt;””
    ^^^^

Tobi


http://www.pinkjuice.com/


(Tobi Reif) #15

Sean Russell wrote:

You quoted this out of context, Tobi.

I did not. I provided the URL for further reading.

The spec states that the right angle
bracket must be escaped if it appears in a CDATA section

as part of “]]>”,

It must be escaped everywhere if it’s part of that sequence but not part
of the ending delimiter of a CData section.

And where does the spec say

not if it appears elsewhere in the document.

?

Tobi


http://www.pinkjuice.com/


#16

In article 3d03d179@news.mhogaming.com,
Sean Russell ser@germane-software.com wrote:

james@rubyxml.com wrote:

Depends on what is meant by “standard”

In this case, I mean what is likely to be used. The W3C does tend to act
like a standards body, but you can’t really blame them. They try pretty
hard to let people know that they aren’t a standards body.

RelaxNG is going to be of limited use if software doesn’t support it. For
example, I use four dominant pieces of software for my XML work; two of the
four are incorporating XMLSchema support, and the remaining two have no
clearly stated plans for any sort of validating support.

My big question is: do I invest the time and effort in providing XML
validation via RelaxNG, via XMLSchema, or do I wait and see which comes out
"on top"?

[1]- I need to use that phrase in my next meeting.


(SER) #17

james@rubyxml.com wrote:

None of the above. REXML should not do validation. It should, perhaps,

Hmm. Ok. What I meant was that I sort of feel obligated to provide a
mechanism by which you can validate XML documents with REXML. The
/efficient/ way to do this is report validation errors while parsing; the
extensible way to do this is to parse the entire document and then validate
the document. Of course, this means that hooking validation into the
streaming parsers will be more difficult.
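The "efficient" option, reporting problems while parsing, can be sketched as a listener. It receives parse events one at a time and checks them against a running stack of open elements. The class, its single rule and the hand-written event list are all hypothetical. The rule is that a title element may only appear directly inside a book element. The callback names tag_start, tag_end and text follow the style of REXML's stream-listener callbacks, but nothing here calls REXML.

```ruby
# Hypothetical streaming check: errors are recorded the moment an offending
# start tag arrives, without ever building a document tree.
class StreamingChecker
  attr_reader :errors

  def initialize
    @open = []   # names of currently open elements, innermost last
    @errors = []
  end

  def tag_start(name, _attrs)
    if name == "title" && @open.last != "book"
      @errors << "<title> inside <#{@open.last || 'document'}>"
    end
    @open.push(name)
  end

  def tag_end(_name)
    @open.pop
  end

  def text(_string); end # character data is irrelevant to this rule
end

# Events as a streaming parser might deliver them for:
# <library><book><title>REXML</title></book><chapter><title/></chapter></library>
events = [
  [:tag_start, "library", {}],
  [:tag_start, "book", {}], [:tag_start, "title", {}], [:text, "REXML"],
  [:tag_end, "title"], [:tag_end, "book"],
  [:tag_start, "chapter", {}], [:tag_start, "title", {}],
  [:tag_end, "title"], [:tag_end, "chapter"],
  [:tag_end, "library"]
]

checker = StreamingChecker.new
events.each { |callback, *args| checker.public_send(callback, *args) }
p checker.errors # prints ["<title> inside <chapter>"]
```

The extensible alternative would collect the same events into a tree first and run validators over the finished tree afterwards.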

Validation issues consume a large part of the XML spec. Personally, I’d be
happy to ignore validation altogether – for REXML to solve most people’s
needs, it’ll need some sort of validation mechanism, though.

Maybe it should pass back the results of validation, and perhaps have a
means for updating the REXML document based on post-validation processing
(e.g., default attributes & values provided by the PSVI).

Ya lost me.

Validation and document manipulation are two different tasks that should
be handled separately, with an option to chain the tasks, or pipe the
results of one into the other.

Yes, I agree. It is probably easiest, and more OO, to have the validator be
separate from the core processor.

On a side note, prior discussion of Ruby & XML seemed to emphasize the
Ruby Way, not (necessarily) the W3C way, to process XML. What’s the Ruby
Way of validating XML? Why worry about W3C XML schemas if there’s no W3C
XML DOM?

The Ruby way of validating XML? That’s like asking what’s the Ruby Way to
process documents. You process whatever documents you need to. You
validate whichever way you need to. The question is whether the tool
you’re using supports the way you need to do your job. If you’re a
contractor and all of your clients are sending you schemas in W3C XML
Schema, and you need to do validation, you’d better find an XML parser that
supports W3C XML Schema validation. For me, this boils down to: “Of the
users of REXML, what is the schema language that they’re most likely going
to /need/?” I’m just here to provide solutions, not dictate how you work.
With a number of caveats, of course. :)

… Tell me, I forget.
<|> Show me, I remember.
/|\ Employ me, I understand.
/| - Ancient Chinese proverb


(Tobi Reif) #18

james@rubyxml.com wrote:

What’s the Ruby Way of
validating XML?

call Xerces via `` :)
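Calling an external validator from Ruby is easy to wire up. As an assumption, the sketch below targets libxml2's xmllint and its --noout, --relaxng, --schema and --dtdvalid options, because that command line is widely known. A Xerces-C based tool would slot in the same way with its own arguments. Open3 is used instead of backticks so that file names are passed as separate arguments rather than through a shell. The method names are hypothetical.

```ruby
require "open3"

# Hypothetical helper: build the xmllint argument list for one document.
def xmllint_command(document_path, schema_path, kind: :relaxng)
  flag = { relaxng: "--relaxng", xsd: "--schema", dtd: "--dtdvalid" }.fetch(kind)
  ["xmllint", "--noout", flag, schema_path, document_path]
end

# Runs xmllint and returns [valid?, messages]. xmllint exits with status 0
# when the document validates. Raises Errno::ENOENT if xmllint is missing.
def validate_externally(document_path, schema_path, kind: :relaxng)
  command = xmllint_command(document_path, schema_path, kind: kind)
  output, status = Open3.capture2e(*command)
  [status.success?, output]
end

p xmllint_command("doc.xml", "schema.rng")
# prints ["xmllint", "--noout", "--relaxng", "schema.rng", "doc.xml"]
```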

Tobi


http://www.pinkjuice.com/


(Dossy) #19

This just means "when you are escaping the right angle bracket,
you must use &gt; and not the numeric equivalent, &#x3E;".

Since they said "The right angle bracket (>) /may/ be represented…"
that means that there are places where it does NOT have to be
escaped. However, when it IS escaped, it /must/ be escaped using
the &gt; sequence, and not the &#x3E; sequence.

– Dossy

On 2002.06.10, Tobias Reif tobiasreif@pinkjuice.com wrote:

Yes. But it’s still a feature of XML:

“The right angle bracket (>) may be represented using the string “&gt;”,
and must, for compatibility, be escaped using “&gt;””
    ^^^^


Dossy Shiobara mail: dossy@panoptic.com
Panoptic Computer Network web: http://www.panoptic.com/
“He realized the fastest way to change is to laugh at your own
folly – then you can let go and quickly move on.” (p. 70)


(Tobi Reif) #20

XMLers,

I think there are some misunderstandings, some of which are cleared up now;
this is too off-topic anyway, and the complete story is in the spec.

So thanks for the discussion; let’s not bore the non-XMLers :)

Tobi


http://www.pinkjuice.com/