ANN: REXML 2.7.4

Hi,

I’ve had a busy few months, which is why I’ve been quiet. REXML 2.7.4
is available for public consumption, though. I’m pretty happy with
where it is at with this release; barring any major complaints, I’ll
retag it as 3.0 and start development on new features.

Yes, there are still bugs. I’m hoping to get most of the outstanding
ones worked out before I do the 3.0 tag, and documentation is also
high on my list. The major new features for the next development
branch are validation, XPath support for the other parsers, and the
ability to hook into native XML parsing libraries if they exist.

Head over to the REXML bug page if you have other feature requests or
bugs you want to report. This is a good time to get requests in.

Here’s the executive overview of this release:

  • Fix for the XPath descendant* result set ordering bug
  • SAX2 listener bug fixes
  • Undid a code change that caused a 10x speed regression. Patches to
    fix this were contributed by a number of people; it is really
    embarrassing how long it took me to apply the patch, but it is in
    there now.
  • Indentation fixes, plus a new word-wrapping feature for text nodes
    contributed by Devin Bayer. Documentation is forthcoming; for
    now, see below.

“Setting :wordwrapping to :all wraps all text nodes longer than
60 characters.
Setting :indentstyle to aString uses aString as the indentation,
instead of the default ' '.
And as long as :respect_whitespace isn’t set for the element,
multiline text nodes will be indented.”
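
To make the quoted behaviour concrete, here is a minimal sketch of wrapping at 60 characters with a configurable indent string. It is plain Ruby and does not use REXML; the helper name wrap_and_indent and its keyword arguments are hypothetical, chosen only to mirror the :wordwrapping and :indentstyle options described above.

```ruby
# Hypothetical helper, not part of REXML: wraps text at `width` columns
# and prefixes every resulting line with `indent`.
def wrap_and_indent(text, width: 60, indent: "  ")
  lines = []
  current = ""
  text.split.each do |word|
    if current.empty?
      current = word
    elsif current.length + 1 + word.length <= width
      current = "#{current} #{word}"
    else
      lines << current      # line is full, start a new one
      current = word
    end
  end
  lines << current unless current.empty?
  # A single word longer than `width` is left on its own line, unbroken.
  lines.map { |line| indent + line }.join("\n")
end

sample = "REXML is an XML processor for Ruby. " * 4
puts wrap_and_indent(sample, width: 60, indent: "\t")
```

The width applies to the text before the indent string is added, which is an assumption of this sketch rather than something the announcement specifies.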

I’ve tested this against Ruby 1.9.0, Ruby 1.8.0, and partially against
Ruby 1.6.8 (some of the unit tests require iconv; I’ll improve the
tests a bit later), but I don’t anticipate that any of the current
changes would have affected 1.6.8.

URLs:

The main REXML page (including downloads and documentation):

http://www.germane-software.com/software/rexml

The Subversion repository development branch is 3.0:

http://www.germane-software.com/repositories/public/rexml/branches/3.0

The URL for this tag is:

http://www.germane-software.com/repositories/public/rexml/tags/2.7.4

The darcs repository is at:

http://www.germane-software.com/darcs/rexml

And, finally, these patches have been applied to the Ruby CVS tree,
so 2.7.4+ is in future Ruby releases.

Whew.

Hi,

At Sat, 14 Feb 2004 12:44:57 +0900,
Sean Russell wrote in [ruby-talk:92854]:

Head over to the REXML bug page if you have other feature requests or
bugs you want to report. This is a good time to get requests in.

What about [ruby-core:01960]?


Nobu Nakada

nobu.nokada@softhome.net wrote in message news:200402140840.i1E8ewFH014793@sharui.nakada.niregi.kanuma.tochigi.jp

Hi,

At Sat, 14 Feb 2004 12:44:57 +0900,
Sean Russell wrote in [ruby-talk:92854]:

Head over to the REXML bug page if you have other feature requests or
bugs you want to report. This is a good time to get requests in.

What about [ruby-core:01960]?

Well… the patch in 01960 undoes a bunch of work that was done
specifically to fix some encoding problems. It may have created new
encoding support problems, but at least the new code is going in the
right direction.

  • thread-unsafe due to passing a string via a class variable,

Yeah, this could be a problem. But it wasn’t any better before.

  • inefficiency due to loading and eval'ing each time, and

I’m not terribly concerned about this, as I don’t expect this to
contribute significantly to the overall overhead of parsing.

  • method names in SHIFT[-_]JIS.rb are not the required ones.

This should be fixed in CVS already, a long time ago.

However: all of the Japanese encoding support is untested. I don’t
have any Shift-JIS or EUC encoded XML files to test.

Please, send me a small Shift-JIS encoded and a small EUC encoded XML
file, and I’ll be happy to put a unit test in to make sure the
Japanese encoding works.

— SER

Hi,

At Tue, 17 Feb 2004 06:24:59 +0900,
Sean Russell wrote in [ruby-talk:92998]:

Well… the patch in 01960 undoes a bunch of work that was done
specifically to fix some encoding problems. It may have created new
encoding support problems, but at least the new code is going in the
right direction.

Could you elaborate on it?


Nobu Nakada

Hi,

At Tue, 17 Feb 2004 06:24:59 +0900,
Sean Russell wrote in [ruby-talk:92998]:

However: all of the Japanese encoding support is untested. I don’t
have any Shift-JIS or EUC encoded XML files to test.

Please, send me a small Shift-JIS encoded and a small EUC encoded XML
file, and I’ll be happy to put a unit test in to make sure the
Japanese encoding works.

Is XHTML OK? If so, glance at http://www.dm4lab.to/~usa/ruby/.


Nobu Nakada

nobu.nokada@softhome.net wrote in message news:200402181005.i1IA5OrT013173@sharui.nakada.niregi.kanuma.tochigi.jp

At Tue, 17 Feb 2004 06:24:59 +0900,
Sean Russell wrote in [ruby-talk:92998]:

Well… the patch in 01960 undoes a bunch of work that was done
specifically to fix some encoding problems. It may have created new
encoding support problems, but at least the new code is going in the
right direction.

Could you elaborate on it?

Sure.

The new code uses IConv, which is now a standard part of Ruby. UConv
is, AFAIK, obsolete, in addition to requiring a separate download and
install.

The old code was sort of a hack that looked for files named a certain
way in a certain directory, loaded and evaluated them, and required
that each did a certain amount of fairly redundant housekeeping to
register themselves with the encoding engine. The new code requires a
bit of funny syntax, but otherwise is straightforward mixin code, and
the encodings are discovered dynamically when they’re needed, with no
searching through the filesystem.
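
As a rough illustration of the mixin approach described above, the sketch below looks up an encoding module by name only when it is requested and extends a single object with it. All names here (Encodings, Source, lookup) are hypothetical and this is not REXML's actual code; it also relies on String#force_encoding and String#encode, which postdate this thread.

```ruby
# Hypothetical names throughout; a sketch of the idea, not REXML's code.
module Encodings
  module UTF_8
    def decode(bytes)
      bytes.dup.force_encoding("UTF-8")
    end
  end

  module ISO_8859_1
    def decode(bytes)
      bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
    end
  end

  # Resolve "ISO-8859-1" to Encodings::ISO_8859_1 on demand.
  # The `false` argument keeps the lookup inside this namespace.
  def self.lookup(name)
    const_get(name.upcase.tr("-", "_"), false)
  rescue NameError
    raise ArgumentError, "unsupported encoding: #{name}"
  end
end

class Source
  def initialize(bytes)
    @bytes = bytes
  end

  # Mix the decoder into this one object; other sources are unaffected.
  def encoding=(name)
    extend(Encodings.lookup(name))
  end

  def text
    decode(@bytes)
  end
end

src = Source.new("caf\xE9".b)
src.encoding = "ISO-8859-1"
puts src.text
```

One limitation of this sketch: extending an object with a module it already has is a no-op, so switching an object back to an earlier encoding would not take effect.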

The old code handled encodings via aliasing a method to the
decoding/encoding methods. This could not be made thread-safe except
via Method objects or eval(), both of which are way too slow to be
used in the encoding algorithm, which is a frequently called bit of
code.

Each encoding required two files. Yuck.

The new encoding code is pretty ugly, too, and I may give up and drop
the mixin metaphor. However, I need to test the code in a threaded
environment. I tried to use the singleton pattern when I implemented
the new code, and I believe that it is thread-safe. It is not very
efficient to set an encoding, but I doubt that this will have much
overall impact on program execution; much more time will be spent in
parsing, which is what I optimized the encoding support for.

Finally, I need to test whether using an encoding object, rather than
trying to mix-in methods, would be much slower. It would add
overhead, since it would add a method call and method calls are
relatively slow operations. This would simplify the code
significantly, but (again) REXML has enough problems with speed
without my aggravating them with numerous little lags like this.
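
A small way to measure the trade-off described above is to time the same decode through a mixed-in method and through a delegating encoding object. The classes below are hypothetical stand-ins, not REXML code, and the timings will vary by machine and Ruby version, so no numbers are claimed here.

```ruby
# Hypothetical classes for comparison; neither is REXML's actual code.
module Latin1Mixin
  def decode(bytes)
    bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
  end
end

class Latin1Codec
  def decode(bytes)
    bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
  end
end

class MixinSource
  def initialize
    extend(Latin1Mixin)    # decode becomes a method of this object
  end
end

class DelegatingSource
  def initialize(codec)
    @codec = codec
  end

  def decode(bytes)
    @codec.decode(bytes)   # one extra method call per decode
  end
end

# Returns the elapsed seconds for `count` decode calls.
def time_decodes(source, bytes, count)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  count.times { source.decode(bytes) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

bytes = "na\xEFve".b
printf("mixin:      %.4fs\n", time_decodes(MixinSource.new, bytes, 100_000))
printf("delegating: %.4fs\n",
       time_decodes(DelegatingSource.new(Latin1Codec.new), bytes, 100_000))
```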

Hi,

At Fri, 20 Feb 2004 22:49:52 +0900,
Sean Russell wrote in [ruby-talk:93246]:

The new code uses IConv, which is now a standard part of Ruby. UConv
is, AFAIK, obsolete, in addition to requiring a separate download and
install.

I don’t think UConv is obsolete. As for IConv, it certainly is a
standard part, but it also needs another library, libiconv,
which may not be available on all platforms.

The old code handled encodings via aliasing a method to the
decoding/encoding methods. This could not be made thread-safe except
via Method objects or eval(), both of which are way too slow to be
used in the encoding algorithm, which is a frequently called bit of
code.

I’m not sure which versions you mean by “old code” and “new
code”. Do you mean the code older than what was imported into the
Ruby CVS repository?

Each encoding required two files. Yuck.

Sorry, what two files?

The new encoding code is pretty ugly, too, and I may give up and drop
the mixin metaphor. However, I need to test the code in a threaded
environment. I tried to use the singleton pattern when I implemented
the new code, and I believe that it is thread-safe. It is not very
efficient to set an encoding, but I doubt that this will have much
overall impact on program execution; much more time will be spent in
parsing, which is what I optimized the encoding support for.

But the current encoding code also doesn’t seem thread-safe. What
happens if the context switches between setting the class
variable and the instance_eval?


Nobu Nakada

Thanks for the EUC encoded link.

nobu.nokada@softhome.net wrote in message news:200402211129.i1LBTmat015360@sharui.nakada.niregi.kanuma.tochigi.jp

I don’t think UConv is obsolete. As for IConv, it certainly is a
standard part, but it also needs another library, libiconv,
which may not be available on all platforms.

I wasn’t aware of this. I thought iconv was standard now, a
guaranteed part of the Ruby distribution.

I’m not sure which versions you mean by “old code” and “new
code”. Do you mean the code older than what was imported into the
Ruby CVS repository?

Code older than what is currently in CVS right now. Code that uses
UConv, which I stripped out at some point.

Matz, could you verify or deny that iconv is available on all
platforms that Ruby >=1.8 is supposed to work on? I remember at one
point you suggested that I use iconv to handle encodings, but if it
isn’t available on some platforms, this is a problem.

Each encoding required two files. Yuck.

Sorry, what two files?

encodings/_decl.rb
encodings/.rb

But the current encoding code also doesn’t seem thread-safe. What
happens if the context switches between setting the class
variable and the instance_eval?

Yes, this needs to be fixed by synchronization.
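
A minimal sketch of what such synchronization could look like, assuming the pattern is roughly "store a string in a class variable, then instance_eval it". The class EncodedSource, the class variable, and the LOCK constant are hypothetical reconstructions for illustration, not REXML's actual code.

```ruby
# Hypothetical reconstruction of the pattern under discussion.
class EncodedSource
  @@pending_body = nil     # shared by every thread
  LOCK = Mutex.new

  def encoding=(body)
    # Without the lock, another thread could overwrite @@pending_body
    # between the assignment and the instance_eval that reads it.
    LOCK.synchronize do
      @@pending_body = body
      instance_eval(@@pending_body)
    end
  end
end

sources = Array.new(8) { EncodedSource.new }
threads = sources.each_with_index.map do |src, i|
  # Each thread installs a different singleton method body.
  Thread.new { src.encoding = "def tag; #{i}; end" }
end
threads.each(&:join)
p sources.map(&:tag)
```

Passing the string directly as an argument, rather than through shared state, would remove the need for the lock altogether; the sketch keeps the class variable only to show where the race sits.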

— SER

I’m not Matz, but I know that the PP version of 1.8.0 for Win32 doesn’t
support iconv. This may have been fixed in 1.8.1, but even if it is
enabled, it may not work if the user doesn’t have the iconv DLL installed
where Ruby can find it.
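
Given that iconv may be missing, one defensive option is to detect it at load time instead of assuming it. The sketch below is an illustration only: the constant ICONV_AVAILABLE and the helper to_utf8 are hypothetical, and the fallback uses String#encode, which did not exist in the Ruby versions discussed in this thread.

```ruby
# Feature-detect iconv rather than assuming it is installed.
begin
  require "iconv"
  ICONV_AVAILABLE = true
rescue LoadError
  ICONV_AVAILABLE = false
end

# Hypothetical helper: convert `bytes` from the named encoding to UTF-8.
def to_utf8(bytes, from)
  if ICONV_AVAILABLE
    Iconv.conv("UTF-8", from, bytes)
  else
    # Fallback for this sketch only: Ruby's built-in transcoding.
    bytes.dup.force_encoding(from).encode("UTF-8")
  end
end

puts "iconv available: #{ICONV_AVAILABLE}"
puts to_utf8("caf\xE9".b, "ISO-8859-1")
```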

-austin

On Wed, 25 Feb 2004 23:39:49 +0900, Sean Russell wrote:

Matz, could you verify or deny that iconv is available on all platforms
that Ruby >=1.8 is supposed to work on? I remember at one point you
suggested that I use iconv to handle encodings, but if it isn’t available
on some platforms, this is a problem.


austin ziegler * austin@halostatue.ca * Toronto, ON, Canada
software designer * pragmatic programmer * 2004.02.25
* 11.26.08

Hi,

At Wed, 25 Feb 2004 23:39:49 +0900,
Sean Russell wrote in [ruby-talk:93644]:

I don’t think UConv is obsolete. As for IConv, it certainly is a
standard part, but it also needs another library, libiconv,
which may not be available on all platforms.

I wasn’t aware of this. I thought iconv was standard now, a
guaranteed part of the Ruby distribution.

Iconv for Ruby is just a wrapper for the external iconv library.

But the current encoding code also doesn’t seem thread-safe. What
happens if the context switches between setting the class
variable and the instance_eval?

Yes, this needs to be fixed by synchronization.

I feel it is due to an unnecessarily introduced thread-shared
resource, a class variable. Since aliasing a method is atomic at the
Ruby level, I expect the approach in [ruby-core:01960] could get rid
of the race conditions, provided there are no name clashes.


Nobu Nakada