[ANN] XHTMLDiff 1.0.0

Aredridel · 21 October 2004 21:17

Since today seems to be the day for document diffing tools, here's mine.

I'd like to announce XHTMLDiff 1.0.0, available at
http://theinternetco.net/projects/ruby/xhtmldiff for your consumption.

XHTMLDiff takes valid XHTML as input, and generates valid XHTML with
redlining tags (<ins> and <del>) as output. Valid input documents
should generate valid output.

It diffs down to the paragraph level at the moment. A future version
will search down to the word.
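
To illustrate what paragraph-level redlining means, here is a minimal sketch. It is not XHTMLDiff's actual code: `lcs_table` and `redline` are hypothetical helpers, a hand-written LCS table stands in for Diff::LCS, and paragraphs are assumed to be plain, already-escaped strings.

```ruby
# Hypothetical helpers, not part of XHTMLDiff.
# Build a longest-common-subsequence length table for two arrays.
def lcs_table(a, b)
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      table[i][j] = if a[i] == b[j]
                      table[i + 1][j + 1] + 1
                    else
                      [table[i + 1][j], table[i][j + 1]].max
                    end
    end
  end
  table
end

# Walk both paragraph lists, wrapping removed paragraphs in <del>
# and added paragraphs in <ins>; unchanged ones pass through.
def redline(old_paras, new_paras)
  table = lcs_table(old_paras, new_paras)
  out = []
  i = j = 0
  while i < old_paras.size && j < new_paras.size
    if old_paras[i] == new_paras[j]
      out << "<p>#{old_paras[i]}</p>"
      i += 1
      j += 1
    elsif table[i + 1][j] >= table[i][j + 1]
      out << "<del><p>#{old_paras[i]}</p></del>"
      i += 1
    else
      out << "<ins><p>#{new_paras[j]}</p></ins>"
      j += 1
    end
  end
  # Whatever is left on either side is a pure deletion or insertion.
  old_paras.drop(i).each { |para| out << "<del><p>#{para}</p></del>" }
  new_paras.drop(j).each { |para| out << "<ins><p>#{para}</p></ins>" }
  out
end
```

Calling `redline(%w[a b c], %w[a x c])` keeps the first and last paragraphs as they are and marks `b` as deleted and `x` as inserted.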

Prerequisites are REXML, Diff::LCS, and delegate.rb.

Bug reports are welcome.

Aredridel.

Austin_Ziegler5 · 22 October 2004 15:07

Cool. Always nice to see people using something I wrote.

Are there any requests for improvements with Diff::LCS, Aredridel?

-austin

···

On Fri, 22 Oct 2004 06:17:08 +0900, Aredridel <aredridel@gmail.com> wrote:

Since today seems to be the day for document diffing tools, here's mine.

I'd like to announce XHTMLDiff 1.0.0, available at
http://theinternetco.net/projects/ruby/xhtmldiff for your consumption.

XHTMLDiff takes valid XHTML as input, and generates valid XHTML with
redlining tags (<ins> and <del>) as output. Valid input documents
should generate valid output.

It diffs down to the paragraph level at the moment. A future version
will search down to the word.

Prerequisites are REXML, Diff::LCS, and delegate.rb

--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
: as of this email, I have [ 5 ] Gmail invitations

Francis_Hwang1 · 24 October 2004 16:14

This looks quite cool. Are there any plans to generalize this to XML in general? I can think of lots of good ways to use this if it's more broadly applicable to XML.

···

On Oct 21, 2004, at 5:17 PM, Aredridel wrote:

Since today seems to be the day for document diffing tools, here's mine.

I'd like to announce XHTMLDiff 1.0.0, available at
http://theinternetco.net/projects/ruby/xhtmldiff for your consumption.

XHTMLDiff takes valid XHTML as input, and generates valid XHTML with
redlining tags (<ins> and <del>) as output. Valid input documents
should generate valid output.

It diffs down to the paragraph level at the moment. A future version
will search down to the word.

Prerequisites are REXML, Diff::LCS, and delegate.rb

Bug reports are welcome.

Aredridel.

Aredridel · 22 October 2004 17:00

Cool. Always nice to see people using something I wrote

Are there any requests for improvements with Diff::LCS, Aredridel?

None at the moment -- the functional interface is very pleasant, and
the library makes no assumptions about type, so I could duck-type to my
heart's content. It's nicely and solidly written, and it's good to see the
McIlroy-Hunt algorithm spelled out in Ruby where I totally grok it,
rather than locked up in Perl or Smalltalk (which I've read, but was
never sure I really got).

Ari

Aredridel · 25 October 2004 18:07

Not at the moment, since it satisfies my need. Differencing on XML
is a slightly different task, and either much easier or much harder
depending on the approach. XHTML has to satisfy the XHTML DTD, so there
are specific places and specific tags to use to mark changes.

With XML, the change markup would either have to be arbitrarily defined
(easy), or defined according to each flavor's DTD (hard).

I'm up for it when I get some free time, if someone wanted to specify
what they needed.

Ari

···

On Mon, 25 Oct 2004 01:14:37 +0900, Francis Hwang <sera@fhwang.net> wrote:

This looks quite cool. Are there any plans to generalize this to XML in
general? I can think of lots of good ways to use this if it's more
broadly applicable to XML.

Francis_Hwang1 · 25 October 2004 23:43

Well, in my case I wanted to compare two different RSS 2.0 feeds. Which doesn't seem to have a DTD, harrumph. I'll be quite happy when we all move to Atom ...

···

On Oct 25, 2004, at 2:07 PM, Aredridel wrote:

On Mon, 25 Oct 2004 01:14:37 +0900, Francis Hwang <sera@fhwang.net> wrote:

This looks quite cool. Are there any plans to generalize this to XML in
general? I can think of lots of good ways to use this if it's more
broadly applicable to XML.

Not at the moment, since it satisfies my need, and differencing on XML
is a slightly different task, and much easier, or much harder
depending. XHTML has to satisfy the XHTML DTD, and so there's specific
places and specific tags to use to mark changes.

With XML, it would either have to be arbitrarily defined (easy), or
according to each flavor's DTD (hard).

I'm up for it when I get some free time, if someone wanted to specify
what they needed.

Ari

Aredridel · 26 October 2004 03:55

Ah, yes -- Atom or RSS 1.0 (for comparison, RSS 1.0 would be even nicer).

What sort of interface would you want to compare RSS data? A list of
things that have changed since a previous run? A highlighted list? An
XML sort of patch? Would you want it at the API level, or a textually
annotated set of changes?

···

On Tue, 26 Oct 2004 08:43:09 +0900, Francis Hwang <sera@fhwang.net> wrote:

Well, in my case I wanted to compare two different RSS 2.0 feeds. Which
doesn't seem to have a DTD, harrumph. I'll be quite happy when we all
move to Atom ...

Francis_Hwang1 · 26 October 2004 12:03

What I was doing was refactoring some RSS code that didn't have enough tests, so I wanted to compare pretty much every element of the resulting RSS. I ended up just eyeballing it in an aggregator, which seems to have worked out okay but still wasn't ideal.

I'd want a pretty granular comparison, and at the API level would be ideal. I don't mind having to do a little work to format the changes into readable output. Also, maybe having API-level information would make it easier for me to filter out certain differences.

F.

···

On Oct 25, 2004, at 11:55 PM, Aredridel wrote:

On Tue, 26 Oct 2004 08:43:09 +0900, Francis Hwang <sera@fhwang.net> wrote:

Well, in my case I wanted to compare two different RSS 2.0 feeds. Which
doesn't seem to have a DTD, harrumph. I'll be quite happy when we all
move to Atom ...

Ah, yes -- atom or RSS 1.0 (for comparison, RSS 1.0 would be even nicer)

What sort of interface would you want to compare RSS data? A list of
things that have changed since a previous run? A highlighted list? An
XML sort of patch? Would you want it at the API level, or a textually
annotated set of changes?

Aredridel · 27 October 2004 06:17

What I was doing was refactoring some RSS code that didn't have enough
tests, so I wanted to compare pretty much every element of the
resulting RSS. I ended up just eyeballing it in an aggregator, which
seems to have worked out okay but still wasn't ideal.

I'd want a pretty granular comparison, and at the API level would be
ideal. I don't mind having to do a little work to format the changes
into readable output. Also, maybe having API-level information would
make it easier for me to filter out certain differences.

Hm. Sounds like <ins> and <del> equivalents might be perfect, though
really, you want exact, full-tree diffs. That's a simpler task,
really. Honestly, it sounds like raw Diff::LCS might be the tool you
want -- parse both with REXML, and then hit the trees with Diff::LCS
-- the gotcha being the way REXML deals with containers. Steal the
proxy class from XHTMLDiff and that should be all you need.

Ari

Francis_Hwang1 · 27 October 2004 11:49

Sounds good! I'll give that a try sometime and maybe write a tiny how-to on my blog.

F.

···

On Oct 27, 2004, at 2:17 AM, Aredridel wrote:

Hm. Sounds like <ins> and <del> equivalents might be perfect, though
really, you want exact, full-tree diffs. That's a simpler task,
really. Honestly, it sounds like raw Diff::LCS might be the tool you
want -- parse both with REXML, and then hit the trees with Diff::LCS
-- the gotcha being the way REXML deals with containers. Steal the
proxy class from XHTMLDiff and that should be all you need.
