XHTMLDiff takes valid XHTML as input, and generates valid XHTML with
redlining tags (<ins> and <del>) as output. Valid input documents
should generate valid output.
It diffs down to the paragraph level at the moment. A future version
will search down to the word.
Prerequisites are REXML, Diff::LCS, and delegate.rb
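For readers who want a feel for paragraph-level redlining, here is a rough sketch. It is not XHTMLDiff's actual code: `lcs_table`, `redline`, and `paragraphs` are made-up names, the LCS is a small plain-Ruby stand-in for what Diff::LCS provides, and the input is assumed to be an un-namespaced fragment rather than a full XHTML document.

```ruby
require 'rexml/document'

# table[i][j] holds the LCS length of a[i..] and b[j..].
# Plain-Ruby stand-in for Diff::LCS (an assumption made for this sketch).
def lcs_table(a, b)
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      table[i][j] =
        if a[i] == b[j]
          table[i + 1][j + 1] + 1
        else
          [table[i + 1][j], table[i][j + 1]].max
        end
    end
  end
  table
end

# Walks both paragraph lists, wrapping removed paragraphs in <del>
# and added paragraphs in <ins>; unchanged paragraphs pass through.
def redline(old_paras, new_paras)
  table = lcs_table(old_paras, new_paras)
  out = []
  i = j = 0
  while i < old_paras.size || j < new_paras.size
    if i < old_paras.size && j < new_paras.size && old_paras[i] == new_paras[j]
      out << old_paras[i]
      i += 1
      j += 1
    elsif j == new_paras.size ||
          (i < old_paras.size && table[i + 1][j] >= table[i][j + 1])
      out << "<del>#{old_paras[i]}</del>"
      i += 1
    else
      out << "<ins>#{new_paras[j]}</ins>"
      j += 1
    end
  end
  out
end

# Serializes every <p> element so paragraphs can be compared as strings.
def paragraphs(xhtml)
  REXML::XPath.match(REXML::Document.new(xhtml), '//p').map(&:to_s)
end

old_doc = '<div><p>One</p><p>Two</p><p>Three</p></div>'
new_doc = '<div><p>One</p><p>Two, revised</p><p>Three</p><p>Four</p></div>'

result = redline(paragraphs(old_doc), paragraphs(new_doc))
puts result
```

In this sketch a changed paragraph shows up as a `<del>` of the old text followed by an `<ins>` of the new text, which is what diffing "down to the paragraph level" implies: nothing inside the paragraph is compared.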
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
: as of this email, I have [ 5 ] Gmail invitations
This looks quite cool. Are there any plans to generalize this to XML in general? I can think of lots of good ways to use this if it's more broadly applicable to XML.
···
On Oct 21, 2004, at 5:17 PM, Aredridel wrote:
Since today seems to be the day for document diffing tools, here's mine.
XHTMLDiff takes valid XHTML as input, and generates valid XHTML with
redlining tags (<ins> and <del>) as output. Valid input documents
should generate valid output.
It diffs down to the paragraph level at the moment. A future version
will search down to the word.
Prerequisites are REXML, Diff::LCS, and delegate.rb
Are there any requests for improvements with Diff::LCS, Aredridel?
None at the moment -- the functional interface is very pleasant, and
the library makes no assumptions about type, so I could ducktype to my
heart's content. Nicely and solidly written, and it's good to see the
McIlroy-Hunt algorithm spelled out in Ruby where I totally grok it,
rather than locked up in Perl or Smalltalk (which I've read, but was
never sure I really got).
Not at the moment, since it satisfies my need, and differencing on XML
is a slightly different task: much easier or much harder, depending on
how the change markup is defined. XHTML has to satisfy the XHTML DTD,
so there are specific places and specific tags to use to mark changes.
With XML, it would either have to be arbitrarily defined (easy), or
according to each flavor's DTD (hard).
I'm up for it when I get some free time, if someone wanted to specify
what they needed.
Ari
···
On Mon, 25 Oct 2004 01:14:37 +0900, Francis Hwang <sera@fhwang.net> wrote:
This looks quite cool. Are there any plans to generalize this to XML in
general? I can think of lots of good ways to use this if it's more
broadly applicable to XML.
Well, in my case I wanted to compare two different RSS 2.0 feeds. Which doesn't seem to have a DTD, harrumph. I'll be quite happy when we all move to Atom ...
···
On Oct 25, 2004, at 2:07 PM, Aredridel wrote:
On Mon, 25 Oct 2004 01:14:37 +0900, Francis Hwang <sera@fhwang.net> wrote:
This looks quite cool. Are there any plans to generalize this to XML in
general? I can think of lots of good ways to use this if it's more
broadly applicable to XML.
Not at the moment, since it satisfies my need, and differencing on XML
is a slightly different task, and much easier, or much harder
depending. XHTML has to satisfy the XHTML DTD, and so there's specific
places and specific tags to use to mark changes.
With XML, it would either have to be arbitrarily defined (easy), or
according to each flavor's DTD (hard).
I'm up for it when I get some free time, if someone wanted to specify
what they needed.
Ah, yes -- Atom or RSS 1.0 (for comparison, RSS 1.0 would be even nicer).
What sort of interface would you want to compare RSS data? A list of
things that have changed since a previous run? A highlighted list? An
XML sort of patch? Would you want it at the API level, or a textually
annotated set of changes?
···
On Tue, 26 Oct 2004 08:43:09 +0900, Francis Hwang <sera@fhwang.net> wrote:
Well, in my case I wanted to compare two different RSS 2.0 feeds. Which
doesn't seem to have a DTD, harrumph. I'll be quite happy when we all
move to Atom ...
What I was doing was refactoring some RSS code that didn't have enough tests, so I wanted to compare pretty much every element of the resulting RSS. I ended up just eyeballing it in an aggregator, which seems to have worked out okay but still wasn't ideal.
I'd want a pretty granular comparison, and at the API level would be ideal. I don't mind having to do a little work to format the changes into readable output. Also, maybe having API-level information would make it easier for me to filter out certain differences.
F.
···
On Oct 25, 2004, at 11:55 PM, Aredridel wrote:
On Tue, 26 Oct 2004 08:43:09 +0900, Francis Hwang <sera@fhwang.net> wrote:
Well, in my case I wanted to compare two different RSS 2.0 feeds. Which
doesn't seem to have a DTD, harrumph. I'll be quite happy when we all
move to Atom ...
Ah, yes -- atom or RSS 1.0 (for comparison, RSS 1.0 would be even nicer)
What sort of interface would you want to compare RSS data? A list of
things that have changed since a previous run? A highlighted list? An
XML sort of patch? Would you want it at the API level, or a textually
annotated set of changes?
What I was doing was refactoring some RSS code that didn't have enough
tests, so I wanted to compare pretty much every element of the
resulting RSS. I ended up just eyeballing it in an aggregator, which
seems to have worked out okay but still wasn't ideal.
I'd want a pretty granular comparison, and at the API level would be
ideal. I don't mind having to do a little work to format the changes
into readable output. Also, maybe having API-level information would
make it easier for me to filter out certain differences.
Hm. Sounds like <ins> and <del> equivalents might be perfect, though
really, you want exact, full-tree diffs. That's a simpler task,
really. Honestly, it sounds like raw Diff::LCS might be the tool you
want -- parse both with REXML, and then hit the trees with Diff::LCS
-- the gotcha being the way REXML deals with containers. Steal the
proxy class from XHTMLDiff and that should be all you need.
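A minimal sketch of that idea follows, under stated assumptions. `NodeProxy` and `items` are hypothetical names, not the proxy class that ships with XHTMLDiff, and the sketch takes one reading of the gotcha: that parsed elements need to compare by content rather than by object identity before any sequence comparison is useful. It uses plain array difference to get an API-level list of changes; the same wrapped arrays could presumably be handed to Diff::LCS when an ordered diff is wanted.

```ruby
require 'rexml/document'
require 'delegate'

# Hypothetical wrapper (not XHTMLDiff's own class): delegates everything to
# the REXML element, but compares and hashes by serialized content.
class NodeProxy < SimpleDelegator
  def signature
    __getobj__.to_s
  end

  def ==(other)
    other.is_a?(NodeProxy) && signature == other.signature
  end
  alias eql? ==

  def hash
    signature.hash
  end
end

# Parses a feed and wraps each <item> element in a NodeProxy.
def items(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item').map { |el| NodeProxy.new(el) }
end

old_feed = '<rss><channel><item><title>A</title></item>' \
           '<item><title>B</title></item></channel></rss>'
new_feed = '<rss><channel><item><title>A</title></item>' \
           '<item><title>C</title></item></channel></rss>'

old_items = items(old_feed)
new_items = items(new_feed)

# Array#- relies on eql? and hash, so it now works on element content.
removed = old_items - new_items
added   = new_items - old_items

# The proxies still answer REXML calls such as #elements.
removed_titles = removed.map { |node| node.elements['title'].text }
added_titles   = added.map { |node| node.elements['title'].text }

puts "removed: #{removed_titles.inspect}"
puts "added:   #{added_titles.inspect}"
```

Comparing serialized strings is the bluntest possible notion of equality: any whitespace or attribute-order difference counts as a change. A finer-grained `signature` (for example, only the element name and selected children) would make it easier to filter out differences that do not matter.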
Sounds good! I'll give that a try sometime and maybe write a tiny how-to on my blog.
F.
···
On Oct 27, 2004, at 2:17 AM, Aredridel wrote:
Hm. Sounds like <ins> and <del> equivalents might be perfect, though
really, you want exact, full-tree diffs. That's a simpler task,
really. Honestly, it sounds like raw Diff::LCS might be the tool you
want -- parse both with REXML, and then hit the trees with Diff::LCS
-- the gotcha being the way REXML deals with containers. Steal the
proxy class from XHTMLDiff and that should be all you need.