List:       koffice-devel
Subject:    Re: Quick question
From:       Jos van den Oever <jos.van.den.oever () kogmbh ! com>
Date:       2010-08-24 20:38:50
Message-ID: 201008242238.51010.jos.van.den.oever () kogmbh ! com

On Tuesday, August 24, 2010 22:06:59 pm Dr. Robert Marmorstein wrote:
> > There is a difference between showing differences between documents and
> > creating and applying patches. For the former, more work must be done
> > than for the latter, since the latter assumes a very similar working
> > copy.
> > 
> > A tool to do these comparisons would indeed be very useful. Not just for
> > your synchronization needs, but also to check roundtrip accuracy of
> > koffice.
> 
> Exactly.  It is not at ALL an easy problem.  And yet, it would be very,
> very nice to have.  I think implementing "patches" (for change-tracking)
> would be a good first step toward implementing such a tool.  My naive idea
> was that you could parse both XML files using a DOM approach (storing all
> nodes in memory), then iterate through one on a node-by-node basis,
> searching for the equivalent node in the second file.  That might get very
> expensive, though.  I like your idea of assigning an (arbitrary) ordering
> to the nodes better.
> 
> Another problem would be displaying the results -- how do you display
> things like format differences?  Variables?  You don't want a cluttered
> interface, but you would still want to highlight the major differences.
> 
> I don't think there's any good software for comparing Micro$oft word
> documents in this manner, so having this could be a killer app for open
> source.  I'll continue to think it through some.

I think a good initial step would be to formulate an ODF canonicalization: how
to order things whose order carries no meaning. Automatic styles would be
ordered by first reference and named styles by name. I'm sure there are more
things that need ordering and that some rule can be applied to each of them.
Then the files in the zip should have an agreed-upon order. A question there
is: should image filenames be retained or not? I would say no, since this
information is lost when going from the zip format to the XML-only format
anyway.
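
To make the two style rules concrete, here is a minimal Python sketch. It is
not an agreed canonical form, just an illustration. It assumes the document
is already parsed with ElementTree, that every style reference in the body is
an attribute whose local name ends in "style-name", and that unreferenced
automatic styles go last, sorted by name. The function names are made up for
this example.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
STYLE_NAME = "{%s}name" % STYLE_NS


def first_reference_order(body):
    """Map each referenced style name to the rank of its first use."""
    order = {}
    for element in body.iter():  # iter() walks the tree in document order
        for attr, value in element.attrib.items():
            # Assumption: style references are attributes such as
            # text:style-name or table:style-name.
            if attr.endswith("style-name") and value not in order:
                order[value] = len(order)
    return order


def canonicalize_styles(root):
    """Reorder style definitions in place; the definitions are not changed."""
    auto = root.find("{%s}automatic-styles" % OFFICE_NS)
    body = root.find("{%s}body" % OFFICE_NS)
    if auto is not None and body is not None:
        order = first_reference_order(body)
        unused = len(order)  # assumption: unreferenced styles sort last
        auto[:] = sorted(
            auto,
            key=lambda s: (order.get(s.get(STYLE_NAME), unused),
                           s.get(STYLE_NAME, "")),
        )
    named = root.find("{%s}styles" % OFFICE_NS)
    if named is not None:
        # Named styles are ordered by name; unnamed entries sort first.
        named[:] = sorted(named, key=lambda s: s.get(STYLE_NAME, ""))
    return root
```

This only reorders definitions. Renaming automatic styles to canonical names
would be a separate rule and is not attempted here.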

Once you have canonicalized the two input files, which should not affect their 
contents at all, you can run a 'diff -r' or kdiff3 on the unzipped files.
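
If you would rather not unzip by hand, the comparison step could look
roughly like the Python sketch below. It is only a stand-in for 'diff -r',
not a replacement for kdiff3. It assumes both packages are already
canonicalized, that the XML members are UTF-8, and that ordering the zip
members by name is the agreed-upon order; diff_packages is a made-up name.

```python
import difflib
import zipfile


def diff_packages(path_a, path_b):
    """Yield diff lines for two ODF zip packages, member by member."""
    with zipfile.ZipFile(path_a) as a, zipfile.ZipFile(path_b) as b:
        names_a, names_b = set(a.namelist()), set(b.namelist())
        # Assumption: sorting by member name stands in for the agreed order.
        for name in sorted(names_a | names_b):
            if name not in names_b:
                yield "Only in %s: %s" % (path_a, name)
            elif name not in names_a:
                yield "Only in %s: %s" % (path_b, name)
            elif name.endswith(".xml"):
                old = a.read(name).decode("utf-8").splitlines()
                new = b.read(name).decode("utf-8").splitlines()
                # unified_diff yields nothing when the members are identical.
                yield from difflib.unified_diff(
                    old, new, "a/" + name, "b/" + name, lineterm=""
                )
            elif a.read(name) != b.read(name):
                yield "Binary files differ: %s" % name
```

Since ODF XML is often written as one long line, the canonical form would
probably also need a fixed line-breaking rule before a line-based diff like
this becomes readable.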

Cheers,
Jos

-- 
Jos van den Oever, software architect
+49 391 25 19 15 53
http://kogmbh.com/legal/