List:       darcs-users
Subject:    Re: [darcs-users] Binary data  (& XML)
From:       Jan Scheffczyk <jan.scheffczyk () gmx ! net>
Date:       2003-06-18 5:58:07
Message-ID: 200306180758.07243.jan.scheffczyk () gmx ! net

Hi all,

IMHO there is definitely a need to handle binary data, especially in 
conjunction with XML.
E.g. OpenOffice stores its documents as zipped XML files.
Some folks at M$ also seem to like the XML idea ;-)

> I haven't gotten around to this largely because I haven't had need of it,
> but also because I haven't decided the best way to deal with binary files.
> For example, I could treat a tar.gz archive as a directory, which would
> provide version control of the files within the archive (assuming they are
> text).

Yes, that would definitely help in the office context.
But I'm afraid we would need another patch type to handle XML files correctly.
Recently I came across the following proposal:

@inproceedings{585073,
 author = {Raymond K. Wong and Nicole Lam},
 title = {Managing and querying multi-version XML data with update logging},
 booktitle = {Proceedings of the 2002 ACM symposium on Document engineering},
 year = {2002},
 isbn = {1-58113-594-7},
 pages = {74--81},
 location = {McLean, Virginia, USA},
 doi = {http://doi.acm.org/10.1145/585058.585073},
 publisher = {ACM Press},
 }

Implementing their XML deltas as patches should be possible in Haskell, making 
use of XML parsers like HaXml, XmlToolbox, or HXML.

In sum, adding support for XML and compressed data would open the way to 
handling office documents, for which there is a huge market.
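
To make the archive-as-directory idea concrete for zipped XML documents, here
is a minimal sketch. It is written in Python only for brevity (darcs itself is
Haskell), and the function names archive_as_directory and changed_members are
made up for illustration; nothing like this exists in darcs today.

```python
import io
import zipfile
from typing import Dict, List


def archive_as_directory(data: bytes) -> Dict[str, bytes]:
    """Map each file inside a zip archive to its uncompressed bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()  # skip pure directory entries
        }


def changed_members(old: bytes, new: bytes) -> List[str]:
    """Names of members that were added, removed, or modified."""
    before = archive_as_directory(old)
    after = archive_as_directory(new)
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

Each changed member that turns out to be text (or XML) could then be handed to
the ordinary patch machinery instead of treating the whole archive as one blob.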

> A more normal (and more general) solution would be to introduce binary
> deltas, which would still be a bit of a pain, and much less entertaining.
> It also has the disadvantage that you lose a lot of the benefits of version
> control, since you can't merge binary patches that don't understand their
> content.

Huh, binary patches seem complex to me and I see no real benefit.
Maybe we should simply add a "dummy binary patch" saying

  "replace all content in <oldBinFile> by all content in <newBinFile>"

This would correspond to CVS, which AFAIK has no binary diffs and stores 
complete(!) files instead.
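
A minimal sketch of such a dummy patch, again in Python just for brevity. The
class ReplaceBinary and its methods apply and invert are hypothetical names,
not darcs code, and I assume a repository can be modelled as a mapping from
file names to bytes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ReplaceBinary:
    """Hypothetical whole-file patch: replace old content by new content."""
    path: str
    old: bytes
    new: bytes

    def apply(self, files: Dict[str, bytes]) -> Dict[str, bytes]:
        # Only apply if the file currently holds exactly the old content;
        # a missing file is treated as empty (an assumption of this sketch).
        if files.get(self.path, b"") != self.old:
            raise ValueError(f"{self.path}: content does not match patch")
        updated = dict(files)  # leave the caller's mapping untouched
        updated[self.path] = self.new
        return updated

    def invert(self) -> "ReplaceBinary":
        # Keeping both versions makes the patch trivially reversible.
        return ReplaceBinary(self.path, self.new, self.old)
```

Storing both the old and the new content costs space, but it lets the patch be
inverted and lets apply detect that the file has changed underneath it. Two
such patches to the same file simply conflict; there is no merging.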

> Another interesting type of patch I've considered is an image change patch
> for images files (I'd have to find a good image reading library) that
> change in content but not in size.  In that case you could perform rather
> interesting merges of changes, but while it might be fun to code, I'm not
> sure how useful it would actually be.

Fortunately, some image formats are plain text, e.g., SVG.

Cheers,
Jan


