List:       git
Subject:    Re: Maintaining historical data in a git repo
From:       Seth Robertson <in-gitvger () baka ! org>
Date:       2012-03-30 15:10:52
Message-ID: 201203301510.q2UFAqn6003864 () no ! baka ! org

In message <CA+P+rLeyEcZPudhLWavB74CiDAqpn+iNkk4F8=NK_yGaJPMmyA@mail.gmail.com>, Yuval Adam writes:

    As part of a public project to open-source the Israeli law code, we
    are looking into ways of representing such data in a git repository.

This is extremely cool.  I wish others were forward-thinking enough
to do this.

    The main challenge is to represent historical data _in a semantically
    correct way_ within a git repository, while having the ability to
    change data that has occurred in the past.

Revision control shouldn't be used to change the past (even if git
allows this with sufficient amounts of pain/warning to all users).
What it is extremely good at is preserving the past and tracking the
changes that are made.

    For example, we might have revisions B and C of a certain legal
    document, commit to repo, and at a later time want to add revision A
    to the proper place in the git commit tree (probably with rebasing or
    replacing).

There is no problem doing this.  I'll make up a mythical workflow
which might be realistic.  Someone proposes a bill, so a branch for
the proposal is created.  In many of the laws I am familiar with,
there is the text of the law and then the text says "Amend V.5.12.A.b
to add '25: or to commit a nasal offense (as defined in V.5.12.A) with
a shoe'".  So the branch might contain the text of the proposed law
and then actually go through to the document V.5.12.A.b and add the
new data to the appropriate file (in an ideal world that might be an
automatic process, but laws are rarely so precise).  The proposed law
changes and the bill text changes would be committed onto the branch.
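
As a rough sketch of that workflow, the commands might look something
like the following.  The branch names, the file layout and the
existing clause 24 are made up for illustration; only the clause 25
wording comes from the example above.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/law     # call the initial branch "law"
git config user.name "Example Clerk"     # placeholder identity
git config user.email "clerk@example.org"

# The statute as currently in force (placeholder text) on "law".
mkdir -p V/5/12
printf '%s\n' '24: or to commit a nasal offense with a hat' > V/5/12/A.b.txt
git add V
git commit -q -m "Law in force: V.5.12.A.b"

# Someone proposes a bill, so it gets its own branch.
git checkout -q -b bill/shoe-offense

# Commit the bill text and the amended statute text together.
mkdir -p bills
printf '%s\n' 'Amend V.5.12.A.b to add item 25.' > bills/shoe-offense.txt
printf '%s\n' '25: or to commit a nasal offense (as defined in V.5.12.A) with a shoe' >> V/5/12/A.b.txt
git add bills V
git commit -q -m "Bill: add shoe clause to V.5.12.A.b"

# Show what the bill would change relative to the law in force.
git diff --stat law bill/shoe-offense
```

The "law" branch is untouched until the bill is actually merged, so
it always reflects what is legally in force.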

As the bill goes through committee, people make changes, adding things,
removing things, etc.  Each change is a commit.  One example change
might be a new change saying "remove the change made 2 days ago" or
"make the current version the version from 10 days ago".  Both of
those specific changes would ideally be additive changes.  You would
not actually be deleting the change made two days ago or removing all
changes made between 10 days ago and now; you would be making a new
commit to remove the effects of the unwanted changes.
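
One way to do both of those with stock git is "git revert", which
records a new commit that undoes an earlier one.  This is only a
sketch; the file names, commit messages and the "draft-1" tag are
invented for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Clerk"     # placeholder identity
git config user.email "clerk@example.org"

# Three successive drafts, each touching its own (hypothetical) file.
printf 'section one\n' > section-1.txt
git add section-1.txt
git commit -q -m "Draft 1"
git tag draft-1                          # stands in for "10 days ago"

printf 'section two\n' > section-2.txt
git add section-2.txt
git commit -q -m "Draft 2: add section two"

printf 'section three\n' > section-3.txt
git add section-3.txt
git commit -q -m "Draft 3: add section three"

# "Remove the change made 2 days ago": a new commit that undoes
# Draft 2 while leaving Draft 2 itself in the history.
git revert --no-edit HEAD~1

# "Make the current version the version from 10 days ago": undo
# everything after draft-1 in one new commit.
git revert --no-commit draft-1..HEAD
git commit -q -m "Restore the text as of draft-1"

git log --oneline
```

Nothing is deleted from the history; the log still shows every draft
plus the commits that undid them.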

When the negotiations are over, and assuming the bill passes all three
readings (each reading could be a "tag" to document exactly what was
read) and is voted into law, you would then merge the bill branch into
the "law" branch, which represents the actual legally active laws.
This could be done as a "squash" merge which hides all of the
committee negotiations or it could be done as a normal merge which
allows the history of the negotiations to be visible, or, depending on
the visibility of the committee negotiations, you could even do a
combination of the two.
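
To make the tagging and the two merge styles concrete, here is a
minimal sketch.  The tag names, branch names and file contents are
assumptions made for the example, and the "law-squashed" branch exists
only so both styles can be shown side by side.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/law     # call the initial branch "law"
git config user.name "Example Clerk"     # placeholder identity
git config user.email "clerk@example.org"

printf 'existing law\n' > law.txt
git add law.txt
git commit -q -m "Law in force"

# The bill branch, with one annotated tag per reading.
git checkout -q -b bill/example
printf 'bill as introduced\n' > bill.txt
git add bill.txt
git commit -q -m "Bill as introduced"
git tag -a first-reading -m "Text read at first reading"

printf 'committee amendment\n' >> bill.txt
git commit -q -am "Committee amendment"
git tag -a second-reading -m "Text read at second reading"

printf 'final wording\n' >> bill.txt
git commit -q -am "Final wording"
git tag -a third-reading -m "Text read at third reading"

git checkout -q law
git branch law-squashed                  # second copy of "law" for comparison

# Normal merge: the committee history stays visible.
git merge -q --no-ff -m "Enact bill/example" bill/example

# Squash merge: one commit, negotiations hidden.
git checkout -q law-squashed
git merge -q --squash bill/example
git commit -q -m "Enact bill/example (squashed)"
```

Both branches end up with identical content; they differ only in how
much of the negotiation is visible in "git log".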

And yes, git supports more complex processes automatically, like each
Knesset member making their own proposed changes and the committee
chair merging the appropriate version in if it was approved and the
others being either discarded or archived for history but not
incorporated.
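
A sketch of how that could look, assuming one branch per member and
an "archive/" tag prefix for rejected proposals (both are conventions
I am making up here, not anything git requires):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/bill    # call the initial branch "bill"
git config user.name "Committee Chair"   # placeholder identity
git config user.email "chair@example.org"

printf 'bill text\n' > bill.txt
git add bill.txt
git commit -q -m "Bill as introduced"

# Each member proposes an amendment on their own branch.
git checkout -q -b member-a bill
printf 'amendment from member A\n' > amendment-a.txt
git add amendment-a.txt
git commit -q -m "Member A: proposed amendment"

git checkout -q -b member-b bill
printf 'amendment from member B\n' > amendment-b.txt
git add amendment-b.txt
git commit -q -m "Member B: proposed amendment"

# The chair merges the approved proposal into the bill.
git checkout -q bill
git merge -q --no-ff -m "Adopt member A's amendment" member-a

# The rejected proposal is kept for the record under a tag, and the
# working branches are removed.
git tag -a archive/member-b -m "Rejected in committee, kept for history" member-b
git branch -q -d member-a
git branch -q -D member-b
```

The archived proposal remains reachable through its tag even though
it never becomes part of the bill.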

    Allowing decentralization and updates is a major requirement.

git is extremely good at this.

    We're trying to map out the various pros and cons of the different
    options of maintaining such a repo.

Ideally the data being represented would be structured, textual, and
somewhat line oriented.  Plain text/UTF-8 files (no matter the word
direction) like this email are ideal.  Committing binary Office
documents (Word, OpenXML, ODF, etc.) is not ideal, since under most
circumstances, and without a lot of work, you are not going to get
good differences, so you cannot easily see the history of the law.
You can write custom diff drivers to extract this difference
information from these binary documents, but that is the "lot of
work" I was talking about.
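
For reference, git hooks such a driver in through .gitattributes and
a "textconv" setting.  The sketch below only shows the wiring; it
assumes a converter called odt2txt is installed and on your PATH, and
the driver name "odt" is arbitrary.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

# Tell git that *.odt files use a diff driver named "odt".
printf '*.odt diff=odt\n' > .gitattributes

# Define the driver: convert the document to text before diffing.
# odt2txt is an assumption; any command that takes a file name and
# prints text on stdout could be used instead.
git config diff.odt.textconv odt2txt

# Confirm which driver git would pick for a given (hypothetical) file.
git check-attr diff -- statute.odt
```

Even with this in place, the stored history is still the binary
files; only the displayed differences become readable.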

You additionally might want to have separate repositories for separate
groups of laws to prevent repositories from getting unwieldy.  There
are tools which let you group repositories together.

    Has anyone ever attempted something like this?

Many people use git to track living documents.  Perhaps not law per
se, but I don't particularly see why that would matter.

    Are there any projects that build on the git plumbing which provide
    wrapper APIs to handle historic data?

Are you talking about "get rid of that change, it was bad" and
"restore this version of the document as the good one" or "how do I
import 64 years of law into git"?  Git provides native tools to handle
both.
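
For the import case, one possible approach is to commit each
historical revision in order while setting the commit dates by hand.
This is a sketch only: the dates, texts and the commit_revision helper
are invented, and the dates are deliberately after 1970 since older
ones may need special handling.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Archivist" # placeholder identity
git config user.email "archivist@example.org"

# Hypothetical helper: commit one revision of the statute with the
# given date used as both author date and committer date.
commit_revision() {
    # $1 = date, $2 = statute text, $3 = commit message
    printf '%s\n' "$2" > statute.txt
    git add statute.txt
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q -m "$3"
}

commit_revision "1992-03-25 12:00:00 +0000" "Original text" "Statute as enacted"
commit_revision "1998-11-02 12:00:00 +0000" "Amended text" "First amendment"
commit_revision "2003-06-01 12:00:00 +0000" "Amended again" "Second amendment"

# The log now carries the historical dates, not the import date.
git log --format='%ad %s' --date=short
```

Once imported, "git log" and "git diff" work on the old revisions
exactly as they do on new ones.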

    We really could use any reference or advice we can get on this subject.

I'll point you at http://progit.org/book/ as a general reference about
git and http://sethrobertson.github.com/GitBestPractices/ as a
reference about best practices.

					-Seth Robertson