[prev in list] [next in list] [prev in thread] [next in thread]
List: git
Subject: Re: [RFC] Plumbing-only support for storing object metadata
From: Jan Hudec <bulb () ucw ! cz>
Date: 2008-08-10 11:09:25
Message-ID: 20080810110925.GB3955 () efreet ! light ! src
[Download RAW message or body]
Hello All,
I am glad you came up with this, as I think this is the only reasonable way
to support things like etckeeper. The metastore and similar solutions are
a kludge and fall apart in so many cases.
I am not sure your approach is the right one, though. I tend to agree with
Shawn it's not. So here is a couple of alternate proposals (sorry, it's a bit
long, as I have several variants with different drawbacks I would like to
discuss).
On Sat, Aug 09, 2008 at 14:07:33 -0700, Jamey Sharp wrote:
> The attached test illustrates a proposal for minimal plumbing support
> usable to store permissions, ownership, and other metadata in git
> repositories. This proposal is fully compatible with existing
> repositories when the new functionality is not in use. Similar to the
> introduction of subprojects, we have not yet specified the porcelain. We
> believe that the plumbing will provide sufficient functionality for many
> uses, and these uses will help determine the appropriate porcelain.
I think the main way to use it would be a hook, that would read/write the
attributes to/from the tree. That will do the right thing for storing
permissions, owners and other things represented in the worktree. And
metadata that are neither part of the tree or directly related to git's
functionality are out of our scope.
> [...]
> We propose representing objects with metadata using a new "inode"
> object. An inode object contains the hash of the real object and the
> hash of a "props" (properties) object. A props object contains a set of
> name-value pairs. Tree objects can reference inode objects in addition
> to the current possibilities of blobs, trees, and subproject commits; we
> propose using the currently invalid type 110000 (S_IFREG | S_IFIFO) for
> inode objects. We primarily see a use case for inodes referencing blobs
> and trees, though as defined they support any object type.
I think this is the overly complex -- and also the needlessly incompatible
part. By the way, I don't think you need separate type for props -- it can be
a blob too.
I would suggest investigating following options:
1. It would be possible to use clean/smudge filters to encode the attributes
in the blob itself.
2. Store the metadata in separate objects, but link them in the parent tree
directly. In this case, each attribute could probably get it's own blob,
so eg. for a file foo the tree containing it would have entries:
foo
foo<sep>owner
foo<sep>permissions
...
Where <sep> would be some sepatator (more on that below).
Advantages (+), disadvantages (-) and possible (*) extensions of 1:
+ It should be possible to get to something useful with very little changes
to git. Basically all it needs to be useful for things like etckeeper is
to:
. Make sure both clean and smudge filter always get filehandle to the
disk file in question (I am /not/ suggesting path as the file may be
written in a staging area and moved into place later).
. Pass the blob id currently in index to the clean filter, so it can
maintain the data if they are not representable in this particular
checkout (eg. when checking out such repo on windows). Note, that this
would also be useful for ignoring insignificant changes, eg. when
a in some config file order is not important and the tool using it
randomly changes that order when changing that file.
- It does not support metadata for directories, but could be crossed with
approach 2 to fix that. Git could special-case entry '.' for storing
"content" of a directory, which would be wholly created by running the
clean filter on a directory (I am not sure directory handles are portable,
but running with that directory as current should be). This would not have
the problem of approach 2 with the entry names for the metadata.
* Default processing could be added to strip the metadata in smudge and
re-add them from index on clean. This would require adding some marker to
know which blobs need this treatment. I see two ways:
. Using different file type for them. There are already two types
pointing to blob (S_IFREG and S_IFLNK) and they are treated differently
on read (clean) / write (smudge) from/to tree, so third type should be
workable.
. Using additional format. Currently a blob is encoded as
"blob" <LF> <content>
so maybe an extneded blob could be encoded as
"blob extended" <LF> <content>
without needing a special type for it. But I don't know git internals
enough to know how easy, hard or dirty this would be.
Advantages (+), disadvantages (-) and possible (*) extensions of 2:
+ It would work the same way for directories and file, or mostly so.
+ Different metadata would be handled independently, so it would be easier
to combine support for multiple attributes (not that I can imagine any
sensible use beyond access lists (owner, permission, posix acl)).
+ Checking out without the hooks could easily create special metadata files,
providing easy way to work with the attributes where they are not
supported by the underlaying filesystem.
- It would require reserving some names for the metadata entries. I see
basically three ways to name the attribues:
. Reserving some character for the separator, eg. @ or # or something
like that. So with file foo, there would be entries:
foo
foo@owner
foo@permissions
This has following pros and cons:
+ Minimal changes to the index <-> tree logic (remember, index is
flat and has no directory entries, so the tree writer must decide
to which tree each entry goes).
+ Trivially supports checking the metadata entries out as special
files on filesystem without metadata support.
- The character is reserved in trees that need the feature (the trees
that don't need it don't need to care).
Note, that the metadata entries could have mode either S_ISREG, or
a new one. Inclined to say S_ISREG -- we have the special name to
distinguish them.
. Using something that does not exist in a normalized path, ie either
"//" or "/./". So with file foo, there would be entries:
foo
foo//owner
foo//permissions
This has following pros and cons:
+ Does not reserve any characters. Every filename is permitted even
when the freature is used.
- Harder on the index <-> tree logic, as it would have to not consider
such strings as not being directory separators.
- Such files could not be checked out, though they could still be
manipulated using cat-file and update-index.
The metadata entries could have mode either S_ISREG or a new one again.
New mode would be sensible if it would make easier on the index <->
tree logic (it's easier to check 3 bits than search string for
a substring).
. Leave the suffix for metadata entries to the hooks. This would be
middle road between the above two:
+ Reserves as little as possible, while not complicating the index <->
tree logic.
+ Remains easy to check out as special files where you can't run the
hooks, though this would require some special-casing similar to
symlinks on Windows.
- Would require new mode for these entries, so we know they are
created and consumed by the hooks rather than directly read/written
to the tree.
Best regards,
Jan
--
Jan 'Bulb' Hudec <bulb@ucw.cz>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic