List:       intermezzo-devel
Subject:    more work
From:       "Peter J. Braam" <braam () clusterfilesystem ! com>
Date:       2001-11-17 19:36:17

Gord, 

[Phil: could you review this too?]

Scanning to the next record in the KML fixed the crash!  We were still
not setting the file sizes correctly.  We don't write data into the
file, but we do want the correct length (that way the kernel knows
when to demand-fetch data).  I fixed it and checked it in.
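The trick is to give a cache file its final length without writing any data. As a user-space illustration only (not the kernel code that was checked in), POSIX ftruncate() does exactly that; the helper name set_cache_file_length is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: give a cache file its final length without writing
 * any data.  Returns 0 on success, -1 on error (errno is set). */
int set_cache_file_length(const char *path, off_t length)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    /* Extending with ftruncate() writes no data: the new range reads back
     * as zeros and, on most file systems, allocates no blocks. */
    if (ftruncate(fd, length) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```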

The plans have not changed since I wrote last weekend, so let me be
more detailed about the next steps.  I'm going to be traveling, so
I'm laying out enough work for a month or so.

1. Reintegration - approx similar to fsreint
--------------------------------------------

We just did kml_fsreint; now we need kml_reint.

Here the user-level code hands a single KML record to the kernel using
an IZO_IOC_REINTKML ioctl.

The kernel now does the full reintegration of the record.  The ioctl
should return a conflict error (one of the 4 described below) if a
conflict is found.

You can see in the Lento code (ReintLento.pm) what reintegration
entails: typically it consists of one or two transactions made from user
space; the first does the operation, the second may fix up
attributes.  In effect, you will use the kml_unpack code in the kernel
and repeat the steps that the ioctls from user space already
perform.

There is sample code already that does this in kml_reint.c (I'm not
sure it is correct).  I will provide the framework for your ioctl by
Sunday night. 

--> We need to make sure that the kernel reintegration does exactly the
right thing.  All attributes except directory sizes (which we do not
control) should be the same. 

For security we want the following: each fileset has a mount path,
which is associated with a (dentry, vfsmount) pair (the latter is a
refinement of the namespace that allows file systems to be mounted
multiple times; the mnt parameter tells you which mount point you are
dealing with).  The setfsetroot code should set a struct vfsmount *
field and a dentry field in the fset structure (not per record, as is
currently done in kml_reint.c).  I'll put that in place too.

--> The reintegrator chroots _and_ chdirs to the fset mount point, as
found in the (dentry, vfsmount) pair of the fileset structure (it pops
this after the reintegration).  For each record, the reintegrator sets
and pops the current fsuid/fsgid and groups array for the process.
Checks for '..' no longer have to be made, as this process cannot leave
the chroot.

Here I want to be much more precise about conflicts: 

No conflicts can happen unless the sequence numbers are off.  So when
they match, no further checking is needed (if checking costs overhead,
we might want to exploit that).  Your code should print a warning if the
record numbers are not what the fileset is expecting (the fileset has
last rcvd recno fields).  In fact, we may change the record format at
some point so that each record names the previous record; optimizations
can then take records out and just update the previous-record field in
the next record (not now, but possibly for 2.0 we will make that KML
format change).

--> an optional warning should be printed if record numbers are off. 
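A minimal sketch of that check, assuming records are numbered consecutively so the expected record is last_rcvd_recno + 1. struct fset_state and its fields are hypothetical stand-ins for the fileset's last-received fields:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-fileset state: only what the check needs. */
struct fset_state {
    uint64_t last_rcvd_recno;  /* last record number reintegrated */
    int warn_recno;            /* nonzero: print the optional warning */
};

/* Returns 1 if recno is the record the fileset expects next, 0 otherwise,
 * optionally warning on stderr about the mismatch. */
int recno_in_sequence(const struct fset_state *fset, uint64_t recno)
{
    uint64_t expected = fset->last_rcvd_recno + 1;

    if (recno == expected)
        return 1;
    if (fset->warn_recno)
        fprintf(stderr, "izo reint: got recno %llu, expected %llu\n",
                (unsigned long long)recno, (unsigned long long)expected);
    return 0;
}
```

Following the argument above, a return value of 1 is the case where the version checks could in principle be skipped.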

Conflicts are then checked and considered found when:

 - the mtimes/ctimes/sizes (struct presto_version; sizes are not
 compared for directories) of the affected objects do not match those
 in the records.

--> an optional warning prints that these versions are not right.
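Sketched in C, the version test compares the three fields and skips the size for directories. The struct below only mirrors struct presto_version for illustration; its field names are assumptions to check against the real header:

```c
#include <stdint.h>

/* Illustrative mirror of struct presto_version (field names assumed). */
struct presto_version {
    uint64_t pv_mtime;
    uint64_t pv_ctime;
    uint64_t pv_size;
};

/* Returns 1 if the object's current version matches the version stored in
 * the KML record.  Sizes are not compared for directories. */
int presto_version_matches(const struct presto_version *in_record,
                           const struct presto_version *on_disk,
                           int is_dir)
{
    if (in_record->pv_mtime != on_disk->pv_mtime ||
        in_record->pv_ctime != on_disk->pv_ctime)
        return 0;
    return is_dir || in_record->pv_size == on_disk->pv_size;
}
```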

If a conflict is found, there are certain cases where we can continue
to reintegrate, but we must take note of the 4 conflicts that cannot be
resolved automatically:

 - name/name conflicts: the creation of an object hits an existing
object with the same name

 - update/update conflicts: updates are setattrs/closes (and extended
attributes).  Overwriting the existing data/attrs here is not
necessarily a good idea, so we want the warning.

 - update/delete conflicts: one ordering, update first and delete
after, can be detected.  The other ordering means that an object that
is being updated no longer exists.

 - rename/rename conflicts: objects have been renamed to different
files.  This would show up as a missing source object.

--> The code should report the conflicts precisely and NOT
reintegrate, but back out.   The conflict resolution tool will help
here (we will build one over the next week or two). 
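One way to encode the four cases, assuming the reintegrator already knows whether the affected object exists and whether its version matched. The enums and classify_conflict() are hypothetical names, and treating a delete whose target is already gone as harmless is an assumption the text does not settle:

```c
/* The four conflicts, numbered 1-4 as in the list above. */
enum izo_conflict {
    IZO_NO_CONFLICT   = 0,
    IZO_NAME_NAME     = 1,  /* create hits an existing name */
    IZO_UPDATE_UPDATE = 2,  /* setattr/close on an object whose version moved */
    IZO_UPDATE_DELETE = 3,  /* update of a vanished object, or delete of an updated one */
    IZO_RENAME_RENAME = 4   /* rename source is missing */
};

/* Hypothetical coarse classes of KML operations. */
enum izo_op_class { IZO_OP_CREATE, IZO_OP_UPDATE, IZO_OP_RENAME, IZO_OP_REMOVE };

/* target_exists: the object the record acts on (for a create: the name;
 * for a rename: the source) is present.  version_ok: its presto_version
 * matches the one in the record. */
enum izo_conflict classify_conflict(enum izo_op_class op, int target_exists,
                                    int version_ok)
{
    switch (op) {
    case IZO_OP_CREATE:
        return target_exists ? IZO_NAME_NAME : IZO_NO_CONFLICT;
    case IZO_OP_UPDATE:
        if (!target_exists)
            return IZO_UPDATE_DELETE;
        return version_ok ? IZO_NO_CONFLICT : IZO_UPDATE_UPDATE;
    case IZO_OP_RENAME:
        return target_exists ? IZO_NO_CONFLICT : IZO_RENAME_RENAME;
    case IZO_OP_REMOVE:
        /* Assumption: removing an already-missing object is harmless. */
        if (target_exists && !version_ok)
            return IZO_UPDATE_DELETE;
        return IZO_NO_CONFLICT;
    }
    return IZO_NO_CONFLICT;
}
```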

--> A close record should call a callback function.  We will deal with
closes soon. I want you to implement two dummy callback functions: one
is to do nothing except to advance the in memory record counter so
that the next record does not show a record number mismatch.  The
second is to write a close record in the KML (but not do anything
else). 
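A sketch of the two dummy callbacks behind one function-pointer type. struct reint_state, close_callback_t and the in-memory array standing in for the KML are all hypothetical; having the second callback also advance the counter (so it does not trigger the record number warning either) is an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory reintegration state for one fileset. */
struct reint_state {
    uint64_t last_rcvd_recno;   /* in-memory record counter */
    uint64_t kml_closes[16];    /* stand-in for the KML: close record recnos */
    size_t nr_kml_closes;
};

typedef int (*close_callback_t)(struct reint_state *st, uint64_t recno);

/* Dummy 1: only advance the counter, so the next record number matches. */
int close_cb_advance(struct reint_state *st, uint64_t recno)
{
    st->last_rcvd_recno = recno;
    return 0;
}

/* Dummy 2: write a close record to the (simulated) KML; apart from keeping
 * the counter in step, nothing else happens. */
int close_cb_write_kml(struct reint_state *st, uint64_t recno)
{
    size_t cap = sizeof st->kml_closes / sizeof st->kml_closes[0];

    if (st->nr_kml_closes == cap)
        return -1;                        /* simulated KML is full */
    st->kml_closes[st->nr_kml_closes++] = recno;
    return close_cb_advance(st, recno);
}
```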

--> Currently the kml_reint code calls set_fs stuff all the time.  In
fact we need to make all the getname/putname stuff in vfs.c optional
(put a flag in the context); then the setfs/putfs code can go.  Soon we
will no longer need to support calling this from user space at all.

--> We should have a flag on the reint ioctl NOT to write KML during the
reint.  Clients don't need KML when they reintegrate server records
(they do update the last received record numbers each time).
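The point of the flag is that skipping the KML must not skip the bookkeeping. A minimal sketch, with a hypothetical flag name and a counter standing in for the real KML write:

```c
#include <stdint.h>

#define IZO_REINT_NO_KML 0x1   /* hypothetical flag bit for the reint ioctl */

/* Hypothetical fileset bookkeeping. */
struct reint_fset {
    uint64_t last_rcvd_recno;
    uint64_t kml_records;      /* records this node has written to its KML */
};

/* Called after a record has been applied to the file system. */
void reint_record_done(struct reint_fset *fset, uint64_t recno,
                       unsigned int ioctl_flags)
{
    if (!(ioctl_flags & IZO_REINT_NO_KML))
        fset->kml_records++;        /* stand-in for writing the KML record */
    fset->last_rcvd_recno = recno;  /* always updated, even with NO_KML */
}
```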

You do NOT have to clean up lento to use this -- Phil will do that.
What we need is again the simple program that replays the KML as we
did for fsreint this week. 

2. Anti reintegration: slightly simpler
---------------------------------------

Suppose a KML segment has been reintegrated.  We want code to 'undo'
it (we may find that the KML records are not always precise enough to
undo KML in which case we will change the records a bit). 

The purpose here is to support high availability clusters and
branching file systems. Each branch has a parent and leaf KML and we
want to roll back to the parent. The parent has separate file data for
old versions.  This will still need to be implemented but is basically
simple.  In the high availability case, system B may steal a permit
from system A.  B notes down "stole permit at recX".  When A comes
back up it may have KML that was not reintegrated to B; that would
correspond to "recY".  To bring the systems in sync, we:

 - undo recX-recY on A. 
 - reintegrate recX---tail from B to A

We will initially only undo KML that we have just done, in exactly the
reverse order, record by record.  We do this again with an ioctl as
above.

If something was removed, it needs to be reinstantiated (we may lack
attributes of the removed object; if so, tell me and we'll make the
changes to the KML right away).  For example, on setattrs we need the
old attributes: we only have size/mtime/ctime at the moment and are
missing mode and owners.  Adding them will bump the KML version to 1.2;
we do not do backward compatibility yet.  If something was created it
needs to be removed.  Renames need to be undone too.

If file data needs to be restored, call a callback function (the old
file data _will_ be available). 

Everywhere where something is removed, we want a callback to possibly
first save the object in an /elsewhere/fileset/ area.  You do not have
to code this part yet (see below). 
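A sketch of undoing a segment record by record in reverse order. The opcodes are a hypothetical, simplified set, and apply() is a placeholder for whatever ioctl performs the inverse operation (and would invoke the data-restore and save-to-/elsewhere callbacks):

```c
#include <stddef.h>

/* Hypothetical, simplified KML record kinds. */
enum kml_op { KML_CREATE, KML_MKDIR, KML_UNLINK, KML_RMDIR,
              KML_RENAME, KML_SETATTR, KML_NOOP };

/* What undoing a record of each kind amounts to. */
enum kml_op undo_op(enum kml_op op)
{
    switch (op) {
    case KML_CREATE:  return KML_UNLINK;   /* created: remove it */
    case KML_MKDIR:   return KML_RMDIR;
    case KML_UNLINK:  return KML_CREATE;   /* removed: reinstantiate (needs old attrs) */
    case KML_RMDIR:   return KML_MKDIR;
    case KML_RENAME:  return KML_RENAME;   /* rename back: swap source and target */
    case KML_SETATTR: return KML_SETATTR;  /* reapply the old attributes */
    default:          return KML_NOOP;
    }
}

/* Placeholder for the operation that applies one inverse record. */
typedef int (*apply_fn)(enum kml_op inverse, size_t index, void *arg);

/* Undo records first..last (inclusive) from last down to first.  Stops at
 * the first failure and returns its index, or -1 if everything was undone. */
long undo_segment(const enum kml_op *recs, size_t first, size_t last,
                  apply_fn apply, void *arg)
{
    for (size_t i = last + 1; i-- > first; ) {
        if (apply(undo_op(recs[i]), i, arg) != 0)
            return (long)i;
    }
    return -1;
}
```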


3. Conflict resolution
----------------------

We will do this in multiple phases, but this one is the core. 

All conflicts should be handled on clients.  Servers and proxy servers
that detect a real conflict (types 1-4) should NOT reintegrate, but
should tell clients to resync their filesets.  We will do this next, but
Phil Schwan needs to educate us a little more on KML replay.

If "approved" clients are used conflicts should never happen on servers.

When a client detects a conflict, the idea is to move the conflicting
stuff out of the way so that the fileset itself remains clean and
identical to the one on the server, except for changes that really will
reintegrate.

So we are looking at the case where the client has a new set of KML
records from the server (server KML), but has made changes itself too
(so there is a piece of client KML for the fileset). Conflicts could
now happen.

Goal: 
(1) reintegrate server KML (only real kernel reint's here)

(2) move conflicting client stuff out of the way; /elsewhere/fileset is
the destination (HANDLE CONFLICT)

(3) change affected client KML records to noops (DISABLE KML)

[in fact (3) needs to be done before (2) to make things replayable]

(4) VERIFY that the remaining client KML records do not conflict with
the server KML that was sent (in that case we "know" that after the
client conflict handling, the client KML will reintegrate on the server).

(5) if the process is interrupted (system crash, network down) it
should be possible to resume it.  In particular we should be able to
do this 
 (a) multiple times for small amounts of server KML at a time
 (b) in one blow with the entire server KML. 
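Requirement (5) suggests a small checkpointed state machine. The sketch below assumes conflicts are detected before a server record is applied, keeps (3) before (2), and stores progress in a hypothetical struct ckpt that a real tool would write to stable storage after every step. Running it once with a large step budget or many times with a tiny one does the same work:

```c
#include <stdint.h>

/* Per-record steps.  DISABLE_KML (3) runs before HANDLE_CONFLICT (2) so
 * that a replay after a crash sees the same client KML. */
enum step { ST_CHECK, ST_DISABLE_KML, ST_HANDLE_CONFLICT, ST_REINT };

/* Hypothetical checkpoint; a real tool would persist it after each step. */
struct ckpt {
    uint64_t recno;   /* server KML record in progress */
    enum step next;   /* next step to run for that record */
};

/* Performs one step for one record.  For ST_CHECK it returns nonzero when
 * the record conflicts with local changes; other return values are ignored.
 * Every step must be idempotent, since a crash mid-step means it is redone. */
typedef int (*step_fn)(enum step s, uint64_t recno, void *arg);

/* Run at most max_steps steps over server records up to last_recno.
 * Returns 1 when everything is done, 0 if the budget ran out first. */
int resolve_run(struct ckpt *c, uint64_t last_recno, unsigned max_steps,
                step_fn do_step, void *arg)
{
    while (c->recno <= last_recno) {
        if (max_steps == 0)
            return 0;                 /* "interrupted": resume later */
        max_steps--;
        switch (c->next) {
        case ST_CHECK:
            c->next = do_step(ST_CHECK, c->recno, arg) ? ST_DISABLE_KML
                                                        : ST_REINT;
            break;
        case ST_DISABLE_KML:
            do_step(ST_DISABLE_KML, c->recno, arg);
            c->next = ST_HANDLE_CONFLICT;
            break;
        case ST_HANDLE_CONFLICT:
            do_step(ST_HANDLE_CONFLICT, c->recno, arg);
            c->next = ST_REINT;
            break;
        case ST_REINT:
            do_step(ST_REINT, c->recno, arg);
            c->recno++;
            c->next = ST_CHECK;
            break;
        }
    }
    return 1;
}
```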

For each record type you need to analyse what can happen.  For
example: 

[PLANNING]

- setattr:
 (1) the object or parent is no longer there.
     - remedy: instantiate it (but do not write to the KML).  Find the
       affected KML "rm" records and undo them. Write a note in the
       WARNINGS file. 
 (2) the old attrs don't match those in the KML record.
     - remedy: write in the /elsewhere/fileset/WARNINGS what the old attrs
       were and change them anyway.  Change setattr records in local
       KML for this object to noops.
 (3) permission errors
     - remedy: need a plan here (we will generate one for you), this
     could happen in any ancestor, possibly we need to reset all
     ancestors.  Further KML could rename the ancestors, so fetching
     attributes is not that simple, but the local KML has all the
     information we need!  We scan the local KML for all records
     related to a pathname and find back the old permissions of the
     ancestors.   This involves disabling a bunch of setattr records.
     We might also find that a file tree was renamed. The hash I
     mention below should probably take that into account. 


- create/mkdir/link/symlink/mknod:
 (1) something else was created there already
      - remedy: move it to /elsewhere/fileset/"something else
        subtree".  Find offending records in local KML and
        write help in WARNINGS (the warnings could e.g. encode if we had a
        rename conflict or created a bunch of new things).
 (2) parent doesn't exist:
      - remedy: create the parent, find the offending "rm" records in local
        KML and change them to noops.
 (3) permission problems

- rename (careful here, this one is complicated!!)

- unlink/rmdir

[PLANNING]
--> Complete planning: the stuff I'm writing here is quite close to an
implementation plan, but thinking this through in further detail is
probably a good idea. 

[HANDLE CONFLICT]
--> complete this handling list.  Note down if anywhere on the client
you are using ROOT permission to do something (eg. disable a KML
record, change a file's owner etc -- we want to understand what part
of this can run as a user and what part really requires mucking with
the cache on the client).

When reintegration starts, the server will report to the client
what KML it hasn't seen yet (it reports its last rcvd recno/offset).
During this reint, the KML will NOT grow (we do this for server KML
reaching a client). The tail will be the record sequence "last_rcvd
offset to current tail".  If conflicts happen, some of those records may
not be reintegratable, and in the process we might need to mark some
as noops.

[DISABLE KML]
--> Write a bit of user level glib code that builds a hash table that
maps a pathname (possibly parent pathname too) to a linked list of KML
record offsets in the local KML in this tail segment. (And something
that cleans this up.)  You may assume there is enough memory for
this. 

This hash will allow you to retrieve old attributes of ancestors for
example. 
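The text asks for glib; to keep this sketch self-contained it uses a small hand-rolled chained hash table instead of GHashTable, with the same shape: pathname key, linked list of KML offsets as the value. All names are hypothetical. A caller would add each record under its pathname and, where useful, its parent's pathname:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct off_node {                  /* one KML record offset */
    uint64_t offset;
    struct off_node *next;
};

struct path_entry {                /* one pathname and its record offsets */
    char *path;
    struct off_node *offsets;      /* in KML order, oldest first */
    struct off_node **tail;        /* where the next offset is appended */
    struct path_entry *next;       /* hash-chain link */
};

#define NBUCKETS 1024
struct path_index { struct path_entry *buckets[NBUCKETS]; };

static unsigned long hash_path(const char *s)   /* djb2 string hash */
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

struct path_entry *path_index_lookup(const struct path_index *ix,
                                     const char *path)
{
    struct path_entry *e = ix->buckets[hash_path(path)];
    while (e && strcmp(e->path, path) != 0)
        e = e->next;
    return e;
}

/* Note that the KML record at `offset` touches `path`.  0 ok, -1 no memory. */
int path_index_add(struct path_index *ix, const char *path, uint64_t offset)
{
    struct path_entry *e = path_index_lookup(ix, path);
    struct off_node *n = malloc(sizeof *n);

    if (!n)
        return -1;
    n->offset = offset;
    n->next = NULL;
    if (!e) {
        unsigned long b = hash_path(path);
        e = malloc(sizeof *e);
        if (!e || !(e->path = malloc(strlen(path) + 1))) {
            free(e);
            free(n);
            return -1;
        }
        strcpy(e->path, path);
        e->offsets = NULL;
        e->tail = &e->offsets;
        e->next = ix->buckets[b];
        ix->buckets[b] = e;
    }
    *e->tail = n;
    e->tail = &n->next;
    return 0;
}

/* Free every entry and offset list (the clean-up routine). */
void path_index_free(struct path_index *ix)
{
    for (size_t b = 0; b < NBUCKETS; b++) {
        struct path_entry *e = ix->buckets[b];
        while (e) {
            struct path_entry *next_e = e->next;
            struct off_node *n = e->offsets;
            while (n) {
                struct off_node *next_n = n->next;
                free(n);
                n = next_n;
            }
            free(e->path);
            free(e);
            e = next_e;
        }
        ix->buckets[b] = NULL;
    }
}
```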

I think that we will _always_ handle conflicts in user space and that
we will always modify the intermezzo caches through ioctls.  Moving
something out of the way may involve a hacked-up copy of the "mv"
command that uses ioctls on the InterMezzo side (to do the unlink)
and ordinary system calls in the /elsewhere directory. 

The task is to write the handlers for each conflict case. 

[HANDLE CONFLICT - step 2]
--> write handlers: we have seen 4 so far:
 (1) move subtree out of the way
 (2) write warning about attributes
 (3) generate attributes for ancestors (we will devise the algorithm)
 (4) instantiate parent (attrs of ancestors are interesting here!)

[DISABLE KML - step 2]
--> write code that marks certain records in the client's KML as
noops. They are the records that were moved out of the way.  The
resulting KML segment should be something that the server can
reintegrate. You need the hash table for this. 
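A minimal sketch of disabling the records for a subtree that was moved out of the way. struct kml_rec and KML_OP_NOOP are hypothetical; it scans linearly where the real code would use the pathname hash to visit only the affected offsets, and it ignores renames that cross the subtree boundary:

```c
#include <stddef.h>
#include <string.h>

#define KML_OP_NOOP 0   /* hypothetical opcode value for a noop */

/* Hypothetical in-memory view of one local KML record. */
struct kml_rec {
    int opcode;
    const char *path;   /* object the record acts on */
};

/* Nonzero if path is root itself or lies below it (root has no trailing '/'). */
static int in_subtree(const char *path, const char *root)
{
    size_t n = strlen(root);
    return strncmp(path, root, n) == 0 && (path[n] == '\0' || path[n] == '/');
}

/* Turn every live record acting inside the moved subtree into a noop.
 * Returns how many records were disabled. */
size_t kml_disable_subtree(struct kml_rec *recs, size_t nrecs, const char *root)
{
    size_t disabled = 0;

    for (size_t i = 0; i < nrecs; i++) {
        if (recs[i].opcode != KML_OP_NOOP && in_subtree(recs[i].path, root)) {
            recs[i].opcode = KML_OP_NOOP;
            disabled++;
        }
    }
    return disabled;
}
```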

Finally, when this process is done, we need to implement (4):

[VERIFY]
--> check that remaining client KML does not conflict with the server
KML segment.  
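A deliberately conservative first cut of VERIFY: flag any live client record whose path equals, contains, or lies inside a path that a live server record touched. The record type is hypothetical, and a real check would likely compare versions rather than bare paths, so this may report false positives:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal record view; opcode 0 marks a noop. */
struct vrec {
    int opcode;
    const char *path;
};

/* Nonzero if a and b name the same object or one lies inside the other. */
static int paths_overlap(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    const char *shorter = la <= lb ? a : b;
    const char *longer = la <= lb ? b : a;
    size_t n = la <= lb ? la : lb;

    return strncmp(shorter, longer, n) == 0 &&
           (longer[n] == '\0' || longer[n] == '/');
}

/* Returns the index of the first live client record that overlaps the
 * server segment, or -1 if the remaining client KML looks safe. */
long kml_verify(const struct vrec *client, size_t nc,
                const struct vrec *server, size_t ns)
{
    for (size_t i = 0; i < nc; i++) {
        if (client[i].opcode == 0)
            continue;                  /* disabled during conflict handling */
        for (size_t j = 0; j < ns; j++)
            if (server[j].opcode != 0 &&
                paths_overlap(client[i].path, server[j].path))
                return (long)i;
    }
    return -1;
}
```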

Wow, this would be neat to have!!


_______________________________________________
intermezzo-devel mailing list
intermezzo-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/intermezzo-devel
