
List:       reiserfs-devel
Subject:    (reiserfs) Summary of our current rough thoughts on reversed opportunistic wandering logs
From:       root <reiser () idiom ! com>
Date:       1999-05-31 0:09:28


Design assumptions:

* Any recovery time under 90 seconds for a 9GB drive is good enough.

* To have instantboot, we need to know that if Ai changes are made,
they are made all together.

* If we merely want to guarantee recoverability, then we used to think
it would be enough to know that Ai changes do not take effect before
Bi changes reach disk.  Now we know that we must also know which
changes are new ones, and which nodes are discarded old garbage nodes.

* If we want fastboot, we need to know that if Ai changes are made, they 
can be rolled back to Ci by examining only L blocks and not the whole 
disk.  L blocks consist of the blocks containing Ci plus the blocks
indicating which blocks contain Ci.

* If we shift a block from node A to node B, we used to think that B
must reach the disk first before A (flush cells), or that new A must
be written near old A rather than over old A (preserve list).  

Reverse Logging:

Now we realize that it is okay to write new A over old A's location,
provided that old A is first written to a new location and that
location is recorded, along with a pair of blocknrs mapping the copy of
old A in the new location back to A's original location, before new A
is written into old A's old location.  I will call this "reverse
logging".  Reverse logging has the advantage that it perturbs what may
be an optimal layout less.
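
A minimal Python sketch of the reverse-logging order described above, assuming a toy in-memory disk (a dict from block number to contents) and a hypothetical `log_map` list of (log block, home block) pairs.  The names are illustrative, not reiserfs code; the point is the order of the three steps and how the recorded pair lets old A be restored.

```python
from typing import Dict, List, Tuple

Blocks = Dict[int, bytes]          # toy disk: block number -> contents
LogMap = List[Tuple[int, int]]     # (blocknr of old-A copy, A's home blocknr)


def reverse_log_write(blocks: Blocks, log_map: LogMap,
                      home: int, new_contents: bytes, log_block: int) -> None:
    """Overwrite A in place, reverse-logging old A first (hypothetical helper)."""
    # 1. Write old A to a new location.
    blocks[log_block] = blocks[home]
    # 2. Record the pair mapping that copy back to A's home location.
    log_map.append((log_block, home))
    # 3. In a real filesystem, steps 1 and 2 must reach disk before this
    #    write; the toy model simply performs them in order.
    blocks[home] = new_contents


def roll_back(blocks: Blocks, log_map: LogMap) -> None:
    """Eliminate the change set: copy each logged old A back over its home."""
    for log_block, home in log_map:
        blocks[home] = blocks.pop(log_block)   # restore, then free the log copy
    log_map.clear()
```

For example, writing `b"new A"` over block 10 with block 99 as the log location leaves `{10: b"new A", 99: b"old A"}` and `log_map == [(99, 10)]`; `roll_back` then restores `{10: b"old A"}`.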

If there is a better location than the original location, then we
write new A to that location, mark it as a log block in the bitmap,
and call that forward logging.
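
Forward logging can be sketched in a toy model too, assuming a dict from block number to contents as the disk and a hypothetical `log_bitmap` represented as a Python set of block numbers.  New A goes to the better location and that block is flagged as a log block; old A is never overwritten, so rolling back only means freeing the forward-log block.

```python
from typing import Dict, Set

Blocks = Dict[int, bytes]   # toy disk: block number -> contents


def forward_log_write(blocks: Blocks, log_bitmap: Set[int],
                      new_contents: bytes, better_block: int) -> int:
    """Write new A to a better location and mark it as a log block."""
    blocks[better_block] = new_contents
    log_bitmap.add(better_block)
    return better_block          # (assumed) callers refer to new A by this blocknr


def roll_back_forward(blocks: Blocks, log_bitmap: Set[int]) -> None:
    """Throw away forward-log blocks; old A is still intact at its home."""
    for blk in log_bitmap:
        blocks.pop(blk, None)    # mark the forward-log block free
    log_bitmap.clear()
```

Because old A stays where it was, a crash before commit costs nothing but the freed block; the price is that A has wandered to a new location once the change commits.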

Using forward and reverse logging as appropriate reduces the
wandering of formatted nodes, a significant issue for tightly packed
files of 1-10KB.

Wandering Log Structure:

Every group of 4k blocks (16MB) has a 4KB "meta-block" that contains:

* a version of the superblock with a generation counter.  I will assume
  the superblock is 512 bytes for this email.

* 512 bytes unused

* 512 byte "used bitmap" that marks every used block among the group's
  4k blocks (one bit per block)

* 512 byte "log bitmap" which indicates every reverse log block.  Not more
  than 256 blocks (1MB) are allowed in the reverse log before a commit.
  We do reverse logging, which is to say that we log the state of the
  block as it was before the change.  Question: is it true that nothing
  that we care to log gets overwritten without us normally reading it
  first anyway, even without doing logging?  I think the answer is yes,
  but am I neglecting something?

* 2k byte "log mapping": a list of blocknumber pairs indicating which log
  blocks must be written over which other blocks to eliminate the change
  set.  Log blocks not on this list are assumed to be forward logs; they
  are marked free and thrown away.
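
The byte budget of one meta-block can be checked with a short sketch.  It assumes 4KB blocks, 32-bit little-endian block numbers in the log mapping (so 2048 bytes hold exactly 256 pairs, matching the 256-block log limit), and a field order that simply follows the list above; the post does not fix an on-disk byte order or field order, and `pack_meta_block` is a hypothetical helper.

```python
import struct
from typing import List, Tuple

BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 4096                  # one meta-block per 4k blocks (16MB)
SUPER_SIZE = 512                         # assumed superblock size
UNUSED_SIZE = 512
BITMAP_SIZE = BLOCKS_PER_GROUP // 8      # 512 bytes: one bit per block
LOG_MAP_SIZE = 2048
PAIR = struct.Struct("<II")              # (log blocknr, target blocknr), 32-bit each
MAX_PAIRS = LOG_MAP_SIZE // PAIR.size    # 256 pairs

# The five fields fill one 4KB block exactly.
assert SUPER_SIZE + UNUSED_SIZE + 2 * BITMAP_SIZE + LOG_MAP_SIZE == BLOCK_SIZE


def pack_meta_block(superblock: bytes, used_bitmap: bytes, log_bitmap: bytes,
                    log_map: List[Tuple[int, int]]) -> bytes:
    """Lay out one meta-block in the order listed above (hypothetical format)."""
    if len(superblock) != SUPER_SIZE:
        raise ValueError("superblock must be 512 bytes")
    if len(used_bitmap) != BITMAP_SIZE or len(log_bitmap) != BITMAP_SIZE:
        raise ValueError("bitmaps must be 512 bytes")
    if len(log_map) > MAX_PAIRS:
        raise ValueError("more than 256 log blocks before a commit")
    mapping = b"".join(PAIR.pack(log_blk, target) for log_blk, target in log_map)
    mapping = mapping.ljust(LOG_MAP_SIZE, b"\0")   # zero-fill the unused tail
    block = superblock + bytes(UNUSED_SIZE) + used_bitmap + log_bitmap + mapping
    assert len(block) == BLOCK_SIZE
    return block
```

With this layout the log mapping starts at byte offset 2048, and a 257th pair is rejected, which is the same limit as the 1MB reverse-log cap.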

The bitmaps in every meta-block are valid, both the log bitmap and the
used bitmap.  Only one superblock is valid, and the meta-block
containing it also holds the only valid log mapping (the 2k list of
blocknumber pairs).

When we change any set of blocks A1-An, we reverse or forward log
them, then we sync all dirty meta-blocks.  The meta-block for the first
block in the log mapping is the one that will contain the log mapping.
No block A1-An is allowed to reach disk until the corresponding reverse
log blocks and the meta-blocks are flushed.  Then we allow A1-An to
reach disk.  Then we zero the log bitmaps.  Note how the need to
atomically sync the meta-blocks prevents using more than one log
mapping.
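
The commit ordering can be expressed as a checker over a trace of disk writes.  This is a sketch under assumptions: the trace format of `(kind, blocknr)` tuples and the kind names are hypothetical.  It checks the rule stated above: no block A1-An reaches disk until every reverse log block and every dirty meta-block has been written, and the log bitmaps are zeroed only after all of A1-An are on disk.

```python
from typing import Iterable, Set, Tuple

# (kind, blocknr); kinds: "log", "meta", "data", "zero_log_bitmap"
Write = Tuple[str, int]


def commit_order_ok(trace: Iterable[Write], log_blocks: Set[int],
                    meta_blocks: Set[int], data_blocks: Set[int]) -> bool:
    """Return True if the trace obeys the commit ordering described above."""
    pending_log = set(log_blocks)
    pending_meta = set(meta_blocks)
    pending_data = set(data_blocks)
    for kind, blk in trace:
        if kind == "log":
            pending_log.discard(blk)
        elif kind == "meta":
            pending_meta.discard(blk)
        elif kind == "data":
            if pending_log or pending_meta:
                return False          # some Ai reached disk before the log was safe
            pending_data.discard(blk)
        elif kind == "zero_log_bitmap":
            if pending_data:
                return False          # log discarded before all of A1-An landed
    # Every log block, meta-block and Ai must eventually be written.
    return not (pending_log or pending_meta or pending_data)
```

A trace of log block, meta-block, A1, A2, then the bitmap zeroing passes; writing A1 before the meta-block, or zeroing the log bitmap between A1 and A2, fails.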

When we recover, we read all meta-blocks.  For a 9GB drive that means
576 blocks (9GB / 16MB), or ~5 seconds.  We throw away all superblocks
except the one with the highest generation number.
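
The recovery arithmetic and the superblock choice can be sketched as follows.  The 9GB figure is taken as 9 x 2^30 bytes, and the ~9 ms per meta-block read is an assumed random-access cost for a disk of that era, not a number from the post; `pick_superblock` is a hypothetical helper that keeps only the copy with the highest generation number.

```python
from typing import List, Tuple

GROUP_BYTES = 4096 * 4096                 # 16MB covered by each meta-block


def meta_block_count(disk_bytes: int) -> int:
    """Number of meta-blocks to read at recovery (one per 16MB group)."""
    return -(-disk_bytes // GROUP_BYTES)  # ceiling division


def pick_superblock(copies: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Given (generation, meta_block_index) pairs, keep the newest superblock.

    The meta-block it came from also holds the only valid log mapping.
    """
    return max(copies)


count = meta_block_count(9 * 2**30)
seconds = count * 0.009                   # assumed ~9 ms per random read
print(count, round(seconds, 1))           # prints: 576 5.2
```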

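Whenever we write a superblock whose generation counter has wrapped
around to 0, we write it to all meta-blocks, so that no stale superblock
with a higher generation number survives.  We use a 64 bit int for the
generation counter, so wraparound practically never happens and this
has negligible performance impact.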
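
Why the generation-0 case needs special treatment can be shown with a toy model, assuming recovery simply takes the maximum generation number as described above.  If a wrapped counter of 0 were written to a single meta-block, a stale copy with generation 2^64 - 1 elsewhere would win; writing generation 0 everywhere removes every stale copy.  `write_superblock` and the list-of-generations model are hypothetical.

```python
from typing import List

GEN_MOD = 2**64   # 64-bit generation counter


def write_superblock(gens: List[int], current: int, target: int) -> int:
    """Write the next superblock generation; return the new generation number.

    gens[i] is the generation of the superblock copy in meta-block i.
    """
    nxt = (current + 1) % GEN_MOD
    if nxt == 0:
        # Wrapped around: overwrite every meta-block's copy so that no stale,
        # larger generation number can win at recovery.
        for i in range(len(gens)):
            gens[i] = 0
    else:
        gens[target] = nxt
    return nxt


def recovered_generation(gens: List[int]) -> int:
    """Recovery keeps only the highest generation number."""
    return max(gens)
```

At an assumed rate of one superblock write per millisecond, a 64-bit counter takes roughly 585 million years to wrap, which is why the full rewrite costs nothing in practice.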

