List:       eros-arch
Subject:    on the puzzle of journaling
From:       Jonathan S. Shapiro shap () eros ! cis ! upenn ! edu
Date:       1998-01-07 20:15:24

After some consultation with Bill Frantz, I now understand how that
race condition in journaling is resolved.  This email is just to close
the loop on it.

THE PROBLEM:

The 'journal' operation is indivisible.  Either the old version or the
new version must be retained in the event of system failure.  To
satisfy this, the writes associated with the journal operation must
occur one at a time (more precisely, the first write must be handled
distinctly from all other duplexes).

If the write fails in mid-write, modern disks can be relied on to
catch this error using a CRC.  The difficult case is if the system
manages to crash after the first write gets to the disk but before the
second write begins.

What had me puzzled about all this was everyone's insistence that only
two writes were involved (assuming two duplexes) in a journaling
operation.  Without making a note somewhere about which copy of the
image is current, I couldn't see how to recover from the case above.
Bill Frantz straightened me out.

THE SOLUTION (per Bill Frantz):

In order to guarantee that journaling occurs correctly, journaled
pages must be marked with a special attribute bit.  This bit causes
the page to be marked 'dirty' whenever it is read in from the disk.

If the system fails in the way described above, then the journaled
page will eventually be reread.  Either the new version of the page or
the old version of the page will then be pulled off of the disk and,
because the read marks it dirty, rewritten, with the effect that this
version is written to all duplexes.
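
As a rough illustration of the mechanism, here is a toy in-memory
model in Python.  It is only a sketch under stated assumptions, not
the actual implementation: the names (DuplexedPage, journal_write,
recover_by_reread, SimulatedCrash) are hypothetical, a "disk copy" is
just a list element, and mid-write failures (which the disk CRC is
assumed to catch) are not modeled.

```python
class SimulatedCrash(Exception):
    """Stands in for a system failure between the two duplex writes."""


class DuplexedPage:
    """Hypothetical model of a page stored as two on-disk copies."""

    def __init__(self, data, journaled=False):
        self.copies = [data, data]
        # The special attribute bit: reads of this page come back dirty.
        self.journaled = journaled


def journal_write(page, new_data, crash_after_first=False):
    """Write the duplexes one at a time, as the journal operation requires."""
    page.copies[0] = new_data
    if crash_after_first:
        # The difficult case: first write done, second never started.
        raise SimulatedCrash()
    page.copies[1] = new_data


def read_page(page, copy_index):
    """Read one copy; a journaled page is marked dirty on every read."""
    return page.copies[copy_index], page.journaled


def recover_by_reread(page, copy_index):
    """Reread the page; if it came back dirty, write it to all duplexes."""
    data, dirty = read_page(page, copy_index)
    if dirty:
        for i in range(len(page.copies)):
            page.copies[i] = data
    return data


page = DuplexedPage("old", journaled=True)
try:
    journal_write(page, "new", crash_after_first=True)
except SimulatedCrash:
    pass
# The duplexes now disagree.  Whichever copy the reread happens to
# return becomes the single surviving version on both duplexes.
survivor = recover_by_reread(page, copy_index=1)
```

Rereading copy 0 instead would have made the new version the survivor.
Either outcome is acceptable, which is why no separate note about
which copy is current is needed.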

The database system uses journaling only for log pages, per the paper
by Bill and Charlie.

SECONDARY PROBLEM:

The database system (or whoever is using journaling) is now subjected
to a secondary problem.  It must arrange that on restart the journaled
page is reread before doing any other operations that would impact the
database log.  The simplest way I can see to do this is to run
through the log area, rejournaling all pages written since the last
checkpoint, before proceeding with further transactions.
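
The restart pass might look roughly like the Python sketch below.
Everything in it is an assumption made for illustration: the log area
is modeled as a dict from page number to a list of duplex copies, a
copy of None stands for one the disk rejected on CRC, and the caller
is assumed to already know which pages were written since the last
checkpoint.  The names reread and rejournal_log_area are hypothetical.

```python
def reread(copies):
    """Return the first readable copy of a page (None = failed CRC)."""
    for data in copies:
        if data is not None:
            return data
    raise IOError("no readable copy of journaled page")


def rejournal_log_area(log_area, pages_since_checkpoint):
    """Reread and rewrite each listed log page so all duplexes agree.

    Must run to completion before any further transactions touch the log.
    """
    for page_no in pages_since_checkpoint:
        copies = log_area[page_no]
        data = reread(copies)
        # The reread marks the page dirty; writing it back puts the
        # same version on every duplex.
        log_area[page_no] = [data] * len(copies)


log_area = {
    0: ["a", "a"],        # written before the checkpoint, untouched
    1: ["new", "old"],    # crash fell between the two duplex writes
    2: [None, "c"],       # first copy failed its CRC
}
rejournal_log_area(log_area, pages_since_checkpoint=[1, 2])
```

After the pass, pages 1 and 2 each hold a single version on both
duplexes, and normal transaction processing can resume.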


shap
