List:       linux-ha-dev
Subject:    [Linux-ha-dev] journalled data CFS understanding
From:       David Brower <dbrower () us ! oracle ! com>
Date:       2000-01-28 2:46:04

I've been thinking about the issues with clustered journalled-data
file systems.

In GFS, the intent is to treat all pages in a file as versions of the
data, identified by a generation number in the dinode.  This version
will need to be a 64-bit value, as each write to a file could
potentially update the version.
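
A minimal sketch of how such a generation might sit in the on-disk
dinode; the field names here are invented for illustration and are
not the actual GFS layout:

  #include <stdint.h>

  /* Hypothetical on-disk dinode fragment, not the real GFS structure. */
  struct dinode {
          uint64_t di_num;        /* inode number */
          uint64_t di_size;       /* file size in bytes */
          uint64_t di_generation; /* bumped on every write, hence 64 bits
                                     so it cannot wrap in practice */
          /* ... block pointers, attributes, etc. ... */
  };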

In the event of a multiple failure, it is believed that the failed
nodes' journals can be replayed sequentially and independently,
without merging the logs.  The belief is that each log record can be
compared to the on-disk dinode version, and the data blocks in the
log applied only if needed.  I think this works if, and only if,
-all- dirty blocks in a file are flushed to disk before any block is
transferred to another node.  A log flush alone will not suffice.
This will hurt performance, as it effectively defeats the point of
locking individual blocks and sending them individually.
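
For concreteness, a sketch of the per-record replay decision under
that scheme; the record layout and the read_dinode_generation() /
write_block() helpers are assumptions for illustration, not real GFS
interfaces:

  #include <stdint.h>
  #include <stdbool.h>

  struct log_rec {
          uint64_t lr_inum;       /* inode the logged block belongs to */
          uint64_t lr_generation; /* dinode generation when the block was logged */
          uint64_t lr_blkno;      /* on-disk block number */
          const void *lr_data;    /* logged block contents */
  };

  /* Assumed helpers, for illustration only. */
  uint64_t read_dinode_generation(uint64_t inum);
  void write_block(uint64_t blkno, const void *data);

  /* Apply the logged data block only if it is newer than what the
   * on-disk dinode claims to have; otherwise skip it.  Returns true
   * if the block was written. */
  static bool replay_record(const struct log_rec *lr)
  {
          if (lr->lr_generation <= read_dinode_generation(lr->lr_inum))
                  return false;   /* on-disk copy is at least as new */

          write_block(lr->lr_blkno, lr->lr_data);
          return true;
  }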

Am I missing something?

Perhaps it is necessary to have a global dinode generation, and then
per-journal generations.  This way, recovery can know that the dinode
is up to date through global generation X, but that the data covered
by this journal is only good through generation M.
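
A rough sketch of what that split might look like; everything here
(the array of per-journal generations in the dinode, the names) is an
assumption for illustration, not a worked-out on-disk format:

  #include <stdint.h>
  #include <stdbool.h>

  #define MAX_JOURNALS 32  /* arbitrary for the sketch */

  /* Hypothetical dinode fragment carrying both generations. */
  struct dinode_gens {
          uint64_t di_global_gen;                /* X: dinode good through here */
          uint64_t di_journal_gen[MAX_JOURNALS]; /* M: data good through here,
                                                    per journal */
  };

  /* During recovery of journal j, a logged data block needs replay
   * only if it is newer than what that journal is known to have
   * on disk. */
  static bool needs_replay(const struct dinode_gens *g, unsigned int j,
                           uint64_t rec_gen)
  {
          return rec_gen > g->di_journal_gen[j];
  }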

-dB

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.tummy.com
http://lists.tummy.com/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
