
List:       opensolaris-nfs-discuss
Subject:    Re: [nfs-discuss] roll their own cluster failover w/ ZFS
From:       "Frank Batschulat (Home)" <Frank.Batschulat () Sun ! COM>
Date:       2008-05-14 9:14:34
Message-ID: op.ua4zykj5046apg () opteron

On Wed, 14 May 2008 02:20:32 +0200, Neil Putnam <Neil.Putnam@Sun.COM> wrote:

> I have been asked about this scenario, with a "roll their own" failover
> scenario.  S10u5 (I think; at least the 127127-11 kernel patch).
>
> They have shared storage setup (a shared 3510) with ZFS, and when trying
> to failover they run (aside from migrating the server's IP):
>
>    server1 (failing):        unshare /mypool
>                              zpool export mypool
>
>    node2 (newly online):     zpool import mypool
>                              share /mypool
>
> After that the unsuspecting client gets nothing but ESTALE errors when
> trying to read/write previously open files.
>
> Is this something that should work?    I suspect this might be due to
> the fact that zfs_vfsinit() is finding a different major number for the
> storage device on the two different nodes.   Assuming the major number
> is the same on both systems...is this something that should work?

I assume this is using NFSv4, since that is the default, so I suspect that
this procedure is unlikely to succeed without SunCluster and the special
stable storage support built into NFSv4 specifically for failover.
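
A quick way to confirm which version the client actually negotiated is
nfsstat on the client; roughly like this (a sketch only, the mount point
and server name are placeholders):

   client# nfsstat -m /mnt/mypool
   /mnt/mypool from server1:/mypool
    Flags: vers=4,proto=tcp,sec=sys,hard,intr,...

If that shows vers=3 instead, the missing-state problem described below
does not apply in the same way, since v3 itself is stateless apart from
the lock manager.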

The problem is that the failover node in your case does not have any
state information about the active clients in its own /var/nfs state
databases (v4_state, v4_oldstate); only the failing node has that in its
state database. The new server will therefore refuse the clients' state
reclaim operations, i.e. reclaiming locks and open files will be denied;
that should come back as NFS4ERR_NO_GRACE.
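
To make that concrete, this is roughly what you would expect to see on
the two nodes (a sketch only; the exact file names and layout under
/var/nfs are an implementation detail, see the PSARC excerpt below):

   server1# ls /var/nfs/v4_state
   <one small file per client that has been active against server1>

   node2# ls /var/nfs/v4_state
   node2#                        <- empty, node2 has never seen those clients

So when the clients try to reclaim their opens and locks against node2
after the IP address has moved, node2 has no record that they were ever
its clients and rejects the reclaims.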

There was quite a bit of work in ON/NFSv4 and SunCluster to make NFSv4 failover
possible in SunCluster; see PSARC/2006/313 and bug 6244819.

The rough summary from that case:

<snip>
RFC3530 notes that if an NFSv4 server wishes to offer its clients a
grace period, it must record certain data about its clients in stable storage,
available to the server after a service restart (e.g. a reboot).
Our implementation stores this data under /var/nfs/.
We store one file per client. The file is named after the client's IP
address and the server's shorthand clientid for that client, and the file
contents are the client's full-form client id. This information is used by
the server, after restart, to see if a particular client should be allowed
to perform reclaim operations. A client must only be allowed to perform a
reclaim if, at a minimum, it was an active client of this server under its
previous instantiation.
<snip end>
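
Purely as an illustration of that naming scheme (made-up address and ids;
the exact path and on-disk format are implementation details):

   /var/nfs/v4_state/192.168.1.10-0x47c91a2b00000003
       file name:     client IP address + server's shorthand clientid
       file contents: the client's full-form (long) client id

After a restart the server only honours reclaim requests from clients it
finds recorded there, which is exactly the record your second node never had.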

---
frankB


_______________________________________________
nfs-discuss mailing list
nfs-discuss@opensolaris.org
