List:       redhat-linux-cluster
Subject:    Re: [Linux-cluster] qdiskd + cman: trying to fix the use of
From:       Lon Hohberger <lhh () redhat ! com>
Date:       2007-01-08 15:43:26
Message-ID: 1168271006.15369.22.camel () rei ! boston ! devel ! redhat ! com

On Sun, 2007-01-07 at 20:29 +0100, Simone Gotti wrote:
> Problem 2)
> 
> After fixing Problem 1, if I set an interval > quorumdev_poll/1000*2 in
> the quorumd tag of cluster.conf, the quorum is lost and then regained
> over and over, because qdiskd polls less frequently than cman does.
> Probably the right thing to do is to calculate the value of
> quorumdev_poll from the ccs return value of "/cluster/quorumd/@interval"
> and quorumdev_poll=interval*1000*2 should be ok.

I think the poll interval (quorumdev_poll) should be closer to
(interval * tko * 1000) ms [10 seconds by default], and not a function of
just the quorum disk interval.

This is because after (interval*tko*1000), the master node of the
cluster will write an eviction message to a hung node - and that's when
qdiskd will either reboot the node or tell CMAN that its votes are no
longer valid.

I do not think it will cause any problems per se, but dropping qdiskd's
votes after ~2 seconds when the qdisk master won't write an eviction
notice for another ~8 seconds seems a bit odd.
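
The arithmetic behind those figures can be sketched as follows. This is
only an illustration: the function names are made up for the example, and
interval=1 with tko=10 is an assumption chosen because it matches the
10-second default mentioned above.

```python
def eviction_timeout_ms(interval_s, tko):
    """Milliseconds until the qdisk master writes an eviction notice."""
    return interval_s * tko * 1000


def proposed_poll_ms(interval_s):
    """quorumdev_poll as proposed in the quoted message: interval * 1000 * 2."""
    return interval_s * 1000 * 2


# Assumed values (not read from any real cluster.conf).
interval_s, tko = 1, 10

evict_ms = eviction_timeout_ms(interval_s, tko)
poll_ms = proposed_poll_ms(interval_s)

# Window in which the votes would already be dropped but no eviction
# notice has been written yet.
gap_ms = evict_ms - poll_ms

print(f"eviction after {evict_ms} ms, votes dropped after {poll_ms} ms, "
      f"gap {gap_ms} ms")
```

With these assumed values the gap is the roughly 8 seconds described above.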

Normal node failure delay should be >= 2 * (interval * tko * 1000).  There's
a parameter in the <totem> tag (which defaults to 5,000ms) that should be
at least 2 * interval * tko * 1000, but I don't recall what it is right now.

qdiskd needs to time out before CMAN does.  While it doesn't have to be
"half or less", it's a good paranoia factor that's easy to remember, and
it gives the node plenty of time.
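
As a rough sketch of that rule of thumb, the check below compares a CMAN
timeout against the qdiskd timeout. The helper names and the `factor`
argument are hypothetical, and nothing here reads a real cluster.conf; the
values are passed in by hand.

```python
def qdisk_timeout_ms(interval_s, tko):
    """Milliseconds after which qdiskd considers a node dead."""
    return interval_s * tko * 1000


def min_cman_timeout_ms(interval_s, tko, factor=2):
    """Smallest CMAN timeout that keeps the 'factor' safety margin."""
    return factor * qdisk_timeout_ms(interval_s, tko)


def cman_timeout_ok(cman_timeout_ms, interval_s, tko, factor=2):
    """True if qdiskd times out comfortably before CMAN does."""
    return cman_timeout_ms >= min_cman_timeout_ms(interval_s, tko, factor)


# Assumed values: interval=1 and tko=10 give the 10-second qdiskd timeout
# mentioned earlier; 5000 ms is the totem default quoted above.
print(min_cman_timeout_ms(1, 10))
print(cman_timeout_ok(5000, 1, 10))
print(cman_timeout_ok(20000, 1, 10))
```

Under these assumptions the 5,000ms default falls short of the margin, so
either the CMAN-side timeout would need raising or interval/tko lowering.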

-- Lon