List: linux-ha-dev
Subject: Re: The drbd OCF RA [was Re: [Linux-ha-dev] 3.0 thoughts (dopd)]
From: Lars Marowsky-Bree <lmb () suse ! de>
Date: 2008-08-29 13:51:00
Message-ID: 20080829135100.GG29299 () marowsky-bree ! de
On 2008-08-24T19:54:04, Florian Haas <florian.haas@linbit.com> wrote:
> Lars,
>
> Lars Marowsky-Bree wrote:
> > I still think that the dopd is sort-of the wrong approach, btw. This
> > should probably integrate somehow with the drbd RA, using the
> > notifications made available to it.
>
> How does Heartbeat feed DRBD the information that the DRBD
> replication link (which normally is also a Heartbeat link, and that's
> highly recommended, but this isn't strictly a requirement) just died?
It doesn't; but if we have not delivered a node down event for the peer
to the RA (or have not done so within N seconds), the RA can infer that
the other side is still up and that the replication link has failed.
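A minimal sketch of that inference, in Python for readability only. The
function name, the verdict names, and the grace-period parameter (the "N
seconds" above) are all hypothetical; none of this is taken from the
existing RA.

```python
from enum import Enum


class PeerVerdict(Enum):
    """Hypothetical outcomes of the inference described above."""
    CONNECTED = "connected"      # replication link is up, nothing to infer
    WAIT = "wait"                # link is down, but still inside the grace period
    PEER_DOWN = "peer_down"      # the cluster told us the peer node is gone
    LINK_FAILED = "link_failed"  # peer presumed alive, replication link broken


def classify_peer(replication_connected, node_down_seen,
                  seconds_since_disconnect, grace_seconds):
    """Decide what a lost replication link most likely means.

    node_down_seen: whether a node down event for the peer was delivered.
    grace_seconds:  assumed tunable, how long to wait for such an event.
    """
    if replication_connected:
        return PeerVerdict.CONNECTED
    if node_down_seen:
        return PeerVerdict.PEER_DOWN
    if seconds_since_disconnect < grace_seconds:
        return PeerVerdict.WAIT
    # No node down event within the grace period: infer that the peer is
    # still up and only the replication link has failed.
    return PeerVerdict.LINK_FAILED
```

What the RA then does with a LINK_FAILED verdict (for example, outdating
the peer) is a separate question that the sketch does not answer.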
> While we're at it, can we discuss the drbd OCF RA a little bit? There's
> a few things that have always sort of bugged me about it, and maybe this
> is the right time to sort them out.
>
> 1. "Floating peers" and the fact that the RA does drbdadm up/down.
>
> I've always considered the fact that the RA does a "drbdadm down" on
> resource stop something of a misfeature, as it means that DRBD
> replication stops when a node goes to standby mode.
No. That is unrelated. The point is that in standby mode, the _node is
not supposed to run any resources_. At all. As drbd is a managed
service, it should not be active even in slave mode. The node is
completely stopped.
What you seem to want is a "do not go higher than target_role == Slave"
setting, but that is not what "standby" currently implies.
> I guess that's
> undesirable for the majority of use cases.
Don't put the node into standby if that's not what you want ;-)
> 2. drbd_update_prefs
>
> It seems odd to me that drbd_update_prefs only tests for connection
> state, which is of lesser importance in terms of whether a resource
> should become a DRBD Primary. Disk state is much more important; a DRBD
> resource whose local disk state is "Inconsistent" should get a strongly
> negative preference (and, just to mention this for completeness' sake,
> it can never be promoted if the connection state is StandAlone or
> WFConnection). An "Outdated" resource can never be promoted.
Yes. This should be updated with more input from your side. I would
appreciate patches.
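To make the request concrete, here is a rough sketch of a preference
function that looks at disk state first and connection state second, as
described in the quoted text. It is Python rather than shell, and the
function name and every numeric score are made up for illustration; the
real drbd_update_prefs may need different values and more states.

```python
# Hypothetical score meaning "must never be promoted".
NEVER = float("-inf")


def master_preference(disk_state, conn_state):
    """Return an illustrative promotion preference for a DRBD resource.

    disk_state and conn_state are the state names used in the discussion
    (e.g. "UpToDate", "Inconsistent", "Outdated"; "Connected",
    "StandAlone", "WFConnection"). All scores are assumptions.
    """
    if disk_state == "Outdated":
        # An Outdated resource can never be promoted.
        return NEVER
    if disk_state == "Inconsistent":
        if conn_state in ("StandAlone", "WFConnection"):
            # Inconsistent and without a peer: promotion is impossible.
            return NEVER
        # Inconsistent but connected: strongly negative preference.
        return -1000
    if disk_state == "UpToDate":
        # Connection state only fine-tunes the score for good data.
        return 100 if conn_state == "Connected" else 10
    # Any state not covered above gets no preference in this sketch.
    return 0
```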
> 3. Obsolete sanity check
>
> The RA tests for OCF_RESKEY_master_max and requires that it is set to 1.
> That is obsolete as of DRBD 8 which supports dual-Primary mode, so both
> 1 and 2 should be accepted values.
>
> Your thoughts?
Makes sense; this is a drbd 0.7 artifact. Probably the RA should detect
the drbd version and decide the acceptable values based on that.
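Roughly along these lines, sketched in Python rather than shell. The
function names are hypothetical, and the sketch assumes the RA already
has the drbd version as a dotted string such as "0.7.25" or "8.2.6"; how
it obtains that string is left open here.

```python
def allowed_master_max(drbd_version):
    """Return the set of acceptable master_max values for a drbd version.

    Assumes a dotted version string whose first component is the major
    version. DRBD 8 and later support dual-Primary mode, so both 1 and 2
    are accepted; older versions allow a single Primary only.
    """
    major = int(drbd_version.split(".")[0])
    return {1, 2} if major >= 8 else {1}


def master_max_is_valid(master_max, drbd_version):
    """Check a master_max value (as passed in OCF_RESKEY_master_max)."""
    return int(master_max) in allowed_master_max(drbd_version)
```

With that, master_max_is_valid("2", "8.2.6") is accepted while
master_max_is_valid("2", "0.7.25") is rejected, which is the behaviour
suggested above.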
Regards,
Lars
--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/