
List:       linux-ha-dev
Subject:    Re: The drbd OCF RA [was Re: [Linux-ha-dev] 3.0 thoughts (dopd)]
From:       Lars Marowsky-Bree <lmb () suse ! de>
Date:       2008-08-29 13:51:00
Message-ID: 20080829135100.GG29299 () marowsky-bree ! de

On 2008-08-24T19:54:04, Florian Haas <florian.haas@linbit.com> wrote:

> Lars,
> 
> Lars Marowsky-Bree wrote:
> > I still think that the dopd is sort-of the wrong approach, btw. This
> > should probably integrate somehow with the drbd RA, using the
> > notifications made available to it.
> 
> How does Heartbeat feed DRBD the information that the DRBD
> replication link (which normally is also a Heartbeat link, and that's
> highly recommended, but this isn't strictly a requirement) just died?

It doesn't. But if we have not delivered a node down event for the peer
to the RA (or have not done so within N seconds), the RA can infer that
the other side is still up and that the replication link has failed.
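
To make the inference concrete, here is a minimal sketch in Python (the
RA itself is a shell script, so this only illustrates the logic). The
function name, the return labels, and the grace_seconds parameter (the
"N seconds" above) are all hypothetical and not part of any existing
RA or Heartbeat interface.

```python
def classify_peer_loss(seconds_since_disconnect: float,
                       node_down_received: bool,
                       grace_seconds: float) -> str:
    """Guess why the DRBD peer went away (hypothetical helper).

    seconds_since_disconnect: time since DRBD lost its connection.
    node_down_received: True if the cluster delivered a node down
        event for the peer to the RA.
    grace_seconds: the "N seconds" to wait for such an event.
    """
    if node_down_received:
        # The cluster says the peer node itself is gone.
        return "peer-down"
    if seconds_since_disconnect < grace_seconds:
        # Too early to tell; a node down event may still arrive.
        return "undetermined"
    # No node down event within the grace period: assume the peer is
    # still up and only the replication link has failed.
    return "link-failed"
```

With grace_seconds set to 10, a disconnect that is 30 seconds old with
no node down event would be classified as "link-failed".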

> While we're at it, can we discuss the drbd OCF RA a little bit? There's
> a few things that have always sort of bugged me about it, and maybe this
> is the right time to sort them out.
> 
> 1. "Floating peers" and the fact that the RA does drbdadm up/down.
> 
> I've always considered the fact that the RA does a "drbdadm down" on
> resource stop something of a misfeature, as it means that DRBD
> replication stops when a node goes to standby mode. 

No. That is unrelated. The point is that in standby mode, the _node is
not supposed to run any resources_. At all. As drbd is a managed
service, it should not be active even in slave mode. The node is
completely stopped.

What you seem to want is a "do not go higher than target_role == Slave"
setting, but that is not what "standby" currently implies.

> I guess that's
> undesirable for the majority of use cases.

Don't put the node into standby if that's not what you want ;-)

> 2. drbd_update_prefs
> 
> It seems odd to me that drbd_update_prefs only tests for connection
> state, which is of lesser importance in terms of whether a resource
> should become a DRBD Primary. Disk state is much more important; a DRBD
> resource whose local disk state is "Inconsistent" should get a strongly
> negative preference (and, just to mention this for completeness' sake,
> it can never be promoted if the connection state is StandAlone or
> WFConnection). An "Outdated" resource can never be promoted.

Yes. This should be updated with more input from your side. I would
appreciate patches.
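
As a starting point for discussion, the rules Florian lists could be
sketched roughly as follows. This is Python for illustration only, not
the RA's shell code; the function name and the numeric scores are
assumptions, and only the Outdated and Inconsistent rules come from the
description above. The values for UpToDate and for any other disk state
are placeholders.

```python
# Sentinel meaning "must never be promoted" (think -INFINITY).
NEVER = float("-inf")


def master_preference(disk_state: str, conn_state: str) -> float:
    """Return a promotion preference for a DRBD resource (hypothetical)."""
    if disk_state == "Outdated":
        # An Outdated resource can never be promoted.
        return NEVER
    if disk_state == "Inconsistent":
        if conn_state in ("StandAlone", "WFConnection"):
            # Inconsistent without a connected peer cannot be promoted.
            return NEVER
        # Otherwise: strongly negative, but not forbidden.
        return -1000
    if disk_state == "UpToDate":
        # Placeholder positive score; the real value is a policy choice.
        return 100
    # Any other disk state: neutral placeholder, an assumption.
    return 0
```

The point of the sketch is the ordering of the checks: disk state is
tested first, and connection state only refines the Inconsistent case.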

> 3. Obsolete sanity check
> 
> The RA tests for OCF_RESKEY_master_max and requires that it is set to 1.
> That is obsolete as of DRBD 8 which supports dual-Primary mode, so both
> 1 and 2 should be accepted values.
> 
> Your thoughts?

Makes sense; this is a drbd 0.7 artifact. The RA should probably detect
the drbd version and decide the acceptable values based on that.
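
A rough sketch of that version-dependent check, again in Python purely
for illustration. The function name is hypothetical, and how the RA
obtains the version string is left open here; the sketch only assumes a
dotted string such as "0.7.25" or "8.2.6".

```python
def allowed_master_max(drbd_version: str) -> set:
    """Return the master_max values to accept for a DRBD version.

    Assumes a dotted version string; only the first component is used.
    """
    major = int(drbd_version.strip().split(".")[0])
    if major >= 8:
        # DRBD 8 supports dual-Primary mode, so 1 and 2 are both valid.
        return {1, 2}
    # Older releases (e.g. 0.7) are single-Primary only.
    return {1}


def master_max_is_valid(drbd_version: str, master_max: int) -> bool:
    """Check a configured master_max against the detected version."""
    return master_max in allowed_master_max(drbd_version)
```

For example, master_max_is_valid("0.7.25", 2) would reject the
configuration, while master_max_is_valid("8.2.6", 2) would accept it.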


Regards,
    Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
