'Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication m'


List:       linux-ha-dev
Subject:    Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication m
From:       Andrew Beekhof <beekhof () gmail ! com>
Date:       2012-10-29 20:04:34
Message-ID: CAEDLWG1+__ZvkEe91AiPMa0A2xqh4vCXr45Co9wXgEWFwR1pew () mail ! gmail ! com

On Mon, Oct 29, 2012 at 9:51 PM, Dejan Muhamedagic <dejan@suse.de> wrote:
> On Fri, Oct 26, 2012 at 11:36:53AM +1100, Andrew Beekhof wrote:
>> On Fri, Oct 26, 2012 at 12:52 AM, Dejan Muhamedagic <dejan@suse.de> wrote:
>> > On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
>> >> On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
>> >> > Usually, we use the "crm_master" command instead of "crm_attribute" to change the master score in an RA.
>> >> > But PostgreSQL's slave can't get its own replication status, so the Master changes the Slave's master-score
>> >> > using the instance number on Pacemaker 1.0.x.
>> >> > This is probably not ordinary usage.
>> >> >
>> >> > > Would the existing resource agent work with globally-unique=true?
>> >> >
>> >> > I don't know whether it works with true.
>> >> > I use it with false, and it doesn't need true.
>> >>
>> >> I suggested that you actually should use globally-unique clones,
>> >> as in that case you still get those instance numbers...
>> >
>> > Does using different clones make sense in pgsql? What is to be
>> > different between them? Or would it be just for the sake of
>> > getting instance numbers? If so, then it somehow looks wrong to
>> > me :)
>> >
>> >> But thinking about it once more, I'm not so sure anymore.
>> >>
>> >> Correct me where I'm wrong.
>> >>
>> >> This is about the master score.
>> >> In case the Master instance fails, we preferably want to promote the
>> >> slave instance that is as close as possible to the Master.
>> >> We only know which *node* was "best" at the last monitoring interval,
>> >> which may be "good enough".
>> >>
>> >> We need to then change the master score for *all possible instances*,
>> >> for all nodes, accordingly.
>> >>
>> >> Which is what that loop did.
>> >> (I think skipping the "current" instance is actually a bug;
>> >>  if pacemaker relabels things in a "bad way", you may hit it).
>> >>
>> >> Now, with pacemaker 1.1.8, all instances become "equal"
>> >> (for anonymous clones, aka globally-unique=false),
>> >> and we only need to set the score on the resource-id,
>> >> not for all resource-id:instance combinations.
>> >
>> > OK.
>> >
>> >> Which is great. After all, the master score in this case is attached to
>> >> the node (or, the data set accessible from that node), and not to the
>> >> (arbitrary, potentially relabeled "anytime") instance number pacemaker
>> >> assigned to the clone instance running on that node.
>> >>
>> >>
>> >> And that is exactly what your patch does:
>> >>  * detect if a version of pacemaker is in use that attaches the instance
>> >>    number to the resource id
>> >>    * if so, do the loop on all possible instance numbers as before
>> >>    * if not, only set the master score on the resource-id
>> >>
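
For illustration, a minimal Python sketch of the branching described above (the real agent is a shell script). It assumes, hypothetically, that the patch detects an instance number by a ":<n>" suffix on OCF_RESOURCE_INSTANCE and that per-instance attributes are named "master-<resource-id>:<n>"; neither detail is confirmed in this thread. It only computes which attribute names would be updated, and it deliberately includes the current instance, per the remark above that skipping it is a bug.

```python
from __future__ import annotations

import re

# Hypothetical detection rule: "<resource-id>:<n>" means Pacemaker appended
# an instance number (1.0.x style); a bare "<resource-id>" means it did not.
INSTANCE_SUFFIX = re.compile(r"^(?P<rsc>.+):(?P<num>\d+)$")


def master_score_attrs(resource_instance: str, clone_max: int) -> list[str]:
    """Return the master-score attribute names that would be updated."""
    match = INSTANCE_SUFFIX.match(resource_instance)
    if match is None:
        # 1.1.8-style anonymous clone: one score keyed on the resource-id.
        return [f"master-{resource_instance}"]
    rsc = match.group("rsc")
    # Instance numbers can be relabeled at any time, so cover every possible
    # instance, including the one this node is currently running as.
    return [f"master-{rsc}:{n}" for n in range(clone_max)]


if __name__ == "__main__":
    print(master_score_attrs("pgsql:1", 3))
    print(master_score_attrs("pgsql", 3))
```

With "pgsql:1" and a clone-max of 3 this yields master-pgsql:0 through master-pgsql:2; with "pgsql" it yields only master-pgsql.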
>> >>
>> >> Is my understanding correct?
>> >> Then I think your patch is good.
>> >
>> > Yes, the patch seems good then, though there is quite a bit of
>> > code repetition. The "set attribute" part should be moved into a
>> > separate function.
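
A rough Python sketch of that refactoring, modelling only the command lines rather than running them: a single helper builds the crm_attribute call, and both the per-instance (1.0.x) and resource-id-only (1.1.8) cases go through it. The function names and the "master-<resource-id>[:<n>]" attribute naming are assumptions; -l, -N, -n and -v are crm_attribute's lifetime, node, attribute-name and value options, but check them against the Pacemaker version in use.

```python
from __future__ import annotations

import shlex


def set_master_score_cmd(node: str, attr_name: str, score: int) -> list[str]:
    """Build one crm_attribute call for a reboot-lifetime node attribute."""
    return ["crm_attribute", "-l", "reboot", "-N", node,
            "-n", attr_name, "-v", str(score)]


def master_score_cmds(node: str, resource_id: str, score: int,
                      instances: list[int] | None = None) -> list[list[str]]:
    """Return every command needed to set one node's master score.

    instances=None models the 1.1.8 case (score keyed on the resource-id);
    a list of instance numbers models the 1.0.x case (one attribute each).
    """
    if instances is None:
        names = [f"master-{resource_id}"]
    else:
        names = [f"master-{resource_id}:{n}" for n in instances]
    # Both cases share the same helper, so the crm_attribute invocation
    # is written exactly once.
    return [set_master_score_cmd(node, name, score) for name in names]


for cmd in master_score_cmds("node2", "pgsql", 100, instances=[0, 1]):
    print(shlex.join(cmd))
```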
>> >
>> >> Still, other resource agents that use master scores (or any other
>> >> attributes that reference instance numbers of anonymous clones)
>> >> need to be reviewed.
>> >>
>> >> Though this "I'll set scores for other instances, not only myself"
>> >> logic is unique to pgsql, so most other resource agents should "just
>> >> work" with whatever is present in the environment; they typically treat
>> >> $OCF_RESOURCE_INSTANCE as opaque.
>> >
>> > Seems like no other RA uses instance numbers. However, quite a
>> > few use OCF_RESOURCE_INSTANCE which, in case of clone/ms
>> > resources, may potentially lead to unpredictable results on
>> > upgrade to 1.1.8.
>>
>> No. Otherwise all the regression tests would fail. The PE is smart
>> enough to find promotion scores and failcounts in either case.
>
> Cool.
>
>> Also, OCF_RESOURCE_INSTANCE contains whatever the local lrmd knows the
>> resource as, not what we call it internally to the PE.
>
> What I meant was that some RAs use OCF_RESOURCE_INSTANCE to name
> local files which keep some kind of state. If
> OCF_RESOURCE_INSTANCE changes on upgrade... Well, I guess that
> the worst that can happen is for the probe to fail.

Right. But only for attach/reattach.
And people should have maintenance-mode enabled at the point the probe
is run, so there is time to fix things up before the cluster does
anything about it.
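
To make the state-file concern concrete, a minimal Python sketch assuming a hypothetical agent that keeps state in "<dir>/<OCF_RESOURCE_INSTANCE>.state". If the instance is reported as "pgsql:0" before the upgrade and as plain "pgsql" after it, a probe looking under the new name does not find the file written under the old one. The directory layout and function names are illustrative only; no real agent is being quoted.

```python
import os
import tempfile


def state_file(state_dir: str, resource_instance: str) -> str:
    # Hypothetical naming scheme: one state file per OCF_RESOURCE_INSTANCE value.
    return os.path.join(state_dir, f"{resource_instance}.state")


def probe_finds_state(state_dir: str, resource_instance: str) -> bool:
    # A probe that only checks whether the expected state file exists.
    return os.path.exists(state_file(state_dir, resource_instance))


with tempfile.TemporaryDirectory() as d:
    # Written before the upgrade, when the clone instance was "pgsql:0".
    open(state_file(d, "pgsql:0"), "w").close()
    print(probe_finds_state(d, "pgsql:0"))  # True: old name still matches
    print(probe_finds_state(d, "pgsql"))    # False: new name finds nothing
```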

> But I didn't
> take a closer look.
>
> Thanks,
>
> Dejan
>
>> >> Thanks,
>> >>       Lars
>> >
>> > Cheers,
>> >
>> > Dejan
>> > _______________________________________________________
>> > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> > Home Page: http://linux-ha.org/