List:       linux-ha-dev
Subject:    Re: [Linux-ha-dev] Re: Difference between OCF_ERR_CONFIGURED and
From:       "Andrew Beekhof" <beekhof () gmail ! com>
Date:       2008-07-08 13:19:04
Message-ID: 26ef5e70807080619v62fce67hcd9b22df100fc95 () mail ! gmail ! com

On Tue, Jul 8, 2008 at 15:09, Andrew Beekhof <beekhof@gmail.com> wrote:
> On Tue, Jul 8, 2008 at 15:03, Andrew Beekhof <beekhof@gmail.com> wrote:
> > On Fri, Jul 4, 2008 at 16:52, Joe Bill <pica1dilly@yahoo.com> wrote:
> > > 
> > > > --- On Fri, 7/4/08, Andrew Beekhof <beekhof@gmail.com> wrote:
> > > > > Exactly how does heartbeat handle OCF_ERR_CONFIGURED and
> > > > > OCF_ERR_INSTALLED differently?
> > > > 
> > > > From some badly formatted and not-quite finished documentation:
> > > > 
> > > > soft  = stop and retry
> > > > hard  = stop and retry - current node is excluded
> > > > fatal = stop - all nodes are excluded
> > > 
> > > Since the documentation is not yet finished, I would like to take the
> > > opportunity to make the following suggestions:
> > > 
> > > - "soft" be changed to "error, unexpected"
> > > 
> > > - "hard" be changed to "fatal, local" or "critical, local", or "fatal, node" or
> > > "critical, node" because we have diagnosed that the resource at fault is local
> > > to the node on which it was detected
> > > 
> > > - "fatal" be changed to "fatal, common" or "critical, common" or "fatal,
> > > cluster" or "critical, cluster" because we have diagnosed that the resource at
> > > fault is common to all nodes in the cluster.
> > > 
> > > > 5 The requested agent or tool required by the agent is
> > > > not installed. hard
> > > 
> > > I believe "resource configuration" to be more appropriate here. HA shouldn't
> > > care at this point if it's a piece of software or local configuration file that
> > > is missing or screwed.
> > > 
> > > add:
> > > 
> > > - or the resource's local configuration,
> > > - or the node's specific configuration ... are invalid.
> > > 
> > > > 6 The resource's configuration is invalid. fatal
> > > 
> > > I believe "instance configuration" to be more appropriate here,
> > > 
> > > replace with:
> > > 
> > > - the instance's configuration (common, shared, clusterwide resource
> > > configuration) is invalid,
> > > - or the resource agent has detected a severe internal (programming, code)
> > > error.
> > 
> > makes sense
> > 
> > > 
> > > 
> > > Regarding the mnemonics of the return codes...
> > > 
> > > From your notes above, it seems the status definitions are more related
> > > to the restart and blocking effect the HA supervisor has on resources
> > > than to the situation the current mnemonics attempt to describe.
> > > 
> > > I am not sure it is such a good idea to attempt to combine a condition with the \
> > > condition's handling action in the process of defining states that are to be \
> > > reported to the supervisor.
> > 
> > Not sure I follow this...
> > 
> > > 
> > > From what you provided as description, is it, for instance, the
> > > supervisor's concern, and will the supervisor attempt anything to address
> > > the cause, or for that matter do anything different, if it receives any of
> > > the following statuses: OCF_ERR_UNIMPLEMENTED, OCF_ERR_PERM,
> > > OCF_ERR_INSTALLED?
> > > 
> > > Same question for OCF_ERR_ARGS and OCF_ERR_CONFIGURED?
> > > 
> > > Now the problem starts when I want to describe a condition where a resource
> > > needs an internal file (fixed name, not specified as a resource parameter) but
> > > the file is missing on one host and not on the others. Which condition would
> > > you choose?
> > 
> > OCF_ERR_ARGS I guess - since that would exclude the failed node but not the
> > others.
> 
> oops, args doesn't do this.
> probably OCF_ERR_INSTALLED then.  or maybe one of OCF_ERR_ARGS and
> OCF_ERR_CONFIGURED needs to be made fatal.

brain not working today... of course I meant "hard".  and having
looked at everything again, i think this is the right approach.
So from now on OCF_ERR_ARGS will be a "hard" error instead of a "fatal" one.
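
For reference, the three error classes quoted above can be sketched as a
small lookup. This is only an illustration and not heartbeat's actual code:
the names Severity, SEVERITY and eligible_nodes are made up for the example,
the numeric value 2 for OCF_ERR_ARGS comes from the OCF return code
convention rather than from this thread, and treating any unlisted code as
"soft" is an assumption.

```python
from enum import Enum


class Severity(Enum):
    SOFT = "soft"    # stop and retry
    HARD = "hard"    # stop and retry - current node is excluded
    FATAL = "fatal"  # stop - all nodes are excluded


# 5 and 6 are the values quoted in this thread; 2 is taken from the OCF
# return code convention (assumption, not stated in the thread).
OCF_ERR_ARGS = 2
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6

# Hypothetical table reflecting the outcome of this thread:
# OCF_ERR_ARGS is now "hard" instead of "fatal".
SEVERITY = {
    OCF_ERR_ARGS: Severity.HARD,
    OCF_ERR_INSTALLED: Severity.HARD,
    OCF_ERR_CONFIGURED: Severity.FATAL,
}


def eligible_nodes(rc, failed_node, nodes):
    """Return the nodes on which the resource may still be retried."""
    # Assumption: any return code not listed above is treated as "soft".
    severity = SEVERITY.get(rc, Severity.SOFT)
    if severity is Severity.FATAL:
        return []
    if severity is Severity.HARD:
        return [n for n in nodes if n != failed_node]
    return list(nodes)
```

With nodes ["a", "b", "c"] and a failure on "a", OCF_ERR_INSTALLED leaves
"b" and "c" as candidates, while OCF_ERR_CONFIGURED leaves none.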
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/