
List:       linux-ha
Subject:    Re: [Linux-HA] crm_resource -P strange behavior.
From:       Simone Gotti <simone.gotti () email ! it>
Date:       2006-05-26 9:09:17
Message-ID: 1148634557.2658.38.camel () localhost

On Fri, 2006-05-26 at 10:29 +0200, Andrew Beekhof wrote:
> 
> On May 26, 2006, at 10:24 AM, Simone Gotti wrote:
> 
> > Hi Andrew,
> > 
> > 
> > On Fri, 2006-05-26 at 09:22 +0200, Andrew Beekhof wrote:
> > > 
> > > 
> > > On May 25, 2006, at 8:07 PM, Simone Gotti wrote:
> > > 
> > > 
> > > > Hi, (I wrongly sent the same mail to linux-ha-dev, sorry...)
> > > 
> > > 
> > > 
> > > 
> > > np
> > > 
> > > 
> > > 
> > > 
> > > > I was testing the heartbeat behavior with some tests.
> > > > 
> > > > When I launched crm_resource -P to probe for resources started
> > > > outside of the CRM (even when none had been started outside of
> > > > it), I noticed that the cluster failed the probe and retried it
> > > > an infinite number of times. It looks like the OCF IPaddr
> > > > monitor function returned a return code of 0 (OCF_SUCCESS,
> > > > which is right as the resource is up), but tengine doesn't like
> > > > it, as it's expecting a value of 7 (OCF_NOT_RUNNING):
> > > > 
> > > > 
> > > > 
> > > > 
> > > > tengine[10558]: 2006/05/25_16:32:38 ERROR:
> > > > mask(events.c:match_graph_event): Action ipaddr01_monitor_0 on
> > > > nodo01
> > > > failed (target: 7 vs. rc: 0): Error
> > > 
> > > 
> > > 
> > > 
> > > thats the probe detecting an active resource.
> > 
> > 
> > But why is it expecting a target of 7 even though heartbeat knows
> > that it should be up on this node (see also below)?
> 
> 
> it doesn't... the fact that you're telling it to re-check means that
> all bets are off and what it thinks the cluster looks like is likely
> to be wrong.
> 

Ok thanks, so my assumption was wrong.
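
For the wiki, the comparison reported in the log line above (target: 7
vs. rc: 0) could be illustrated with a minimal sketch like the one
below. This is not tengine code: the function name interpret_probe and
the returned labels are hypothetical, and only the two return codes
(0 = OCF_SUCCESS, 7 = OCF_NOT_RUNNING) come from this thread.

```python
# Illustrative only: not taken from the heartbeat/tengine sources.
OCF_SUCCESS = 0       # monitor found the resource running
OCF_NOT_RUNNING = 7   # monitor found the resource cleanly stopped


def interpret_probe(rc, target=OCF_NOT_RUNNING):
    """Classify a probe (monitor_0) result against the expected target rc."""
    if rc == target:
        return "as expected"
    if rc == OCF_SUCCESS:
        return "unexpectedly active"
    if rc == OCF_NOT_RUNNING:
        return "unexpectedly inactive"
    return "probe error"


if __name__ == "__main__":
    # The case from the log: target 7, rc 0.
    print(interpret_probe(OCF_SUCCESS))
    # A probe on a node where the resource is really stopped.
    print(interpret_probe(OCF_NOT_RUNNING))
```

The point of the sketch is only that a mismatch between target and rc
is information about the resource state, not necessarily a failure of
the probe itself.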

As I'd be glad to help you improve the wiki, I have a few more
questions:

What should the expected behavior be when crm_resource -P is launched
and it finds the resource up on only one node?
And what about when it's up on 2 or more nodes and it's a simple
primitive (not a clone or master_slave)?
> > 
> > 
> > 
> > 
> > > > This also happened with other resource types, not only with
> > > > IPaddr.
> > > > 
> > > > I attached the ha-debug log file, with the cluster just started
> > > > on only one node (this also happens with at least 2 nodes), and
> > > > then at 2006/05/25_17:08:19 I launched the crm_resource -P
> > > > command; I also attached the cibadmin -Q output taken after the
> > > > command, the on-disk cib.xml and the ha.cf.
> > > > 
> > > > The only way I found to stop this from happening is a heartbeat
> > > > restart.
> > > > 
> > > > Is this the right behavior, or am I doing something wrong?
> > > 
> > > 
> > > 
> > > 
> > > if you're expecting resources to be detected as active then it's
> > > normal
> > > 
> > > 
> > > 
> > > 
> > > one could argue that this should be a warning but the most common
> > > case
> > > for this log is when resources are active at startup which is more
> > > important.
> > 
> > 
> > Probably I'm missing something, but what I don't understand is why
> > it's going into an infinite and unstoppable loop when it detects the
> > resource up.
> > 
> 
> 
> oooooh, yes you're right.  it will do that.  sigh, thats a pretty
> serious bug too.
> i'll work on it today (with the help of your comprehensive initial
> email)

Let me know if I can help you in any way.
> 
> > My assumption was that it detects the resource up, as it should be
> > since the cluster started it on this node, and then it's happy and
> > the probe is successful.
> > 
> > 
> > On the other hand, if, for example, someone started a resource by
> > hand on another node, then it should detect it as up where it should
> > be down, and heartbeat should handle this situation by stopping it.
> > (I know that this should never happen, but it does happen in the
> > real world, as there are operators who don't know how the
> > environment is configured, and other cluster suites handle this
> > possibility.)
> > 

Another point, somewhat related to this problem: I noticed that the
monitor is launched only on the nodes where the resource should be up,
and not on all the cluster's nodes, so if someone erroneously activates
a resource by hand on another node, the cluster does not discover it.
(Maybe this was already discussed, and I know that this should never
happen, but it does happen in the real world, as there are operators
who don't know how the environment is configured, and other cluster
suites can handle this possibility.)
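
To make the scenario concrete, here is a minimal sketch of the kind of
check I have in mind. It is not heartbeat code and makes no claim about
what the cluster currently does: the function stray_instances and the
node name nodo02 are hypothetical, and the only assumption is that a
monitor returning 0 (OCF_SUCCESS) means the resource is active on that
node.

```python
# Illustrative only: a hypothetical check, not part of heartbeat.
OCF_SUCCESS = 0       # monitor found the resource running
OCF_NOT_RUNNING = 7   # monitor found the resource cleanly stopped


def stray_instances(expected_nodes, probe_rc_by_node):
    """Return the nodes where the resource is active but should not be.

    expected_nodes: nodes where the cluster placed the resource.
    probe_rc_by_node: monitor return code collected from every node.
    """
    expected = set(expected_nodes)
    return sorted(
        node
        for node, rc in probe_rc_by_node.items()
        if rc == OCF_SUCCESS and node not in expected
    )


if __name__ == "__main__":
    # The cluster started the resource on nodo01; an operator also
    # started it by hand on nodo02 (hypothetical second node).
    results = {"nodo01": OCF_SUCCESS, "nodo02": OCF_SUCCESS}
    print(stray_instances(["nodo01"], results))
```

A stray instance can only be found this way if the monitor is run on
every node, not just on the nodes where the resource is expected.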
> > > 

> > > 
> > > we could assume that the resources are already active, but then
> > > you'd
> > > see similar log messages for the resources that weren't running.

Bye!

> > > 
> > > Andrew Beekhof
> > > 
> > > 
> > > 
> > > 
> > > "Would the last person to leave please turn out the
> > > enlightenment?" -
> > > TISM
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA@lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> > 
> > 
> > 
> > 
> > 
> > 

 
 

