List:       linux-ha-dev
Subject:    Re: [Linux-ha-dev] More on ipfail bug
From:       Kevin Dwyer <kevin () pheared ! net>
Date:       2003-05-15 13:21:23

On Thu, 15 May 2003 05:08:10 -0600
Alan Robertson <alanr@unix.sh> wrote:

> Rafal Lewczuk wrote:
> >>Inserting sleep(3) at the beginning of msg_ping_nodes() resulted in
> >>ipfail working correctly (not doing failover when both nodes die).
> > 
> > 
> > At first I meant that I have to wait one heartbeat tick (assuming one
> > second), so that after that both nodes will know that the ping node is
> > dead. But after waiting one second it was still racing (although an
> > unneeded failover was less likely), and two seconds was also a bit
> > too short: it sometimes switched (rarely, but still...).
> 
> Presumably waiting one "heartbeat interval" + 1 second should allow
> this to work correctly.  The fix of something as simple as a "sleep"
> makes me a little nervous, I confess...  I tend to not like to block
> in this kind of event-driven program.

Yes, the drawback to sleep()ing here is that once you've slept "long
enough" various things that you thought were true may no longer be true.
Also, if both sides get into this sleep, you'll still have the same
problem, just delayed by the sleep time.

The best solution is what Alan and I have discussed: keep ipfail from
acting until a period has passed in which nothing has changed that would
make the action unnecessary.  Blocking the program during this time is
neither necessary nor really advisable.
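
To illustrate the idea of waiting without blocking, here is a minimal
sketch, written in Python for brevity. It is not ipfail code: the class
name, the method names, and the 3-second settle period are all made up
for the example. It assumes the event loop can call one function on each
status change and another from a periodic timer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DeferredFailover:
    """Hypothetical helper: act only if the trigger survives a settle period."""

    settle_seconds: float
    act: Callable[[], None]
    deadline: Optional[float] = None  # None means nothing is pending

    def on_event(self, now: float, should_fail_over: bool) -> None:
        """Call on every status change (ping node lost, peer update, ...)."""
        if not should_fail_over:
            # Things changed so that no action is needed: cancel.
            self.deadline = None
        elif self.deadline is None:
            # First sign of trouble: start the clock, but do not act yet.
            self.deadline = now + self.settle_seconds

    def on_tick(self, now: float) -> bool:
        """Call from the event loop's periodic timer; never blocks."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            self.act()
            return True
        return False
```

Because nothing here sleeps, the program keeps processing messages while
the clock runs, and any update that makes the failover unnecessary simply
cancels it. A repeated "should fail over" event does not restart the
clock in this sketch; whether it should is a design choice.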

> We've talked about this on the list before.  Kevin has been preparing
> for final exams in college, and has just finished them today.  Perhaps
> he'll have time to look at this after he's unwound a little from
> finals ;-)

"Hint, Hint" :)

I think I may poke at it tonight and see if I can come up with at least
a rough sketch of what should be happening.


-- 
/* kevin@pheared.net               http://pheared.net/devel/ */
/* Network Security Engineer       http://pheared.net/~kevin */
/* Sabotage will set us free.   Throw a rock in the machine. */
/*   >++++++++++[<++++++++++>-]<.+++++.----.[-]++++++++++.   */
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.community.tummy.com
http://lists.community.tummy.com/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/