List:       linux-ha-dev
Subject:    Re: [Linux-ha-dev] Re: Initial resource takeover problems in
From:       Alan Robertson <alanr () bell-labs ! com>
Date:       1999-11-09 5:50:49

Thomas Hepper wrote:
> 
> Hi,
> On Fri, Nov 05, 1999 at 11:50:09PM -0700, Alan Robertson wrote:
> > Several different people have reported problems with initial resource takeover
> > in heartbeat 0.4.5a.
> >
> > I tried to reproduce it here, and I could -- on one machine.  When I recompiled
> > it from source, it seemed to go away.  One of the people reporting it seemed to
> > have the same experience.
> >
> > I have added a little debug to the code, fixed a problem with logging from shell
> > scripts, and now call it 0.4.5b.
> >
> > It's now pointed to by the download page.
> >
> > Please let me know what you find.  I would encourage anyone who is willing to
> > try the RPM version first.
> 
> OK, I gave it a try, and no luck (Debian system). So I added some debugging
> to find the place where it fails. It seems that the command which is
> run by req_our_resources does not respond in time. I changed the fgets
> loop to retry the read more than once if the first read fails, waiting 1 second
> after every failed read, and it works .....
> I have no idea why the first read fails ..., errno is set to 4.

Errno 4 is EINTR.  This process has an alarm running, so whether the read
succeeds or fails probably depends on where in the alarm cycle it occurs.  A good
bit of my testing isn't on a real cluster, so I never hear from any other
machines.  That leaves my test cases synchronized to the alarm code, and I almost
always have a full second before the SIGALRM goes off.
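
For illustration, here is a minimal sketch of a read that retries when a
signal interrupts it.  The helper name fgets_retry_eintr is hypothetical and
is not taken from the heartbeat source; it only shows the general technique
of clearing the stream error and calling fgets again when errno is EINTR,
rather than sleeping for a second between attempts.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of heartbeat): read one line from fp,
 * retrying whenever the underlying read is interrupted by a signal
 * such as SIGALRM (errno == EINTR, which is 4 on Linux).
 *
 * Returns buf on success, or NULL on end-of-file or on any error
 * other than EINTR.
 */
char *fgets_retry_eintr(char *buf, int size, FILE *fp)
{
    assert(buf != NULL && size > 0 && fp != NULL);

    for (;;) {
        errno = 0;
        if (fgets(buf, size, fp) != NULL)
            return buf;             /* got a line (or a partial one) */

        if (ferror(fp) && errno == EINTR) {
            clearerr(fp);           /* reset the error flag, then retry */
            continue;
        }
        return NULL;                /* real EOF or a non-EINTR error */
    }
}
```

One caveat: if the signal arrives after part of a line has already been
read, some stdio implementations may discard that partial data when fgets
returns NULL, so this is a sketch rather than a complete fix.  Installing
the SIGALRM handler with sigaction() and the SA_RESTART flag is another way
to keep the alarm from interrupting the read in the first place.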

This sounds like a great find!

	Thanks Thomas!

	-- Alan Robertson
	   alanr@bell-labs.com

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.tummy.com
http://lists.tummy.com/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
