
List:       linux-ha
Subject:    Re: [Linux-HA] Re: Error when testing DRBD with Heartbeat
From:       Fabrice Durand <durand.fabrice () gmail ! com>
Date:       2005-08-31 16:44:58
Message-ID: 12afb69905083109447dcf3094 () mail ! gmail ! com

On 8/31/05, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2005-08-31T16:29:47, Fabrice Durand <durand.fabrice@gmail.com> wrote:
> 
> > OK, so in fact my aim is to test the high availability of the
> > resources, i.e. to see what happens when heartbeat is suddenly killed
> > "violently" (NOT a proper stop with the command
> > /etc/init.d/heartbeat stop).
> > I don't expect to see a mess, but rather the active node (node 1)
> > giving up its resources and the other node (node 2) taking them over
> > and becoming active, which is exactly what happens.
> > The only problem is that at the end of the Heartbeat log on node 2,
> > one can see this message:
> > **********
> > heartbeat: 2005/07/29_15:21:04 ERROR: send_cluster_msg: cannot open
> > /var/lib/heartbeat/fifo: No such device or address
> > *********
> > And I wonder whether there is something to configure so that Heartbeat
> > can open this fifo file.
> > As said in the first mail, I changed the permissions of the file for
> > this, but it changes nothing...
> 
> Just to clarify: So you are seeing problems on the second node _after_
> the takeover, not on the first one? That would indeed be a bug.
> 
> Which heartbeat version again?

Sorry, the problem is on node 1 (not on node 2; that was a mistake in my
last post), as shown in my first mail with the heartbeat log from
node 1, where the last line shows the problem:

***************************************
heartbeat: 2005/07/29_15:21:03 info: Heartbeat shutdown in progress. (1416)
heartbeat: 2005/07/29_15:21:03 info: Giving up all HA resources.
heartbeat: 2005/07/29_15:21:03 info: Core process 1419 exited. 7 remaining
heartbeat: 2005/07/29_15:21:03 info: Core process 1420 exited. 6 remaining
heartbeat: 2005/07/29_15:21:03 info: Core process 1421 exited. 5 remaining
heartbeat: 2005/07/29_15:21:03 info: Core process 1422 exited. 4 remaining
heartbeat: 2005/07/29_15:21:03 info: Core process 1423 exited. 3 remaining
heartbeat: 2005/07/29_15:21:03 info: Core process 1424 exited. 2 remaining
heartbeat: 2005/07/29_15:21:03 info: Core process 1425 exited. 1 remaining
heartbeat: 2005/07/29_15:21:03 info: Heartbeat shutdown complete.
heartbeat: 2005/07/29_15:21:04 info: Releasing resource group: eepclu1 135.9.216.51 drbddisk Filesystem::/dev/drbd0::/montagedrbd::ext3:: wu-ftpd
heartbeat: 2005/07/29_15:21:04 info: Running /etc/ha.d/resource.d/wu-ftpd  stop
heartbeat: 2005/07/29_15:21:04 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /montagedrbd ext3  stop
heartbeat: 2005/07/29_15:21:04 info: Running /etc/ha.d/resource.d/drbddisk  stop
heartbeat: 2005/07/29_15:21:04 info: Running /etc/ha.d/resource.d/IPaddr 135.9.216.51 stop
heartbeat: 2005/07/29_15:21:04 info: /sbin/route -n del -host 135.9.216.51
heartbeat: 2005/07/29_15:21:04 info: /sbin/ifconfig eth0:0 down
heartbeat: 2005/07/29_15:21:04 info: IP Address 135.9.216.51 released
heartbeat: 2005/07/29_15:21:04 info: killing /usr/lib/heartbeat/ipfail process group 1430 with signal 15
heartbeat: 2005/07/29_15:21:04 info: All HA resources relinquished.
heartbeat: 2005/07/29_15:21:04 ERROR: send_cluster_msg: cannot open /var/lib/heartbeat/fifo: No such device or address
************************************
The Heartbeat version I use is heartbeat_1.2.3-1woody_i386.
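
For what it's worth, "No such device or address" is the text for errno
ENXIO, which open() returns when a FIFO is opened write-only and
non-blocking while no process has it open for reading. A permissions
problem would normally show up as EACCES ("Permission denied") instead,
which may be why changing the rights of the file made no difference.
The small Python sketch below only reproduces that general FIFO
behaviour on Linux. It is an assumption on my side that Heartbeat 1.2.3
opens its fifo this way and that the reader had already exited; the
function names and the temporary path are made up for the example and
have nothing to do with Heartbeat's own code.

```python
import errno
import os
import tempfile


def try_open_fifo_for_writing(path):
    """Try a non-blocking write-only open of a FIFO.

    Returns None if the open succeeds, otherwise the errno of the failure.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return None


def demo():
    """Open a scratch FIFO for writing, first without and then with a reader."""
    workdir = tempfile.mkdtemp()
    # Hypothetical stand-in for /var/lib/heartbeat/fifo.
    fifo_path = os.path.join(workdir, "fifo")
    os.mkfifo(fifo_path, 0o600)
    try:
        # Nobody has the FIFO open for reading yet.
        without_reader = try_open_fifo_for_writing(fifo_path)

        # A non-blocking read-only open succeeds even with no writer present.
        reader_fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            with_reader = try_open_fifo_for_writing(fifo_path)
        finally:
            os.close(reader_fd)
    finally:
        os.unlink(fifo_path)
        os.rmdir(workdir)
    return without_reader, with_reader


if __name__ == "__main__":
    without_reader, with_reader = demo()
    print("no reader:  ", errno.errorcode.get(without_reader, without_reader))
    print("with reader:", errno.errorcode.get(with_reader, with_reader))
```

If that assumption holds, the message would simply mean that something
tried to send a cluster message after the heartbeat process that reads
the fifo had already gone away, which would fit the "Heartbeat shutdown
complete" line appearing before the error in the log.
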
Thanks !


> 
> 
> Sincerely,
>     Lars Marowsky-Brée <lmb@suse.de
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
