List:       linux-ha
Subject:    Re: [Linux-HA] heartbeat 1.2.3 / linux 2.6.10 / ping-ipfail problem
From:       Andrew Fritz <afritz () uh ! edu>
Date:       2005-02-28 19:18:27
Message-ID: 42236E83.1000208 () uh ! edu

I had a similar (though not identical) problem and haven't been able to
resolve it; I still need to set aside some time to get my system to
generate core files. Moving to the 1.99 series fixed the problem, though.
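
If you want to check whether a node is hitting the same symptom, a small script along these lines can pull the crash lines out of ha-log. This is only a sketch: the pattern is inferred from the log lines quoted below (rejoined onto one line, as they would appear in the actual file), and find_signal_deaths and SAMPLE are made-up names for illustration, not part of heartbeat.

```python
import re

# Pattern assumed from the quoted ha-log, e.g.
# "heartbeat: <timestamp> ERROR: Exiting HBWRITE process 2407 killed by signal 11."
CRASH_RE = re.compile(
    r"^heartbeat: (?P<stamp>\S+) ERROR: Exiting (?P<proc>\S+) process "
    r"(?P<pid>\d+) killed by signal (?P<sig>\d+)\."
)


def find_signal_deaths(lines):
    """Return (timestamp, process, pid, signal) for each 'killed by signal' line."""
    hits = []
    for line in lines:
        m = CRASH_RE.match(line.strip())
        if m:
            hits.append((m["stamp"], m["proc"], int(m["pid"]), int(m["sig"])))
    return hits


# Sample lines taken from the drrr log quoted below (unwrapped).
SAMPLE = [
    "heartbeat: 2005/03/03_00:22:48 info: pid 2407 locked in memory.",
    "heartbeat: 2005/03/03_00:22:48 ERROR: Exiting HBWRITE process 2407 killed by signal 11.",
    "heartbeat: 2005/03/03_00:22:48 ERROR: Core heartbeat process died! Restarting.",
]

for stamp, proc, pid, sig in find_signal_deaths(SAMPLE):
    print(f"{stamp}: {proc} (pid {pid}) died on signal {sig}")
```

To run it against a real log, pass an open file instead of SAMPLE, for example find_signal_deaths(open("/var/log/ha-log")). Signal 11 is SIGSEGV on Linux, which is why a core file would be the next thing to look at.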

Andrew

max deli wrote:

> I have heartbeat working fine on a few boxes, up until I try to 
> implement ipfail.  If I comment out the bottom three lines of ha.cf it 
> works great. x.y.126.1 is pingable from both nodes.  With the config 
> as below, the alias never comes up on either side. Any suggestions 
> would be appreciated. thanks!
>
>
> ha.cf: (identical on both nodes)
> --------------------------------
> baud    19200
> serial    /dev/ttyS0
>
> auto_failback off
>
> node    zzz
> node    drrr
>
> ping x.y.126.1
> respawn root /usr/lib/heartbeat/ipfail
> apiauth ipfail uid=root
>
>
> haresources: (identical on both nodes)
> --------------------------------------
> zzz x.y.126.48
>
>
> authkeys: (identical on both nodes)
> -----------------------------------
> auth 1
> 1 crc
>
>
> logs from both nodes (drrr then zzz)
> ------------------------------------
> drrr: (zzz information follows)
>
> eth0      Link encap:Ethernet  HWaddr 00:D0:59:CD:55:82
>           inet addr:x.y.126.243  Bcast:x.y.126.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:6364 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:7156 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:137 txqueuelen:1000
>           RX bytes:4196811 (4.0 Mb)  TX bytes:651241 (635.9 Kb)
>
> ::::::::::::::
> /var/log/ha-debug
> ::::::::::::::
> heartbeat: 2005/03/03_00:22:51 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat: 2005/03/03_00:23:17 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat: 2005/03/03_00:23:20 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat: 2005/03/03_00:23:30 debug: Starting 
> /etc/ha.d/resource.d/IPaddr x.y.126.48 stop
> heartbeat: 2005/03/03_00:23:30 debug: /etc/ha.d/resource.d/IPaddr 
> x.y.126.48 stop done. RC=0
> heartbeat: 2005/03/03_00:24:04 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> ::::::::::::::
> /var/log/ha-log
> ::::::::::::::
> heartbeat: 2005/03/03_00:22:47 info: Neither logfile nor logfacility 
> found.
> heartbeat: 2005/03/03_00:22:47 info: Logging defaulting to 
> /var/log/ha-log
> heartbeat: 2005/03/03_00:22:47 info: **************************
> heartbeat: 2005/03/03_00:22:47 info: Configuration validated. Starting 
> heartbeat 1.2.3
> heartbeat: 2005/03/03_00:22:47 info: heartbeat: version 1.2.3
> heartbeat: 2005/03/03_00:22:47 info: Heartbeat generation: 69
> heartbeat: 2005/03/03_00:22:47 info: Starting serial heartbeat on tty 
> /dev/ttyS0 (19200 baud)
> heartbeat: 2005/03/03_00:22:47 info: ping heartbeat started.
> heartbeat: 2005/03/03_00:22:47 info: pid 2402 locked in memory.
> heartbeat: 2005/03/03_00:22:47 info: Local status now set to: 'up'
> heartbeat: 2005/03/03_00:22:48 info: pid 2404 locked in memory.
> heartbeat: 2005/03/03_00:22:48 info: pid 2405 locked in memory.
> heartbeat: 2005/03/03_00:22:48 info: pid 2406 locked in memory.
> heartbeat: 2005/03/03_00:22:48 info: pid 2408 locked in memory.
> heartbeat: 2005/03/03_00:22:48 info: pid 2407 locked in memory.
> heartbeat: 2005/03/03_00:22:48 ERROR: Exiting HBWRITE process 2407 
> killed by signal 11.
> heartbeat: 2005/03/03_00:22:48 ERROR: Core heartbeat process died! 
> Restarting.
> heartbeat: 2005/03/03_00:22:48 WARN: Shutdown delayed until current 
> resource activity finishes.
> heartbeat: 2005/03/03_00:22:51 info: Link zzz:/dev/ttyS0 up.
> heartbeat: 2005/03/03_00:22:51 info: Status update for node zzz: 
> status up
> heartbeat: 2005/03/03_00:22:51 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/03/03_00:23:17 WARN: node x.y.126.1: is dead
> heartbeat: 2005/03/03_00:23:17 info: Local status now set to: 'active'
> heartbeat: 2005/03/03_00:23:17 info: Starting child client 
> "/usr/lib/heartbeat/ipfail" (0,0)
> heartbeat: 2005/03/03_00:23:17 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/03/03_00:23:17 info: Starting 
> "/usr/lib/heartbeat/ipfail" as uid 0  gid 0 (pid 2414)
> heartbeat: 2005/03/03_00:23:20 info: Status update for node zzz: 
> status active
> heartbeat: 2005/03/03_00:23:20 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/03/03_00:23:30 info: remote resource transition 
> completed.
> heartbeat: 2005/03/03_00:23:30 info: remote resource transition 
> completed.
> heartbeat: 2005/03/03_00:23:30 info: Initial resource acquisition 
> complete (T_RESOURCES(us))
> heartbeat: 2005/03/03_00:23:30 info: No local resources 
> [/usr/lib/heartbeat/ResourceManager listkeys drrr] to acquire.
> heartbeat: 2005/03/03_00:23:30 info: Heartbeat shutdown in progress. 
> (2402)
> heartbeat: 2005/03/03_00:23:30 info: Giving up all HA resources.
> heartbeat: 2005/03/03_00:23:30 info: Releasing resource group: zzz 
> x.y.126.48
> heartbeat: 2005/03/03_00:23:30 info: Running 
> /etc/ha.d/resource.d/IPaddr x.y.126.48 stop
> heartbeat: 2005/03/03_00:23:30 info: killing /usr/lib/heartbeat/ipfail 
> process group 2414 with signal 15
> heartbeat: 2005/03/03_00:23:30 info: killing heartbeat resource child 
> process group 2424 with signal 9
> heartbeat: 2005/03/03_00:23:30 info: All HA resources relinquished.
> heartbeat: 2005/03/03_00:23:30 info: killing /usr/lib/heartbeat/ipfail 
> process group 2414 with signal 15
> heartbeat: 2005/03/03_00:23:31 WARN: 1 lost packet(s) for [zzz] [49:51]
> heartbeat: 2005/03/03_00:23:31 info: No pkts missing from zzz!
> heartbeat: 2005/03/03_00:23:31 info: remote resource transition 
> completed.
> heartbeat: 2005/03/03_00:23:31 info: Received shutdown notice from 'zzz'.
> heartbeat: 2005/03/03_00:23:31 info: Resource takeover cancelled - 
> shutdown in progress.
> heartbeat: 2005/03/03_00:23:31 info: killing HBFIFO process 2404 with 
> signal 15
> heartbeat: 2005/03/03_00:23:31 info: killing HBWRITE process 2405 with 
> signal 15
> heartbeat: 2005/03/03_00:23:31 info: killing HBREAD process 2406 with 
> signal 15
> heartbeat: 2005/03/03_00:23:31 info: killing HBREAD process 2408 with 
> signal 15
> heartbeat: 2005/03/03_00:23:31 info: Core process 2404 exited. 4 
> remaining
> heartbeat: 2005/03/03_00:23:31 info: Core process 2405 exited. 3 
> remaining
> heartbeat: 2005/03/03_00:23:31 info: Core process 2406 exited. 2 
> remaining
> heartbeat: 2005/03/03_00:23:31 info: Core process 2408 exited. 1 
> remaining
> heartbeat: 2005/03/03_00:23:31 info: Heartbeat shutdown complete.
> heartbeat: 2005/03/03_00:23:31 info: Heartbeat restart triggered.
> heartbeat: 2005/03/03_00:23:31 info: Restarting heartbeat.
> heartbeat: 2005/03/03_00:23:31 info: Performing heartbeat restart exec.
> heartbeat: 2005/03/03_00:24:02 info: Neither logfile nor logfacility 
> found.
> heartbeat: 2005/03/03_00:24:02 info: Logging defaulting to 
> /var/log/ha-log
> heartbeat: 2005/03/03_00:24:02 info: **************************
> heartbeat: 2005/03/03_00:24:02 info: Configuration validated. Starting 
> heartbeat 1.2.3
> heartbeat: 2005/03/03_00:24:02 info: heartbeat: version 1.2.3
> heartbeat: 2005/03/03_00:24:02 info: Heartbeat generation: 70
> heartbeat: 2005/03/03_00:24:03 info: Starting serial heartbeat on tty 
> /dev/ttyS0 (19200 baud)
> heartbeat: 2005/03/03_00:24:03 info: ping heartbeat started.
> heartbeat: 2005/03/03_00:24:03 info: pid 2497 locked in memory.
> heartbeat: 2005/03/03_00:24:03 info: Local status now set to: 'up'
> heartbeat: 2005/03/03_00:24:04 info: pid 2499 locked in memory.
> heartbeat: 2005/03/03_00:24:04 info: pid 2500 locked in memory.
> heartbeat: 2005/03/03_00:24:04 info: pid 2501 locked in memory.
> heartbeat: 2005/03/03_00:24:04 info: pid 2502 locked in memory.
> heartbeat: 2005/03/03_00:24:04 ERROR: Exiting HBWRITE process 2502 
> killed by signal 11.
> heartbeat: 2005/03/03_00:24:04 ERROR: Core heartbeat process died! 
> Restarting.
> heartbeat: 2005/03/03_00:24:04 WARN: Shutdown delayed until current 
> resource activity finishes.
> heartbeat: 2005/03/03_00:24:04 info: pid 2503 locked in memory.
> heartbeat: 2005/03/03_00:24:04 info: Link zzz:/dev/ttyS0 up.
> heartbeat: 2005/03/03_00:24:04 info: Status update for node zzz: 
> status up
> heartbeat: 2005/03/03_00:24:04 info: Running /etc/ha.d/rc.d/status status
>
> zzz:
>
> eth0      Link encap:Ethernet  HWaddr 00:30:1B:AB:55:05
>           inet addr:x.y.126.246  Bcast:x.y.126.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:136353 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:135225 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:2095 txqueuelen:1000
>           RX bytes:97219192 (92.7 Mb)  TX bytes:81856794 (78.0 Mb)
>           Interrupt:12 Base address:0x6000
>
> ::::::::::::::
> /var/log/ha-debug
> ::::::::::::::
> heartbeat: 2005/02/28_13:03:40 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat: 2005/02/28_13:04:07 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat: 2005/02/28_13:04:09 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat: 2005/02/28_13:04:20 debug: StartNextRemoteRscReq(): child 
> count 1
> heartbeat: 2005/02/28_13:04:20 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat: 2005/02/28_13:04:20 debug: Starting 
> /etc/ha.d/resource.d/IPaddr x.y.126.48 start
> ls: /var/lib/heartbeat/rsctmp/IPaddr/eth0:*: No such file or directory
> heartbeat: 2005/02/28_13:04:20 debug: /etc/ha.d/resource.d/IPaddr 
> x.y.126.48 start done. RC=0
> heartbeat: 2005/02/28_13:04:20 debug: Starting 
> /etc/ha.d/resource.d/IPaddr x.y.126.48 stop
> SIOCDELRT: No such process
> heartbeat: 2005/02/28_13:04:20 debug: /etc/ha.d/resource.d/IPaddr 
> x.y.126.48 stop done. RC=0
> heartbeat: 2005/02/28_13:04:54 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat: 2005/02/28_13:05:23 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat: 2005/02/28_13:05:23 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> ::::::::::::::
> /var/log/ha-log
> ::::::::::::::
> heartbeat: 2005/02/28_13:03:39 info: Neither logfile nor logfacility 
> found.
> heartbeat: 2005/02/28_13:03:39 info: Logging defaulting to 
> /var/log/ha-log
> heartbeat: 2005/02/28_13:03:39 info: **************************
> heartbeat: 2005/02/28_13:03:39 info: Configuration validated. Starting 
> heartbeat 1.2.3
> heartbeat: 2005/02/28_13:03:39 info: heartbeat: version 1.2.3
> heartbeat: 2005/02/28_13:03:39 info: Heartbeat generation: 62
> heartbeat: 2005/02/28_13:03:39 info: Starting serial heartbeat on tty 
> /dev/ttyS0 (19200 baud)
> heartbeat: 2005/02/28_13:03:39 info: ping heartbeat started.
> heartbeat: 2005/02/28_13:03:39 info: pid 14650 locked in memory.
> heartbeat: 2005/02/28_13:03:39 info: Local status now set to: 'up'
> heartbeat: 2005/02/28_13:03:40 info: pid 14652 locked in memory.
> heartbeat: 2005/02/28_13:03:40 info: pid 14653 locked in memory.
> heartbeat: 2005/02/28_13:03:40 info: pid 14654 locked in memory.
> heartbeat: 2005/02/28_13:03:40 info: Link drrr:/dev/ttyS0 up.
> heartbeat: 2005/02/28_13:03:40 info: Status update for node drrr: 
> status up
> heartbeat: 2005/02/28_13:03:40 info: pid 14655 locked in memory.
> heartbeat: 2005/02/28_13:03:40 ERROR: Exiting HBWRITE process 14655 
> killed by signal 11.
> heartbeat: 2005/02/28_13:03:40 ERROR: Core heartbeat process died! 
> Restarting.
> heartbeat: 2005/02/28_13:03:40 WARN: Shutdown delayed until current 
> resource activity finishes.
> heartbeat: 2005/02/28_13:03:40 info: pid 14656 locked in memory.
> heartbeat: 2005/02/28_13:03:40 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/02/28_13:04:07 info: Status update for node drrr: 
> status active
> heartbeat: 2005/02/28_13:04:07 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/02/28_13:04:09 WARN: node x.y.126.1: is dead
> heartbeat: 2005/02/28_13:04:09 info: Local status now set to: 'active'
> heartbeat: 2005/02/28_13:04:09 info: Starting child client 
> "/usr/lib/heartbeat/ipfail" (0,0)
> heartbeat: 2005/02/28_13:04:09 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/02/28_13:04:09 info: Starting 
> "/usr/lib/heartbeat/ipfail" as uid 0  gid 0 (pid 14665)
> heartbeat: 2005/02/28_13:04:20 info: local resource transition completed.
> heartbeat: 2005/02/28_13:04:20 info: Initial resource acquisition 
> complete (T_RESOURCES(us))
> heartbeat: 2005/02/28_13:04:20 info: Local Resource acquisition 
> completed.
> heartbeat: 2005/02/28_13:04:20 info: Running 
> /etc/ha.d/rc.d/ip-request-resp ip-request-resp
> heartbeat: 2005/02/28_13:04:20 received ip-request-resp x.y.126.48 OK yes
> heartbeat: 2005/02/28_13:04:20 info: Acquiring resource group: zzz 
> x.y.126.48
> heartbeat: 2005/02/28_13:04:20 info: Running 
> /etc/ha.d/resource.d/IPaddr x.y.126.48 start
> heartbeat: 2005/02/28_13:04:20 info: /sbin/ifconfig eth0:0 x.y.126.48 
> netmask 255.255.255.0    broadcast x.y.126.255
> heartbeat: 2005/02/28_13:04:20 info: Sending Gratuitous Arp for 
> x.y.126.48 on eth0:0 [eth0]
> heartbeat: 2005/02/28_13:04:20 /usr/lib/heartbeat/send_arp -i 1010 -r 
> 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-x.y.126.48 eth0 
> x.y.126.48 auto x.y.126.48 ffffffffffff
> heartbeat: 2005/02/28_13:04:20 info: remote resource transition 
> completed.
> heartbeat: 2005/02/28_13:04:20 info: Heartbeat shutdown in progress. 
> (14650)
> heartbeat: 2005/02/28_13:04:20 info: Giving up all HA resources.
> heartbeat: 2005/02/28_13:04:20 info: Releasing resource group: zzz 
> x.y.126.48
> heartbeat: 2005/02/28_13:04:20 info: Running 
> /etc/ha.d/resource.d/IPaddr x.y.126.48 stop
> heartbeat: 2005/02/28_13:04:20 info: /sbin/route -n del -host x.y.126.48
> heartbeat: 2005/02/28_13:04:20 info: /sbin/ifconfig eth0:0 down
> heartbeat: 2005/02/28_13:04:20 info: IP Address x.y.126.48 released
> heartbeat: 2005/02/28_13:04:20 info: killing /usr/lib/heartbeat/ipfail 
> process group 14665 with signal 15
> heartbeat: 2005/02/28_13:04:20 info: All HA resources relinquished.
> heartbeat: 2005/02/28_13:04:20 info: Received shutdown notice from 
> 'drrr'.
> heartbeat: 2005/02/28_13:04:20 info: Resource takeover cancelled - 
> shutdown in progress.
> heartbeat: 2005/02/28_13:04:21 info: killing HBFIFO process 14652 with 
> signal 15
> heartbeat: 2005/02/28_13:04:21 info: killing HBWRITE process 14653 
> with signal 15
> heartbeat: 2005/02/28_13:04:21 info: killing HBREAD process 14654 with 
> signal 15
> heartbeat: 2005/02/28_13:04:21 info: killing HBREAD process 14656 with 
> signal 15
> heartbeat: 2005/02/28_13:04:21 info: Core process 14652 exited. 4 
> remaining
> heartbeat: 2005/02/28_13:04:21 info: Core process 14653 exited. 3 
> remaining
> heartbeat: 2005/02/28_13:04:21 info: Core process 14654 exited. 2 
> remaining
> heartbeat: 2005/02/28_13:04:21 info: Core process 14656 exited. 1 
> remaining
> heartbeat: 2005/02/28_13:04:21 info: Heartbeat shutdown complete.
> heartbeat: 2005/02/28_13:04:21 info: Heartbeat restart triggered.
> heartbeat: 2005/02/28_13:04:21 info: Restarting heartbeat.
> heartbeat: 2005/02/28_13:04:21 info: Performing heartbeat restart exec.
> heartbeat: 2005/02/28_13:04:52 info: Neither logfile nor logfacility 
> found.
> heartbeat: 2005/02/28_13:04:52 info: Logging defaulting to 
> /var/log/ha-log
> heartbeat: 2005/02/28_13:04:52 info: **************************
> heartbeat: 2005/02/28_13:04:52 info: Configuration validated. Starting 
> heartbeat 1.2.3
> heartbeat: 2005/02/28_13:04:52 info: heartbeat: version 1.2.3
> heartbeat: 2005/02/28_13:04:52 info: Heartbeat generation: 63
> heartbeat: 2005/02/28_13:04:52 info: Starting serial heartbeat on tty 
> /dev/ttyS0 (19200 baud)
> heartbeat: 2005/02/28_13:04:52 info: ping heartbeat started.
> heartbeat: 2005/02/28_13:04:52 info: pid 14895 locked in memory.
> heartbeat: 2005/02/28_13:04:52 info: Local status now set to: 'up'
> heartbeat: 2005/02/28_13:04:53 info: pid 14897 locked in memory.
> heartbeat: 2005/02/28_13:04:53 info: pid 14898 locked in memory.
> heartbeat: 2005/02/28_13:04:54 info: pid 14899 locked in memory.
> heartbeat: 2005/02/28_13:04:54 info: Link drrr:/dev/ttyS0 up.
> heartbeat: 2005/02/28_13:04:54 info: Status update for node drrr: 
> status up
> heartbeat: 2005/02/28_13:04:54 info: pid 14900 locked in memory.
> heartbeat: 2005/02/28_13:04:54 ERROR: Exiting HBWRITE process 14900 
> killed by signal 11.
> heartbeat: 2005/02/28_13:04:54 ERROR: Core heartbeat process died! 
> Restarting.
> heartbeat: 2005/02/28_13:04:54 WARN: Shutdown delayed until current 
> resource activity finishes.
> heartbeat: 2005/02/28_13:04:54 info: pid 14901 locked in memory.
> heartbeat: 2005/02/28_13:04:54 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/02/28_13:05:23 info: Status update for node drrr: 
> status active
> heartbeat: 2005/02/28_13:05:23 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/02/28_13:05:23 WARN: node x.y.126.1: is dead
> heartbeat: 2005/02/28_13:05:23 info: Local status now set to: 'active'
> heartbeat: 2005/02/28_13:05:23 info: Starting child client 
> "/usr/lib/heartbeat/ipfail" (0,0)
> heartbeat: 2005/02/28_13:05:23 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/02/28_13:05:23 info: Starting 
> "/usr/lib/heartbeat/ipfail" as uid 0  gid 0 (pid 14909)
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>