List:       linux-ha
Subject:    Help! Problems with IP address takeover on 2.0.36
From:       alanr@bell-labs.com
Date:       1998-11-30 1:12:59

I've been experimenting with Horms' IP address takeover code on the
2.0.36 kernel, and have largely done a preliminary integration with my
heartbeat code.  However, I've run into some strange behavior that has
me stumped.  I suspect I'm doing something wrong, but it could also be
some kind of networking bug or anomaly.

I have two machines, kathy and ken3.  They have IP addresses kathy,
kathy-adm, ken3, and ken3-adm.  The -adm addresses are the "real" IP
addresses, and the others are aliases which move around the network as
things go up and down.

Here's the scenario:
	Kathy goes out of service
	Ken3 notices the lack of heartbeat, and takes over for kathy
		(all is well)
	Kathy comes back into service, and asks for its IP address back
	Ken3 gives the IP address back to kathy

The takeover procedure is like this:
	Add an alias
	Add a route for the alias
	Send out a number of gratuitous ARPs
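The three takeover steps above can be sketched roughly as follows.  This
is only an illustration, not Horms' actual code: the alias name eth0:0,
the MAC address, and the IP address are made-up placeholders, and the
helper names takeover_commands and gratuitous_arp_frame are hypothetical.
The sketch only builds the commands and the ARP frame; it does not run
anything or put packets on the wire.

```python
import socket
import struct


def takeover_commands(ip, alias="eth0:0"):
    """Return argv lists for the first two takeover steps (alias name assumed)."""
    return [
        ["ifconfig", alias, ip, "up"],                # add an alias
        ["route", "add", "-host", ip, "dev", alias],  # add a route for the alias
    ]


def gratuitous_arp_frame(mac, ip):
    """Build a 42-byte Ethernet frame carrying a gratuitous ARP request.

    In a gratuitous ARP the sender and target protocol addresses are both
    the address being announced, and the frame is sent to the broadcast
    address so neighbours can update their ARP caches.
    """
    sha = bytes.fromhex(mac.replace(":", ""))  # sender hardware address
    spa = socket.inet_aton(ip)                 # sender protocol address
    eth_header = b"\xff" * 6 + sha + struct.pack("!H", 0x0806)  # broadcast, ARP
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Target hardware address is left as zeros; target IP equals sender IP.
    arp += sha + spa + b"\x00" * 6 + spa
    return eth_header + arp


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    for argv in takeover_commands("192.168.1.10"):
        print(" ".join(argv))
    frame = gratuitous_arp_frame("00:11:22:33:44:55", "192.168.1.10")
    print(len(frame), "byte gratuitous ARP frame")
```

Sending the frame several times, as in the third step, would need a raw
socket and root privileges; that part is left out here.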

The give-back procedure is similar:
	Remove the route for the alias
	Take the alias down with ifconfig
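As a rough sketch, the give-back steps map onto commands like the ones
built below.  The alias name eth0:0 and the IP address are placeholders,
and giveback_commands is a hypothetical helper; the exact invocations in
the real scripts may differ.

```python
def giveback_commands(ip, alias="eth0:0"):
    """Return argv lists for the give-back steps (alias name assumed)."""
    return [
        ["route", "del", "-host", ip, "dev", alias],  # remove the route for the alias
        ["ifconfig", alias, "down"],                  # take the alias down
    ]


# Placeholder address, for illustration only.
for argv in giveback_commands("192.168.1.10"):
    print(" ".join(argv))
```

Note that, as described, nothing in this sequence touches ARP state on the
node giving the address back; it only removes the route and the alias.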

The behavior I see is that if I'm on ken3 and start a ping to kathy,
then I see a 10-second dropout as kathy's demise is being detected, then
things take off again on the new MAC address.  This is great!  Now, when
ken3 gives back the IP address to kathy, the ping on ken3 (to kathy)
hangs.  New pings started now also hang.  If I remove the default route,
it takes off, and all is well again.  If it's a finger that hangs, it
doesn't recover after the route change.

More detail:  If I stop the ping before returning the IP address to
kathy, then subsequent pings, etc. all work perfectly.

In other words, all is well if I have no connections to kathy when it
migrates back to its proper place.  If I have a connection up at the
time the return migration occurs, then everything hangs until I remove
the default route.  Unlike ping, finger never recovers once it hangs.  Telnets
started while it was hung start up and connect to the real kathy when
things unfreeze after the route change.

What am I doing wrong?

	Thanks!!

	-- Alan Robertson
	   alanr@bell-labs.com
