
List:       linux-ha-dev
Subject:    Re: [Linux-ha-dev] heartbeat startup on master but not on slave does bad things
From:       "Luis Claudio R. Goncalves" <lclaudio () conectiva ! com ! br>
Date:       2003-08-25 19:50:30

Hi!

This is a dirty, tricky, brown-paper-bag-worthy patch. Despite its
evilness, it seems to work fine...

I know this is surely not the best way to solve the problem, but the patch
is incredibly simple and may lead to a different solution, easier than the
first one I thought of.

I'd recommend not using this patch in a production environment. But if the
problem described in this thread is bothering you, try it out. The patch
applies cleanly against heartbeat-1.0.3.

Remember, don't blame me! This patch is just a proof of concept. :)

[]'s
Luis

On Fri, Aug 22, 2003 at 06:27:02PM +0200, Lars Marowsky-Bree wrote:
| On 2003-08-15T15:49:45,
|    "Luis Claudio R. Goncalves" <lclaudio@conectiva.com.br> said:
| 
| > I believe it wouldn't break the other cases where takeover_from_node()
| > is called. Any ideas?
| 
| Hi Luis and Alan,
| 
| I have tried understanding the code in hb_resource.c and its
| dependencies within heartbeat.c, but process_resources() alone is 200
| lines for a single function, and I can't seem to understand the state
| machine within.
| 
| As you two have both much superior experience with this code than I,
| could you please look at this bug? I think it is kind of important.
| 
| It would probably take me more than one day to understand; if you don't
| find the time, please drop me a mail and I'll get started, but I'd like
| to avoid that ;-) (And Alan, I _will_ call you and ask tons of stupid
| questions then! ;-)
| 
| 
| Sincerely,
|     Lars Marowsky-Brée <lmb@suse.de>
| 
| -- 
| High Availability & Clustering		ever tried. ever failed. no matter.
| SuSE Labs				try again. fail again. fail better.
| Research & Development, SuSE Linux AG		-- Samuel Beckett
| 


---end quoted text---

-- 
[ Luis Claudio R. Goncalves                  lclaudio@conectiva.com.br ]
[ Fingerprint:   4FDD B8C4 3C59 34BD 8BE9  2696 7203 D980 A448 C8F8    ]
[ Msc has come!!!! - Conectiva HA Team - Gospel User - Linuxer - !Java ]
[ Fault Tolerance - Real-Time - Distributed Systems - IECLB - IS 40:31 ]
[ LateNite Programmer        --  My Utmost for His Highest  --         ]


["heartbeat-1.0.3-hb_resources.patch" (text/plain)]

--- heartbeat-1.0.3/heartbeat/hb_resource.c	2003-06-16 02:51:14.000000000 -0300
+++ /tmp/hb_resource-new.c	2003-08-25 16:39:14.000000000 -0300
@@ -453,8 +453,12 @@
 	if (!nice_failback) {
 		/* Original ("normal") starting behavior */
 		if (!WeAreRestarting && !resources_requested_yet) {
-			resources_requested_yet=1;
-			req_our_resources(FALSE);
+			if (!takeover_in_progress) {
+				resources_requested_yet=1;
+				req_our_resources(FALSE);
+			} else {
+				takeover_in_progress = 0;
+			}
 		}
 		return;
 	}
@@ -866,7 +870,6 @@
 
 			other_holds_resources = HB_NO_RSC;
 			other_is_stable = 1;	/* Not going anywhere */
-			takeover_in_progress = 1;
 			if (ANYDEBUG) {
 				ha_log(LOG_DEBUG
 				,	"takeover_from_node: other now stable");
@@ -880,6 +883,7 @@
 			/* case 1 - part 1 */
 			/* part 2 is done by the mach_down script... */
 		}
+		takeover_in_progress = 1;
 		req_our_resources(TRUE);
 		/* req_our_resources turns on the HB_LOCAL_RSC bit */
 

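Read as a whole, the patch appears to turn takeover_in_progress into a
one-shot flag: the takeover path now always sets it just before calling
req_our_resources(TRUE), and the non-nice_failback startup path skips its
own req_our_resources(FALSE) once while clearing the flag. Below is a
minimal standalone sketch of that logic in C, not the heartbeat code
itself. The function names startup_request_step() and takeover_step() and
the request_count counter are hypothetical, and req_our_resources() is
reduced to a stub.

```c
/* Simplified stand-ins for the heartbeat globals the patch touches. */
static int nice_failback = 0;
static int WeAreRestarting = 0;
static int resources_requested_yet = 0;
static int takeover_in_progress = 0;

/* Hypothetical counter, for illustration only: number of requests made. */
static int request_count = 0;

/* Stub: the real req_our_resources() does far more than count calls. */
static void req_our_resources(int getthemanyway)
{
	(void)getthemanyway;
	request_count++;
}

/* Hypothetical name for the patched block near line 453 of hb_resource.c. */
static void startup_request_step(void)
{
	if (!nice_failback) {
		if (!WeAreRestarting && !resources_requested_yet) {
			if (!takeover_in_progress) {
				resources_requested_yet = 1;
				req_our_resources(0);
			} else {
				/* A takeover already requested the resources:
				 * skip this round and clear the flag. */
				takeover_in_progress = 0;
			}
		}
		return;
	}
	/* nice_failback handling is outside the scope of this sketch. */
}

/* Hypothetical name for the tail of takeover_from_node() after the patch. */
static void takeover_step(void)
{
	takeover_in_progress = 1;	/* now set on every takeover path */
	req_our_resources(1);
}
```

In this model, a takeover_step() followed by startup_request_step() issues
only one request; the next startup_request_step() would still issue its
own request, since the flag suppresses a single round only.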
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
