
List:       openais
Subject:    [Openais] Ckpt Service Various Small Bug Fixes.
From:       "Muni Bajpai" <muniba () nortel ! com>
Date:       2005-05-12 19:58:28
Message-ID: CFCE7C3BDB79204092974B5B50AD719401FC42B0 () zrc2hxm0 ! corp ! nortel ! com


Hey Steve,

I have attached a consolidated patch for quite a few minor fixes. Please
review.

Tested with the testckpt suite and ckpt-rd,wr suite in a 2 node config.

Thanks.

Muni
 


-----Original Message-----
From: Steven Dake [mailto:sdake@mvista.com] 
Sent: Tuesday, May 10, 2005 8:36 PM
To: Bajpai, Muni [NGC:B670:EXCH]
Cc: openais@lists.osdl.org; Smith, Kristen [NGC:B670:EXCH]
Subject: RE: Ckpt Recovery Bug


We have support for this operation in the sync architecture.

It is sync_abort and sync_process.  When recovery has completed,
sync_activate is called, which activates the synchronization (after we know
for sure synchronization has completed).  So it is not possible to do
synchronization "on the fly"; instead, we wait until the activate operation
occurs.  I'm not sure how well this architecture is reflected in any of the
services, though.

Regards
-steve
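
The abort/process/activate flow described above can be sketched as a staging copy that only becomes live on activate. This is a minimal illustration, not the openais sync code: the struct, the svc_* function names, and the SLOTS size are all hypothetical, and a real service would stage far richer state than an int array.

```c
#include <string.h>

#define SLOTS 4 /* hypothetical table size for this sketch */

struct svc_state {
	int live[SLOTS];      /* state the service answers requests from */
	int staged[SLOTS];    /* state being rebuilt during synchronization */
	int sync_in_progress; /* nonzero between init and abort/activate */
};

/* Begin a synchronization round from a copy of the live state. */
void svc_sync_init(struct svc_state *s)
{
	memcpy(s->staged, s->live, sizeof s->staged);
	s->sync_in_progress = 1;
}

/* Apply one piece of incoming state to the staging copy only. */
void svc_sync_process(struct svc_state *s, int slot, int value)
{
	if (s->sync_in_progress && slot >= 0 && slot < SLOTS) {
		s->staged[slot] = value;
	}
}

/* Round interrupted (for example by token loss): live state is untouched. */
void svc_sync_abort(struct svc_state *s)
{
	s->sync_in_progress = 0;
}

/* Round known to be complete: commit the staged state. */
void svc_sync_activate(struct svc_state *s)
{
	if (s->sync_in_progress) {
		memcpy(s->live, s->staged, sizeof s->live);
		s->sync_in_progress = 0;
	}
}
```

Under these assumptions, a failure in the middle of recovery leaves the live values as they were before the round started, because nothing reaches them until svc_sync_activate runs.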

On Tue, 2005-05-10 at 18:16, Muni Bajpai wrote:
> I can't think of any cases. I did see something today where there was a 
> token loss in the middle of recovery and the barriers had not 
> completed and all the ckpt reference counts got reinitialized to bogus 
> values.
> 
> What is our story on mid recovery failures?
> 
> Thanks
> 
> Muni
> 
> -----Original Message-----
> From: Steven Dake [mailto:sdake@mvista.com]
> Sent: Tuesday, May 10, 2005 3:15 PM
> To: Bajpai, Muni [NGC:B670:EXCH]
> Cc: openais@lists.osdl.org; Smith, Kristen [NGC:B670:EXCH]
> Subject: RE: Ckpt Recovery Bug
> 
> On Tue, 2005-05-10 at 07:28, Muni Bajpai wrote:
> > Hey Steve,
> > 
> > I didn't think the max was the solution. It was more of a 
> > placeholder. So can I do a sum instead, as you noted?
> > 
> 
> I think sum is ok..  Can you think of any cases where sum doesn't work 
> correctly?
> 
> regards
> -steve
> 
> > Thanks
> > 
> > Muni
> > 
> >  
> > 
> > 
> > -----Original Message-----
> > From: Steven Dake [mailto:sdake@mvista.com]
> > Sent: Monday, May 09, 2005 6:51 PM
> > To: Bajpai, Muni [NGC:B670:EXCH]
> > Cc: openais@lists.osdl.org; Smith, Kristen [NGC:B670:EXCH]
> > Subject: Re: Ckpt Recovery Bug
> > 
> > 
> > Muni
> > 
> > I took a look at the patch.  I am not sure it is correct.  Simply 
> > assigning the reference count from two separate partitions based upon
> > the max of those two values doesn't seem right.
> > 
> > Consider an example:
> > 
> > cX is a configuration; pX is a processor.
> > 
> > c1: p1, p2, p3 each processor accesses checkpoint Z (refcount of this
> > configuration is 3)
> > 
> > c2: p4, p5 each processor accesses checkpoint Z (refcount of this 
> > configuration is 2)
> > 
> > c1 and c2 merge and form c3.
> > 
> > Then given the algorithm in the patch, the example would lead to:
> > 
> > c3: p1, p2, p3, p4, p5 each accesses checkpoint Z (refcount is 3)
> > 
> > Shouldn't the refcount be 5 after the network merge since p1, p2, p3,
> > p4, p5 are referencing it?
> > 
> > I'll be happy to take a look at the ckpt_find_global patch you send...
> > 
> > regards
> > -steve
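
The c1/c2 example above can be checked with a small per-processor table. This is only a sketch under stated assumptions: struct proc_ref, MAX_PROCS, refcount_total and merge_disjoint are hypothetical names (the patch itself keys entries by struct in_addr, not by an int id), and the merge assumes no processor appears in both partitions.

```c
#define MAX_PROCS 8 /* hypothetical limit for this sketch */

struct proc_ref {
	int proc_id; /* 0 marks an unused slot */
	int count;   /* references held by this processor */
};

/* Sum of all per-processor counts in one table. */
int refcount_total(const struct proc_ref *tbl)
{
	int i;
	int total = 0;

	for (i = 0; i < MAX_PROCS; i++) {
		if (tbl[i].proc_id != 0) {
			total += tbl[i].count;
		}
	}
	return total;
}

/*
 * Copy every used entry of src into a free slot of dst.
 * Assumes the two tables describe disjoint sets of processors.
 * Returns 0 on success, -1 if dst has no free slot left.
 */
int merge_disjoint(struct proc_ref *dst, const struct proc_ref *src)
{
	int i;
	int j;

	for (i = 0; i < MAX_PROCS; i++) {
		if (src[i].proc_id == 0) {
			continue;
		}
		for (j = 0; j < MAX_PROCS; j++) {
			if (dst[j].proc_id == 0) {
				dst[j] = src[i];
				break;
			}
		}
		if (j == MAX_PROCS) {
			return -1;
		}
	}
	return 0;
}
```

With c1 holding one reference each for p1, p2, p3 and c2 holding one each for p4, p5, the two totals are 3 and 2. Taking the max keeps 3, while merging the tables and totalling gives 5, which matches the number of processors actually referencing checkpoint Z after the merge.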
> > 
> > On Mon, 2005-05-09 at 14:29, Muni Bajpai wrote:
> > > Hey Steve,
> > > 
> > > Found an issue with my recovery code while running traffic. We were
> > > not merging checkpoint states properly. In essence, during recovery
> > > a processor receiving valid sync_state messages was overwriting its
> > > local state, taking the incoming network state as gospel. Needless
> > > to say, that was causing some problems.
> > > 
> > > Could you please look through the algorithm (especially the
> > > merge_ckpt_refcounts function)? I'm not too sure of the last 'else'
> > > statement.
> > > 
> > > I also found some issues with the implementation of the
> > > ckpt_find_global function that might be causing a lot of your saf
> > > test core dumps. Will send that patch later.
> > > 
> > > Thanks
> > > 
> > > Muni
> > > 
> > >  
> > 
> > 
> 
> 






["defect_consolidated.patch" (application/octet-stream)]

diff -uNr --exclude=SCCS --exclude=BitKeeper --exclude=ChangeSet --exclude=init \
--exclude=LICENSE --exclude=Makefile --exclude=man --exclude=README.devmap \
--exclude=SECURITY --exclude=TODO --exclude=CHANGELOG --exclude=conf --exclude=loc \
--exclude=Makefile.samples --exclude=QUICKSTART --exclude=.cdtproject \
                --exclude=.project --exclude=nortel.patch latest/exec/ckpt.c \
                openais/exec/ckpt.c
--- latest/exec/ckpt.c	2005-05-12 13:52:42 -05:00
+++ openais/exec/ckpt.c	2005-05-12 13:19:19 -05:00
@@ -6,7 +6,7 @@
  * Author: Steven Dake (sdake@mvista.com)
  *
  * This software licensed under BSD license, the text of which follows:
- *
+ * 
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
  *
@@ -43,6 +43,7 @@
 #include <stdio.h>
 #include <errno.h>
 #include <signal.h>
+#include <arpa/inet.h>
 
 #include "../include/ais_types.h"
 #include "../include/saCkpt.h"
@@ -76,6 +77,8 @@
 	SYNCHRONY_STATE_ENDED
 }synchrony_state;
 
+static int process_localhost_transition = 0;
+
 /* TODO static totempg_recovery_plug_handle ckpt_checkpoint_recovery_plug_handle; */
 
 static int ckpt_exec_init_fn (void);
@@ -148,7 +151,9 @@
 static int  ckpt_recovery_process (void);
 static void ckpt_recovery_finalize();
 static void ckpt_recovery_abort(void);
-static void ckpt_recovery_process_members_exit(struct in_addr *left_list, int left_list_entries);
+static void ckpt_recovery_process_members_exit(struct in_addr *left_list,
+						int left_list_entries);
+static void ckpt_replace_localhost_ip (struct in_addr *joined_list);
 
 void checkpoint_release (struct saCkptCheckpoint *checkpoint);
 void timer_function_retention (void *data);
@@ -347,6 +352,19 @@
 	return -1;
 }
 
+static int processor_add (struct in_addr *proc_addr, int count, struct ckpt_refcnt *ckpt_refcount)
+{
+	int i;
+	for (i = 0; i < PROCESSOR_COUNT_MAX; i ++) {
+		if (ckpt_refcount[i].addr.s_addr == 0) {
+			memcpy(&ckpt_refcount[i].addr, proc_addr, sizeof(struct in_addr));
+			ckpt_refcount[i].count = count;
+			return i;
+		}
+	}
+	return -1;
+}
+
 static int processor_index_find(struct in_addr *proc_addr,
 								struct ckpt_refcnt *ckpt_refcount) 
 { 
@@ -381,6 +399,46 @@
 	memset((char*)ckpt_refcount, 0, PROCESSOR_COUNT_MAX * sizeof(struct ckpt_refcnt));
 }
 
+static void merge_ckpt_refcounts(struct ckpt_refcnt *local, struct ckpt_refcnt *network)
+{
+	int index,i;	
+	struct in_addr zero_ip;
+	zero_ip.s_addr = inet_addr("0.0.0.0");
+
+	for (i = 0; i < PROCESSOR_COUNT_MAX; i ++) {
+		if (local[i].addr.s_addr == zero_ip.s_addr) {
+			continue;
+		}
+		index  = processor_index_find (&local[i].addr, network);
+		if (index == -1) { /*Could Not Find the Local Entry in the remote.Add to it*/
+			log_printf (LOG_LEVEL_DEBUG,"calling processor_add for ip %s, count %d\n",
+				inet_ntoa(local[i].addr),
+				local[i].count);
+			index = processor_add (&local[i].addr, local[i].count, network);
+			if (index == -1) {
+				log_printf(LOG_LEVEL_ERROR,
+					"merge_ckpt_refcounts : could not add a new processor as the MAX limit of procs is reached.Exiting\n");
+				assert(0);
+			}
+		}
+		else {
+			if (local[i].count == network[index].count) {
+				/*Nothing to do here as the network is already up 2 date*/
+				log_printf (LOG_LEVEL_DEBUG,"merge_ckpt_refcounts counts match, continue\n");
+				continue;
+			}
+			else {
+				/*Found a match for this proc in the network; add the local count to it.*/
+				network[index].count += local[i].count; 
+				log_printf (LOG_LEVEL_DEBUG,"setting count for %s = %d\n",
+					inet_ntoa(network[index].addr),
+					network[index].count);
+			}
+		}
+	}
+}
+
+
 static void ckpt_recovery_initialize (void) 
 {
 	struct list_head *checkpoint_list;
@@ -621,13 +679,14 @@
 				 * Init the section ptr to 0 so it is re evaled
 				 */
 				recovery_ckpt_next = recovery_ckpt_next->next;							
-				recovery_ckpt_section_next = 0;				
+				recovery_ckpt_section_next = 0;
 				continue;
 			}
 		}
+		
 		/*Should only be here at the end of the traversal of the ckpt list*/
 		ckpt_recovery_finalize();
-recovery_exit_clean:		
+recovery_exit_clean:				
 		/*Re - Initialize the static's*/
 		recovery_ckpt_next = 0;
 		recovery_ckpt_section_next = 0;
@@ -689,6 +748,7 @@
 	
 }
 
+
 static void ckpt_recovery_activate (void) 
 {		
  	recovery_state = SYNCHRONY_STATE_ENDED;
@@ -701,23 +761,62 @@
 	return;
 }
 
-static void ckpt_recovery_process_members_exit(struct in_addr *left_list, int left_list_entries)
+static void ckpt_replace_localhost_ip (struct in_addr *joined_list) {
+	struct list_head *checkpoint_list;
+	struct saCkptCheckpoint *checkpoint;
+	struct in_addr local_ip;
+	int index;
+
+	assert(joined_list);
+
+	local_ip.s_addr = inet_addr("127.0.0.1");
+
+	for (checkpoint_list = checkpoint_list_head.next;
+		checkpoint_list != &checkpoint_list_head;
+		checkpoint_list = checkpoint_list->next) {
+
+		checkpoint = list_entry (checkpoint_list,
+			struct saCkptCheckpoint, list);
+		index = processor_index_find(&local_ip, checkpoint->ckpt_refcount);
+		if (index == -1) {
+			continue;
+		}		
+		memcpy(&checkpoint->ckpt_refcount[index].addr, joined_list, sizeof(struct in_addr));
+		log_printf (LOG_LEVEL_DEBUG, "Transitioning From Local Host replacing 127.0.0.1 with %s ...\n",
+			inet_ntoa(*joined_list));
+
+	}
+	process_localhost_transition = 0;
+}
+
+
+static void ckpt_recovery_process_members_exit(struct in_addr *left_list, 
+						int left_list_entries)
 {
 	struct list_head *checkpoint_list;
 	struct saCkptCheckpoint *checkpoint;
 	struct in_addr *member;
+	struct in_addr local_ip;
 	int index;
 	int i;
+
+	local_ip.s_addr = inet_addr("127.0.0.1");
 	
 	if (left_list_entries == 0) {
 		return;
 	}
+
+	if ((left_list_entries == 1) && 
+		(left_list->s_addr == local_ip.s_addr)) {
+		process_localhost_transition = 1;
+		return; 
+	}
 	
 	/*
 	 *  Iterate left_list_entries. 
 	 */
 	member = left_list;
-	for (i = 0; i < left_list_entries; i++) {
+	for (i = 0; i < left_list_entries; i++) {		
 		for (checkpoint_list = checkpoint_list_head.next;
 			checkpoint_list != &checkpoint_list_head;
 			checkpoint_list = checkpoint_list->next) {
@@ -727,7 +826,7 @@
 			index = processor_index_find(member, checkpoint->ckpt_refcount);			
 			if (index == -1) {
 				continue;
-			}		
+			}
 			/*
 			 * Decrement
 			 * 
@@ -753,9 +852,11 @@
 												&checkpoint->name.value);		
 				checkpoint_release (checkpoint);
 			} else
-			if (checkpoint->referenceCount == 0) {
+			if ((checkpoint->expired == 0) && (checkpoint->referenceCount == 0)) {
 				log_printf (LOG_LEVEL_DEBUG, "ckpt_recovery_process_members_exit: Starting timer \
to release checkpoint %s.\n",  &checkpoint->name.value);
+				poll_timer_delete (aisexec_poll_handle, checkpoint->retention_timer);
+
 				poll_timer_add (aisexec_poll_handle,
 					checkpoint->checkpointCreationAttributes.retentionDuration / 1000000,
 					checkpoint,
@@ -783,6 +884,17 @@
 		if (recovery_state == SYNCHRONY_STATE_ENDED) {
 			memcpy (&saved_ring_id, ring_id, sizeof(struct memb_ring_id));
 		}
+		if (process_localhost_transition && (joined_list_entries == 1)) {
+			if (joined_list_entries == 1) {
+				ckpt_replace_localhost_ip (joined_list);
+			}
+			else {
+				/*We should never be here*/
+				log_printf (LOG_LEVEL_ERROR,
+					"We were told to process transitioning from local host, but there are multiple entries in the join list\n");
+				assert(0);
+			}
+		}
 	}	
 
 	else if (configuration_type == TOTEM_CONFIGURATION_TRANSITIONAL) {
@@ -977,6 +1089,8 @@
 	if (conn_info->conn_info_partner->service != CKPT_SERVICE) {
 		return 0;
 	}
+
+	log_printf(LOG_LEVEL_DEBUG, "ckpt_exit_fn conn_info = %#x, with fd = %d\n", conn_info, conn_info->fd);
 
 	/*
 	 * close all checkpoints opened on this fd
@@ -996,7 +1110,7 @@
                 
 		cleanup_list = conn_info->conn_info_partner->ais_ci.u.libckpt_ci.checkpoint_list.next;
  }
-
+	
 	if (conn_info->conn_info_partner->ais_ci.u.libckpt_ci.sectionIterator.sectionIteratorEntries) \
{  free (conn_info->ais_ci.u.libckpt_ci.sectionIterator.sectionIteratorEntries);
 	}
@@ -1226,8 +1340,16 @@
 		error = SA_AIS_ERR_BAD_CHECKPOINT; /* Is this the correct return ? */
 		goto error_exit;
 	}
-	
-	initialize_ckpt_refcount_array(ckptCheckpoint->ckpt_refcount);
+
+	/*CHECK to see if there are any existing ckpts*/
+	if ((ckptCheckpoint->ckpt_refcount) &&  (ckpt_refcount_total(ckptCheckpoint->ckpt_refcount) > 0)) {
+		log_printf (LOG_LEVEL_DEBUG,"calling merge_ckpt_refcounts\n");
+		merge_ckpt_refcounts(ckptCheckpoint->ckpt_refcount, ref_cnt);
+	}
+	/*No Existing ckpts. Lets assign what we got over the network*/
+	else  {
+		initialize_ckpt_refcount_array(ckptCheckpoint->ckpt_refcount);
+	}
 	ckptCheckpoint->referenceCount = ckpt_refcount_total(ref_cnt);
 	log_printf (LOG_LEVEL_DEBUG, "CKPT: OPEN ckptCheckpoint->referenceCount %d\n",ckptCheckpoint->referenceCount);
 	memcpy(ckptCheckpoint->ckpt_refcount,ref_cnt,sizeof(struct ckpt_refcnt)*PROCESSOR_COUNT_MAX);
@@ -1457,6 +1579,10 @@
 		 */
 		checkpoint_release (ckptCheckpoint);
 	}
+	else if ( ckptCheckpoint->referenceCount > 0 ) {
+		ckptCheckpoint->unlinked = 0;
+		ckptCheckpoint->expired = 0;
+	}
 
 error_exit:
 	/*
@@ -1518,7 +1644,7 @@
 	struct iovec iovecs[2];
 
 	checkpoint = ckpt_checkpoint_find_global \
                (&req_exec_ckpt_checkpointretentiondurationexpire->checkpointName);
-	if (checkpoint && checkpoint->expired == 0) {
+	if (checkpoint && (checkpoint->expired == 0) && (checkpoint->referenceCount < 1)) {
 		log_printf (LOG_LEVEL_NOTICE, "CKPT: Expiring checkpoint %s\n", getSaNameT \
(&req_exec_ckpt_checkpointretentiondurationexpire->checkpointName));  \
checkpoint->expired = 1;  
@@ -1971,6 +2097,7 @@
 	if (sizeRequired > ckptCheckpointSection->sectionDescriptor.sectionSize) {
 		sectionData = realloc (ckptCheckpointSection->sectionData, sizeRequired);
 		if (sectionData == 0) {
+			log_printf (LOG_LEVEL_ERROR, "CKPT: sectionData realloc returned 0 Calling \
error_exit.\n");  error = SA_AIS_ERR_NO_MEMORY;
 			goto error_exit;
 		}
@@ -2008,7 +2135,6 @@
 			&res_lib_ckpt_sectionwrite,
 			sizeof (struct res_lib_ckpt_sectionwrite));
 	}
-
 	return (0);
 }
 
@@ -2239,26 +2365,36 @@
 	struct req_exec_ckpt_checkpointclose req_exec_ckpt_checkpointclose;
 	struct saCkptCheckpoint *checkpoint;
 	struct iovec iovecs[2];
+	struct res_lib_ckpt_checkpointclose res_lib_ckpt_checkpointclose;
 
 	checkpoint = ckpt_checkpoint_find_global \
                (&req_lib_ckpt_checkpointclose->checkpointName);
-	if (checkpoint->expired == 1) {
-		return (0);
-	}	
-	
-	req_exec_ckpt_checkpointclose.header.size =
-		sizeof (struct req_exec_ckpt_checkpointclose);
-	req_exec_ckpt_checkpointclose.header.id = MESSAGE_REQ_EXEC_CKPT_CHECKPOINTCLOSE;
+	if (checkpoint && (checkpoint->expired == 0)){
+		req_exec_ckpt_checkpointclose.header.size =
+			sizeof (struct req_exec_ckpt_checkpointclose);
+		req_exec_ckpt_checkpointclose.header.id = MESSAGE_REQ_EXEC_CKPT_CHECKPOINTCLOSE;
 
-	message_source_set (&req_exec_ckpt_checkpointclose.source, conn_info);
+		message_source_set (&req_exec_ckpt_checkpointclose.source, conn_info);
 
-	memcpy (&req_exec_ckpt_checkpointclose.checkpointName,
-		&checkpoint->name, sizeof (SaNameT));
+		memcpy (&req_exec_ckpt_checkpointclose.checkpointName,
+			&checkpoint->name, sizeof (SaNameT));
 
-	iovecs[0].iov_base = (char *)&req_exec_ckpt_checkpointclose;
-	iovecs[0].iov_len = sizeof (req_exec_ckpt_checkpointclose);
+		iovecs[0].iov_base = (char *)&req_exec_ckpt_checkpointclose;
+		iovecs[0].iov_len = sizeof (req_exec_ckpt_checkpointclose);
 
-	if (totempg_send_ok (sizeof (struct req_exec_ckpt_checkpointclose))) {
-		assert (totempg_mcast (iovecs, 1, TOTEMPG_AGREED) == 0);
+		if (totempg_send_ok (sizeof (struct req_exec_ckpt_checkpointclose))) {
+			assert (totempg_mcast (iovecs, 1, TOTEMPG_AGREED) == 0);
+		}
+	}
+	else {
+		log_printf (LOG_LEVEL_ERROR, "Could Not Find the Checkpoint to close so Returning Error.\n");
+
+		res_lib_ckpt_checkpointclose.header.size = sizeof (struct res_lib_ckpt_checkpointclose);
+		res_lib_ckpt_checkpointclose.header.id = MESSAGE_RES_CKPT_CHECKPOINT_CHECKPOINTCLOSE;
+		res_lib_ckpt_checkpointclose.header.error = SA_AIS_ERR_NOT_EXIST;
+
+		libais_send_response (conn_info,
+			&res_lib_ckpt_checkpointclose,
+			sizeof (struct res_lib_ckpt_checkpointclose));
 	}
 
 	return (0);
@@ -2335,33 +2471,47 @@
 	 */
 	checkpoint = ckpt_checkpoint_find_global \
(&req_lib_ckpt_checkpointstatusget->checkpointName);  
-	for (checkpoint_section_list = checkpoint->checkpointSectionsListHead.next;
-		checkpoint_section_list != &checkpoint->checkpointSectionsListHead;
-		checkpoint_section_list = checkpoint_section_list->next) {
+	if (checkpoint && (checkpoint->expired == 0)) {
 
-		checkpointSection = list_entry (checkpoint_section_list,
-			struct saCkptCheckpointSection, list);
+		for (checkpoint_section_list = checkpoint->checkpointSectionsListHead.next;
+			checkpoint_section_list != &checkpoint->checkpointSectionsListHead;
+			checkpoint_section_list = checkpoint_section_list->next) {
 
-		memoryUsed += checkpointSection->sectionDescriptor.sectionSize;
-		numberOfSections += 1;
-	}
+			checkpointSection = list_entry (checkpoint_section_list,
+				struct saCkptCheckpointSection, list);
 
-	/*
-	 * Build checkpoint status get response
-	 */
-	res_lib_ckpt_checkpointstatusget.header.size = sizeof (struct \
                res_lib_ckpt_checkpointstatusget);
-	res_lib_ckpt_checkpointstatusget.header.id = \
                MESSAGE_RES_CKPT_CHECKPOINT_CHECKPOINTSTATUSGET;
-	res_lib_ckpt_checkpointstatusget.header.error = SA_AIS_OK;
+			memoryUsed += checkpointSection->sectionDescriptor.sectionSize;
+			numberOfSections += 1;
+		}
 
-	memcpy (&res_lib_ckpt_checkpointstatusget.checkpointDescriptor.checkpointCreationAttributes,
                
-		&checkpoint->checkpointCreationAttributes,
-		sizeof (SaCkptCheckpointCreationAttributesT));
-	res_lib_ckpt_checkpointstatusget.checkpointDescriptor.numberOfSections = \
                numberOfSections;
-	res_lib_ckpt_checkpointstatusget.checkpointDescriptor.memoryUsed = memoryUsed;
+		/*
+		 * Build checkpoint status get response
+		 */
+		res_lib_ckpt_checkpointstatusget.header.size = sizeof (struct res_lib_ckpt_checkpointstatusget);
+		res_lib_ckpt_checkpointstatusget.header.id = MESSAGE_RES_CKPT_CHECKPOINT_CHECKPOINTSTATUSGET;
+		res_lib_ckpt_checkpointstatusget.header.error = SA_AIS_OK;
+
+		memcpy (&res_lib_ckpt_checkpointstatusget.checkpointDescriptor.checkpointCreationAttributes,
+			&checkpoint->checkpointCreationAttributes,
+			sizeof (SaCkptCheckpointCreationAttributesT));
+		res_lib_ckpt_checkpointstatusget.checkpointDescriptor.numberOfSections = numberOfSections;
+		res_lib_ckpt_checkpointstatusget.checkpointDescriptor.memoryUsed = memoryUsed;
 
-	log_printf (LOG_LEVEL_DEBUG, "before sending message\n");
-	libais_send_response (conn_info, &res_lib_ckpt_checkpointstatusget,
-		sizeof (struct res_lib_ckpt_checkpointstatusget));
+		log_printf (LOG_LEVEL_DEBUG, "before sending message\n");
+		libais_send_response (conn_info, &res_lib_ckpt_checkpointstatusget,
+			sizeof (struct res_lib_ckpt_checkpointstatusget));
+	}
+	else {
+		log_printf (LOG_LEVEL_ERROR, "Could Not Find the Checkpoint's status so Returning Error.\n");
+
+		res_lib_ckpt_checkpointstatusget.header.size = sizeof (struct res_lib_ckpt_checkpointstatusget);
+		res_lib_ckpt_checkpointstatusget.header.id = MESSAGE_RES_CKPT_CHECKPOINT_CHECKPOINTSTATUSGET;
+		res_lib_ckpt_checkpointstatusget.header.error = SA_AIS_ERR_NOT_EXIST;
+
+		libais_send_response (conn_info,
+			&res_lib_ckpt_checkpointstatusget,
+			sizeof (struct res_lib_ckpt_checkpointstatusget));
+        }
 	return (0);
 }
 
@@ -2371,35 +2521,43 @@
 	struct req_exec_ckpt_sectioncreate req_exec_ckpt_sectioncreate;
 	struct iovec iovecs[2];
 	struct saCkptCheckpoint *checkpoint;
+	struct res_lib_ckpt_sectioncreate res_lib_ckpt_sectioncreate;
 
 	log_printf (LOG_LEVEL_DEBUG, "Section create from API fd %d\n", conn_info->fd);
 	checkpoint = ckpt_checkpoint_find_global \
(&req_lib_ckpt_sectioncreate->checkpointName);  
-	/*
-	 * checkpoint opened is writeable mode so send message to cluster
-	 */
-	req_exec_ckpt_sectioncreate.header.id = MESSAGE_REQ_EXEC_CKPT_SECTIONCREATE;
-	req_exec_ckpt_sectioncreate.header.size = sizeof (struct \
                req_exec_ckpt_sectioncreate);
-
-	memcpy (&req_exec_ckpt_sectioncreate.req_lib_ckpt_sectioncreate,
-		req_lib_ckpt_sectioncreate,
-		sizeof (struct req_lib_ckpt_sectioncreate));
-
-	memcpy (&req_exec_ckpt_sectioncreate.checkpointName,
-		&req_lib_ckpt_sectioncreate->checkpointName,
-		sizeof (SaNameT));
+	if (checkpoint && (checkpoint->expired == 0)) {
+		/*
+		 * checkpoint opened is writeable mode so send message to cluster
+		 */
+		req_exec_ckpt_sectioncreate.header.id = MESSAGE_REQ_EXEC_CKPT_SECTIONCREATE;
+		req_exec_ckpt_sectioncreate.header.size = sizeof (struct \
req_exec_ckpt_sectioncreate);  
-	message_source_set (&req_exec_ckpt_sectioncreate.source, conn_info);
+		memcpy (&req_exec_ckpt_sectioncreate.req_lib_ckpt_sectioncreate,
+			req_lib_ckpt_sectioncreate,
+			sizeof (struct req_lib_ckpt_sectioncreate));
+	
+		memcpy (&req_exec_ckpt_sectioncreate.checkpointName,
+			&req_lib_ckpt_sectioncreate->checkpointName,
+			sizeof (SaNameT));
 
-	iovecs[0].iov_base = (char *)&req_exec_ckpt_sectioncreate;
-	iovecs[0].iov_len = sizeof (req_exec_ckpt_sectioncreate);
-	/*
-	 * Send section name and initial data in message
-	 */
-	iovecs[1].iov_base = ((char *)req_lib_ckpt_sectioncreate) + sizeof (struct \
                req_lib_ckpt_sectioncreate);
-	iovecs[1].iov_len = req_lib_ckpt_sectioncreate->header.size - sizeof (struct \
                req_lib_ckpt_sectioncreate);
-	req_exec_ckpt_sectioncreate.header.size += iovecs[1].iov_len;
+		message_source_set (&req_exec_ckpt_sectioncreate.source, conn_info);
 
+		iovecs[0].iov_base = (char *)&req_exec_ckpt_sectioncreate;
+		iovecs[0].iov_len = sizeof (req_exec_ckpt_sectioncreate);
+		/*
+		 * Send section name and initial data in message
+		 */
+		iovecs[1].iov_base = ((char *)req_lib_ckpt_sectioncreate) + sizeof (struct req_lib_ckpt_sectioncreate);
+		iovecs[1].iov_len = req_lib_ckpt_sectioncreate->header.size - sizeof (struct req_lib_ckpt_sectioncreate);
+		req_exec_ckpt_sectioncreate.header.size += iovecs[1].iov_len;
+	
+		if (iovecs[1].iov_len) {
+			log_printf (LOG_LEVEL_DEBUG, "CKPT: message_handler_req_lib_ckpt_sectioncreate Section = %s, idLen = %d\n",
+				iovecs[1].iov_base,
+				iovecs[1].iov_len);
+		}
+	
 #ifdef DEBUG
 printf ("LIBRARY SECTIONCREATE string is %s len is %d\n", (unsigned char \
*)iovecs[1].iov_base,  iovecs[1].iov_len);
@@ -2413,11 +2571,24 @@
 }
 printf ("|\n");
 #endif
-	if (iovecs[1].iov_len > 0) {
-		log_printf (LOG_LEVEL_DEBUG, "IOV_BASE is %p\n", iovecs[1].iov_base);
-		assert (totempg_mcast (iovecs, 2, TOTEMPG_AGREED) == 0);
-	} else {
-		assert (totempg_mcast (iovecs, 1, TOTEMPG_AGREED) == 0);
+		if (iovecs[1].iov_len > 0) {
+			log_printf (LOG_LEVEL_DEBUG, "IOV_BASE is %p\n", iovecs[1].iov_base);
+			assert (totempg_mcast (iovecs, 2, TOTEMPG_AGREED) == 0);
+		} else {
+			assert (totempg_mcast (iovecs, 1, TOTEMPG_AGREED) == 0);
+		}
+
+	}
+	else {
+		log_printf (LOG_LEVEL_ERROR, "Could Not Find the Checkpoint to create a section in so Returning Error.\n");
+
+		res_lib_ckpt_sectioncreate.header.size = sizeof (struct res_lib_ckpt_sectioncreate);
+		res_lib_ckpt_sectioncreate.header.id = MESSAGE_RES_CKPT_CHECKPOINT_SECTIONCREATE;
+		res_lib_ckpt_sectioncreate.header.error = SA_AIS_ERR_NOT_EXIST;
+
+		libais_send_response (conn_info,
+			&res_lib_ckpt_sectioncreate,
+			sizeof (struct res_lib_ckpt_sectioncreate));
 	}
 
 	return (0);
@@ -2510,39 +2681,61 @@
 	struct req_exec_ckpt_sectionwrite req_exec_ckpt_sectionwrite;
 	struct iovec iovecs[2];
 	struct saCkptCheckpoint *checkpoint;
+	struct res_lib_ckpt_sectionwrite res_lib_ckpt_sectionwrite;
+	
+	log_printf (LOG_LEVEL_DEBUG, "CKPT: Received data from lib with len = %d and ref = 0x%x\n",
+			req_lib_ckpt_sectionwrite->dataSize,
+		 	req_lib_ckpt_sectionwrite->dataOffset);
+
+	log_printf (LOG_LEVEL_DEBUG, "CKPT: Checkpoint section being written to is %s, idLen = %d\n",
+			((char *)req_lib_ckpt_sectionwrite) + sizeof (struct req_lib_ckpt_sectionwrite),
+			req_lib_ckpt_sectionwrite->idLen);
 
 	log_printf (LOG_LEVEL_DEBUG, "Section write from API fd %d\n", conn_info->fd);
 	checkpoint = ckpt_checkpoint_find_global (&req_lib_ckpt_sectionwrite->checkpointName);
 
-	/*
-	 * checkpoint opened is writeable mode so send message to cluster
-	 */
-	req_exec_ckpt_sectionwrite.header.id = MESSAGE_REQ_EXEC_CKPT_SECTIONWRITE;
-	req_exec_ckpt_sectionwrite.header.size = sizeof (struct req_exec_ckpt_sectionwrite); 
-
-	memcpy (&req_exec_ckpt_sectionwrite.req_lib_ckpt_sectionwrite,
-		req_lib_ckpt_sectionwrite,
-		sizeof (struct req_lib_ckpt_sectionwrite));
-
-	memcpy (&req_exec_ckpt_sectionwrite.checkpointName,
-		&req_lib_ckpt_sectionwrite->checkpointName,
-		sizeof (SaNameT));
+	if (checkpoint && (checkpoint->expired == 0)) {
+		/*
+		 * checkpoint opened is writeable mode so send message to cluster
+		 */
+		req_exec_ckpt_sectionwrite.header.id = MESSAGE_REQ_EXEC_CKPT_SECTIONWRITE;
+		req_exec_ckpt_sectionwrite.header.size = sizeof (struct req_exec_ckpt_sectionwrite); 
 
-	message_source_set (&req_exec_ckpt_sectionwrite.source, conn_info);
+		memcpy (&req_exec_ckpt_sectionwrite.req_lib_ckpt_sectionwrite,
+			req_lib_ckpt_sectionwrite,
+			sizeof (struct req_lib_ckpt_sectionwrite));
+	
+		memcpy (&req_exec_ckpt_sectionwrite.checkpointName,
+			&req_lib_ckpt_sectionwrite->checkpointName,
+			sizeof (SaNameT));
+	
+		message_source_set (&req_exec_ckpt_sectionwrite.source, conn_info);
+	
+		iovecs[0].iov_base = (char *)&req_exec_ckpt_sectionwrite;
+		iovecs[0].iov_len = sizeof (req_exec_ckpt_sectionwrite);
+		/*
+		 * Send section name and data to write in message
+		 */
+		iovecs[1].iov_base = ((char *)req_lib_ckpt_sectionwrite) + sizeof (struct req_lib_ckpt_sectionwrite);
+		iovecs[1].iov_len = req_lib_ckpt_sectionwrite->header.size - sizeof (struct req_lib_ckpt_sectionwrite);
+		req_exec_ckpt_sectionwrite.header.size += iovecs[1].iov_len;
+	
+		if (iovecs[1].iov_len > 0) {
+			assert (totempg_mcast (iovecs, 2, TOTEMPG_AGREED) == 0);
+		} else {
+			assert (totempg_mcast (iovecs, 1, TOTEMPG_AGREED) == 0);
+		}
+	}	
+	else {
+		log_printf (LOG_LEVEL_ERROR, "Could Not Find the Checkpoint to write to Returning Error.\n");
 
-	iovecs[0].iov_base = (char *)&req_exec_ckpt_sectionwrite;
-	iovecs[0].iov_len = sizeof (req_exec_ckpt_sectionwrite);
-	/*
-	 * Send section name and data to write in message
-	 */
-	iovecs[1].iov_base = ((char *)req_lib_ckpt_sectionwrite) + sizeof (struct req_lib_ckpt_sectionwrite);
-	iovecs[1].iov_len = req_lib_ckpt_sectionwrite->header.size - sizeof (struct req_lib_ckpt_sectionwrite);
-	req_exec_ckpt_sectionwrite.header.size += iovecs[1].iov_len;
+		res_lib_ckpt_sectionwrite.header.size = sizeof (struct res_lib_ckpt_sectionwrite);
+		res_lib_ckpt_sectionwrite.header.id = MESSAGE_RES_CKPT_CHECKPOINT_SECTIONWRITE;
+		res_lib_ckpt_sectionwrite.header.error = SA_AIS_ERR_NOT_EXIST;
 
-	if (iovecs[1].iov_len > 0) {
-		assert (totempg_mcast (iovecs, 2, TOTEMPG_AGREED) == 0);
-	} else {
-		assert (totempg_mcast (iovecs, 1, TOTEMPG_AGREED) == 0);
+		libais_send_response (conn_info,
+			&res_lib_ckpt_sectionwrite,
+			sizeof (struct res_lib_ckpt_sectionwrite));	
 	}
 
 	return (0);
@@ -2554,41 +2747,55 @@
 	struct req_exec_ckpt_sectionoverwrite req_exec_ckpt_sectionoverwrite;
 	struct iovec iovecs[2];
 	struct saCkptCheckpoint *checkpoint;
+	struct res_lib_ckpt_sectionoverwrite res_lib_ckpt_sectionoverwrite;
 
 	log_printf (LOG_LEVEL_DEBUG, "Section overwrite from API fd %d\n", conn_info->fd);
 	checkpoint = ckpt_checkpoint_find_global (&req_lib_ckpt_sectionoverwrite->checkpointName);
 
-	/*
-	 * checkpoint opened is writeable mode so send message to cluster
-	 */
-	req_exec_ckpt_sectionoverwrite.header.id = MESSAGE_REQ_EXEC_CKPT_SECTIONOVERWRITE;
-	req_exec_ckpt_sectionoverwrite.header.size = sizeof (struct req_exec_ckpt_sectionoverwrite); 
-
-	memcpy (&req_exec_ckpt_sectionoverwrite.req_lib_ckpt_sectionoverwrite,
-		req_lib_ckpt_sectionoverwrite,
-		sizeof (struct req_lib_ckpt_sectionoverwrite));
-
-	memcpy (&req_exec_ckpt_sectionoverwrite.checkpointName,
-		&req_lib_ckpt_sectionoverwrite->checkpointName,
-		sizeof (SaNameT));
+	if (checkpoint && (checkpoint->expired == 0)) {
+		/*
+		 * checkpoint opened is writeable mode so send message to cluster
+		 */
+		req_exec_ckpt_sectionoverwrite.header.id = MESSAGE_REQ_EXEC_CKPT_SECTIONOVERWRITE;
+		req_exec_ckpt_sectionoverwrite.header.size = sizeof (struct req_exec_ckpt_sectionoverwrite); 
+	
+		memcpy (&req_exec_ckpt_sectionoverwrite.req_lib_ckpt_sectionoverwrite,
+			req_lib_ckpt_sectionoverwrite,
+			sizeof (struct req_lib_ckpt_sectionoverwrite));
+	
+		memcpy (&req_exec_ckpt_sectionoverwrite.checkpointName,
+			&req_lib_ckpt_sectionoverwrite->checkpointName,
+			sizeof (SaNameT));
 
-	message_source_set (&req_exec_ckpt_sectionoverwrite.source, conn_info);
+		message_source_set (&req_exec_ckpt_sectionoverwrite.source, conn_info);
+	
+		iovecs[0].iov_base = (char *)&req_exec_ckpt_sectionoverwrite;
+		iovecs[0].iov_len = sizeof (req_exec_ckpt_sectionoverwrite);
+		/*
+		 * Send section name and data to overwrite in message
+		 */
+		iovecs[1].iov_base = ((char *)req_lib_ckpt_sectionoverwrite) + sizeof (struct req_lib_ckpt_sectionoverwrite);
+		iovecs[1].iov_len = req_lib_ckpt_sectionoverwrite->header.size - sizeof (struct req_lib_ckpt_sectionoverwrite);
+		req_exec_ckpt_sectionoverwrite.header.size += iovecs[1].iov_len;
+	
+		if (iovecs[1].iov_len > 0) {
+			assert (totempg_mcast (iovecs, 2, TOTEMPG_AGREED) == 0);
+		} else {
+			assert (totempg_mcast (iovecs, 1, TOTEMPG_AGREED) == 0);
+		}
+	}
+	else {
+		log_printf (LOG_LEVEL_ERROR, "Could Not Find the Checkpoint to over write so Returning Error.\n");
 
-	iovecs[0].iov_base = (char *)&req_exec_ckpt_sectionoverwrite;
-	iovecs[0].iov_len = sizeof (req_exec_ckpt_sectionoverwrite);
-	/*
-	 * Send section name and data to overwrite in message
-	 */
-	iovecs[1].iov_base = ((char *)req_lib_ckpt_sectionoverwrite) + sizeof (struct req_lib_ckpt_sectionoverwrite);
-	iovecs[1].iov_len = req_lib_ckpt_sectionoverwrite->header.size - sizeof (struct req_lib_ckpt_sectionoverwrite);
-	req_exec_ckpt_sectionoverwrite.header.size += iovecs[1].iov_len;
+		res_lib_ckpt_sectionoverwrite.header.size = sizeof (struct res_lib_ckpt_sectionoverwrite);
+		res_lib_ckpt_sectionoverwrite.header.id = MESSAGE_RES_CKPT_CHECKPOINT_SECTIONOVERWRITE;
+		res_lib_ckpt_sectionoverwrite.header.error = SA_AIS_ERR_NOT_EXIST;
 
-	if (iovecs[1].iov_len > 0) {
-		assert (totempg_mcast (iovecs, 2, TOTEMPG_AGREED) == 0);
-	} else {
-		assert (totempg_mcast (iovecs, 1, TOTEMPG_AGREED) == 0);
+		libais_send_response (conn_info,
+			&res_lib_ckpt_sectionoverwrite,
+			sizeof (struct res_lib_ckpt_sectionoverwrite));
 	}
-
+	
 	return (0);
 }
 
@@ -2598,41 +2805,55 @@
 	struct req_exec_ckpt_sectionread req_exec_ckpt_sectionread;
 	struct iovec iovecs[2];
 	struct saCkptCheckpoint *checkpoint;
+	struct res_lib_ckpt_sectionread res_lib_ckpt_sectionread;
 
 	log_printf (LOG_LEVEL_DEBUG, "Section overwrite from API fd %d\n", conn_info->fd);
 	checkpoint = ckpt_checkpoint_find_global (&req_lib_ckpt_sectionread->checkpointName);
+	
+	if (checkpoint && (checkpoint->expired == 0)) {
+		/*
+		 * checkpoint opened is writeable mode so send message to cluster
+		 */
+		req_exec_ckpt_sectionread.header.id = MESSAGE_REQ_EXEC_CKPT_SECTIONREAD;
+		req_exec_ckpt_sectionread.header.size = sizeof (struct req_exec_ckpt_sectionread);
 
-	/*
-	 * checkpoint opened is writeable mode so send message to cluster
-	 */
-	req_exec_ckpt_sectionread.header.id = MESSAGE_REQ_EXEC_CKPT_SECTIONREAD;
-	req_exec_ckpt_sectionread.header.size = sizeof (struct req_exec_ckpt_sectionread);
-
-	memcpy (&req_exec_ckpt_sectionread.req_lib_ckpt_sectionread,
-		req_lib_ckpt_sectionread,
-		sizeof (struct req_lib_ckpt_sectionread));
-
-	memcpy (&req_exec_ckpt_sectionread.checkpointName,
-		&req_lib_ckpt_sectionread->checkpointName,
-		sizeof (SaNameT));
-
-	message_source_set (&req_exec_ckpt_sectionread.source, conn_info);
-
-	iovecs[0].iov_base = (char *)&req_exec_ckpt_sectionread;
-	iovecs[0].iov_len = sizeof (req_exec_ckpt_sectionread);
-	/*
-	 * Send section name and data to overwrite in message
-	 */
-	iovecs[1].iov_base = ((char *)req_lib_ckpt_sectionread) + sizeof (struct req_lib_ckpt_sectionread);
-	iovecs[1].iov_len = req_lib_ckpt_sectionread->header.size - sizeof (struct req_lib_ckpt_sectionread);
-	req_exec_ckpt_sectionread.header.size += iovecs[1].iov_len;
-
-	if (iovecs[1].iov_len > 0) {
-		assert (totempg_mcast (iovecs, 2, TOTEMPG_AGREED) == 0);
-	} else {
-		assert (totempg_mcast (iovecs, 1, TOTEMPG_AGREED) == 0);
+		memcpy (&req_exec_ckpt_sectionread.req_lib_ckpt_sectionread,
+			req_lib_ckpt_sectionread,
+			sizeof (struct req_lib_ckpt_sectionread));
+	
+		memcpy (&req_exec_ckpt_sectionread.checkpointName,
+			&req_lib_ckpt_sectionread->checkpointName,
+			sizeof (SaNameT));
+	
+		message_source_set (&req_exec_ckpt_sectionread.source, conn_info);
+	
+		iovecs[0].iov_base = (char *)&req_exec_ckpt_sectionread;
+		iovecs[0].iov_len = sizeof (req_exec_ckpt_sectionread);
+		/*
+		 * Send section name and data to overwrite in message
+		 */
+		iovecs[1].iov_base = ((char *)req_lib_ckpt_sectionread) + sizeof (struct req_lib_ckpt_sectionread);
+		iovecs[1].iov_len = req_lib_ckpt_sectionread->header.size - sizeof (struct req_lib_ckpt_sectionread);
+		req_exec_ckpt_sectionread.header.size += iovecs[1].iov_len;
+	
+		if (iovecs[1].iov_len > 0) {
+			assert (totempg_mcast (iovecs, 2, TOTEMPG_AGREED) == 0);
+		} else {
+			assert (totempg_mcast (iovecs, 1, TOTEMPG_AGREED) == 0);
+		}
 	}
+	else {
+		log_printf (LOG_LEVEL_ERROR, "Could Not Find the Checkpoint to read so Returning Error.\n");
+
+		res_lib_ckpt_sectionread.header.size = sizeof (struct res_lib_ckpt_sectionread);
+		res_lib_ckpt_sectionread.header.id = MESSAGE_RES_CKPT_CHECKPOINT_SECTIONREAD;
+		res_lib_ckpt_sectionread.header.error = SA_AIS_ERR_NOT_EXIST;
 
+		libais_send_response (conn_info,
+			&res_lib_ckpt_sectionread,
+			sizeof (struct res_lib_ckpt_sectionread));
+        }
+	
 	return (0);
 }
 
diff -uNr --exclude=SCCS --exclude=BitKeeper --exclude=ChangeSet --exclude=init --exclude=LICENSE --exclude=Makefile --exclude=man --exclude=README.devmap --exclude=SECURITY --exclude=TODO --exclude=CHANGELOG --exclude=conf --exclude=loc --exclude=Makefile.samples --exclude=QUICKSTART --exclude=.cdtproject --exclude=.project --exclude=nortel.patch latest/test/testckpt.c openais/test/testckpt.c
--- latest/test/testckpt.c	2005-05-12 13:52:43 -05:00
+++ openais/test/testckpt.c	2005-05-12 13:50:31 -05:00
@@ -308,7 +308,7 @@
 		0,
 		&checkpointHandle2);
 	printf ("%s: Opening unlinked checkpoint\n", 
-		get_test_output (error, 7));
+		get_test_output (error, SA_AIS_OK));
 
 	error = saCkptCheckpointClose (checkpointHandle);
 	printf ("%s: Closing checkpoint\n", 
@@ -385,8 +385,8 @@
 									&sectionCreationAttributes2,
 									"Initial Data #2",
 									strlen ("Initial Data #2") + 1);
-	printf ("%s: creating section 2 \n",
-		get_test_output (error, SA_AIS_OK));
+	printf ("%s: re - creating section 2 \n",
+		get_test_output (error, SA_AIS_ERR_EXIST));
 
 	error = saCkptSectionExpirationTimeSet (checkpointHandle,
 		&sectionId2,



_______________________________________________
Openais mailing list
Openais@lists.osdl.org
http://lists.osdl.org/mailman/listinfo/openais

