List: openais
Subject: [Openais] Checkpoint Recovery Synchronization
From: "Muni Bajpai" <muniba () nortel ! com>
Date: 2005-02-17 18:35:44
Message-ID: CFCE7C3BDB79204092974B5B50AD7194100290 () zrc2hxm0 ! corp ! nortel ! com
Hey Steven,
So on to phase II.
Do you have any preferences for the new struct
req_exec_ckpt_checkpointsynchronize? I know you mentioned having the
previous regular configuration ring_id in that message, but what else?
I know we have to send every saCkptCheckpoint stored in the list that
checkpointListHead points to, either in one message or as a separate sync
message for each checkpoint. I prefer sending one message, but then we
have to decide on the type of the aggregated data.
Also the standard
struct req_header header;
struct message_source source;
should be a part of the new struct too.
I can't think of anything else.
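For illustration, a minimal sketch of what a single aggregated message could look like. Everything except the names req_exec_ckpt_checkpointsynchronize, req_header, message_source, memb_ring_id, ckpt_refcount and PROCESSOR_COUNT_MAX is hypothetical: the field layouts of the stand-in structs, the limits CKPT_NAME_MAX and SYNC_CKPT_MAX, struct sync_ckpt_entry, and the helper sync_message_init are assumptions made only so the sketch compiles on its own, not the real openais definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

#define PROCESSOR_COUNT_MAX 16 /* assumed value, for illustration only */
#define CKPT_NAME_MAX 32       /* hypothetical limit */
#define SYNC_CKPT_MAX 8        /* hypothetical: checkpoints per message */

/* Stand-ins so the sketch is self-contained; not the real layouts. */
struct req_header { int size; int id; };
struct message_source { struct in_addr in_addr; void *conn_info; };
struct memb_ring_id { struct in_addr rep; unsigned long long seq; };

struct ckpt_refcount { int count; struct in_addr addr; };

/* One checkpoint's worth of aggregated data (hypothetical type). */
struct sync_ckpt_entry {
    char name[CKPT_NAME_MAX];
    struct ckpt_refcount ckpt_refcount[PROCESSOR_COUNT_MAX];
};

struct req_exec_ckpt_checkpointsynchronize {
    struct req_header header;
    struct message_source source;
    struct memb_ring_id previous_ring_id; /* last regular configuration */
    unsigned int entry_count;             /* entries[] slots in use */
    struct sync_ckpt_entry entries[SYNC_CKPT_MAX];
};

/* Zero the message, stamp the header and ring id, return bytes to send. */
size_t sync_message_init(struct req_exec_ckpt_checkpointsynchronize *msg,
                         const struct memb_ring_id *previous_ring_id,
                         int message_id)
{
    memset(msg, 0, sizeof(*msg));
    msg->header.size = (int)sizeof(*msg);
    msg->header.id = message_id;
    msg->previous_ring_id = *previous_ring_id;
    return sizeof(*msg);
}
```

A fixed-size array keeps the message free of dynamic allocation, at the cost of capping how many checkpoints fit in one message; sending one message per checkpoint would avoid the cap.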
Please let me know,
Thanks
Muni
-----Original Message-----
From: openais-bounces@lists.osdl.org [mailto:openais-bounces@lists.osdl.org]
On Behalf Of Bajpai, Muni [NGC:B670:EXCH]
Sent: Wednesday, February 16, 2005 1:31 PM
To: 'sdake@mvista.com'
Cc: openais@lists.osdl.org; markh@osdl.org; Smith, Kristen [NGC:B675:EXCH]
Subject: RE: [Openais] Checkpoint crash in aisexec
Ok steve,
Thanks for the feedback. This is my take on the steps.
I.) First Patch
1.) Move struct memb_ring_id from totemsrp.c to totemsrp.h
2.) Move #define MAX_MEMBERS from totemsrp.c to totemsrp.h, change
the name of the definition to PROCESSOR_COUNT_MAX.
3.) Make changes to handlers.h, amf.c, ckpt.c, clm.c, evs.c,
totemsrp.c, totempg.c
II.) Second Patch
Add support for sync on the ckpt service.
Thanks
Muni
-----Original Message-----
From: Steven Dake [mailto:sdake@mvista.com]
Sent: Wednesday, February 16, 2005 1:02 PM
To: Bajpai, Muni [NGC:B670:EXCH]
Cc: openais@lists.osdl.org; Smith, Kristen [NGC:B675:EXCH]; markh@osdl.org
Subject: RE: [Openais] Checkpoint crash in aisexec
Muni
I responded inline. If you tackle this problem, I'd suggest breaking it up
into a few patches to work on separately, i.e. the configuration change
modifications required to get the ring id through the config change system,
and then, as a separate patch, the synchronization code.
Thanks
-steve
On Wed, 2005-02-16 at 09:38, Muni Bajpai wrote:
> Thanks for the quick responses last evening. My Response/Queries are
> inline prepended by a -------------------
>
> Muni
>
> -----Original Message-----
> From: Steven Dake [mailto:sdake@mvista.com]
> Sent: Tuesday, February 15, 2005 6:20 PM
> To: Bajpai, Muni [NGC:B670:EXCH]; openais@lists.osdl.org
> Cc: Smith, Kristen [NGC:B675:EXCH]; markh@osdl.org
> Subject: RE: [Openais] Checkpoint crash in aisexec
>
>
> Muni
> I hope you don't mind me copying the openais mailing list so others can
> share in our exchanges.
>
> Thanks for taking a look at this
>
> Responses inline
>
> On Tue, 2005-02-15 at 14:54, Muni Bajpai wrote:
> > Hey Steve,
> >
> > I work with kristen and need some more info on the checkpoint
> recovery
> > ...
> >
> > 1.) So the logic for accepting a configuration change from a
> processor
> > is :
> > if ((incoming_ring_id == last_known_ring_id)
> > && (source_processor != delivering_processor)) {
> >
> > //IGNORE Change.
> > }
> >
> > So as per my understanding:
> > 1.) (Ckpt Executive Perspective) If the change is from ME
> then
> > always change
>
> maybe I was wrong with what I said before. Try this logic out:
>
> If the sync message is from your previous configuration, then the
> reference counts should not be updated because they would double the
> reference counts incorrectly.
>
> ------------- So you mean don't care about the source/dest of the sync
> message for decision making of accepting/ignoring config_chg, just use
> the ring_id ?
>
It's not the decision to accept the config change callback; it's the decision
to accept the synchronization message. You should always accept the
configuration change callback, but in some cases the sync message should
be ignored.
A member of the synchronization message should be "previous_ring_id", which
is the ring identifier of the ring previous to the one that is currently
undergoing recovery. Keep in mind that it should be the last regular
configuration, not the transitional configuration.
The previous ring id is sufficient to determine whether the refcount increase
request would result in an invalid increase.
If the previous_ring_id in the message matches the receiving processor's own
previous ring id, then the processor is already aware of the synchronization
contents and should ignore the request. If they don't match, then the
processor is unaware of the synchronization contents and should accept the
request.
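As a rough sketch of that accept/ignore rule in C: struct ring_id here is a hypothetical stand-in for memb_ring_id (its rep and seq fields are assumed), and sync_message_accept is an illustrative name, not an existing function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for memb_ring_id; field names are assumptions. */
struct ring_id {
    uint32_t rep; /* address of the ring representative */
    uint64_t seq; /* ring sequence number */
};

static bool ring_id_equal(const struct ring_id *a, const struct ring_id *b)
{
    return a->rep == b->rep && a->seq == b->seq;
}

/*
 * Returns true when the sync message should be applied to the local
 * reference counts. A message whose previous_ring_id equals our own
 * previous regular ring id comes from our old configuration, so its
 * counts are already known and applying them again would double them.
 */
bool sync_message_accept(const struct ring_id *msg_previous_ring_id,
                         const struct ring_id *my_previous_ring_id)
{
    return !ring_id_equal(msg_previous_ring_id, my_previous_ring_id);
}
```

A newly started processor whose stored ring id is still zero-initialized never matches a real ring id, so it accepts the message.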
> ?
>
> ------------- Is it possible to get sync's from 2 different processors
> with the same ring_id ??
>
No, this is not possible.
The reason is that when deciding whether to send the sync message, the old
ring id's representative is checked against the local ip. If they match, then
the sync message is sent (because this processor is the representative). If
they don't match, no sync message is sent (because the representative will
take care of requesting the synchronization message).
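The representative check could be sketched as follows. The rep field and the function name should_send_sync are assumptions for illustration; the real ring id structure may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for memb_ring_id; field names are assumptions. */
struct ring_id {
    uint32_t rep; /* address of the ring representative */
    uint64_t seq; /* ring sequence number */
};

/*
 * Only the representative of the previous regular ring originates the
 * sync message, so each old ring contributes exactly one message.
 */
bool should_send_sync(const struct ring_id *previous_ring_id,
                      uint32_t local_ip)
{
    return previous_ring_id->rep == local_ip;
}
```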
> The sync message is originated from the representative processor
> containing the ring id prior to the transitional configuration change.
>
> When the message is delivered, it is compared to the ring id prior to
> the transitional configuration. If these two match, then the message
> should be ignored because it's a sync message from a processor within
> the prior configuration.
>
> > 2.) if the ring_id's don't match then always change.
> >
>
> Yes if the ring id in the delivered sync message doesn't match the
> previous ring id, then add the reference count information for that
> processor to the checkpoint.
>
> > Please confirm.
> >
> > 2.) We must add support for the new data structure additions in the
> > Ckpt Executive Opens and Close handlers also.
> >
>
> no data structures are required in the handler prototypes. I think we
> need a new message vs open and close. The message should be something
> like "synchronizecounts". I don't want to overload open and close too
> much with extra meaning. We could use this synchronizecounts for some
> other purpose later, like exchanging metadata too.
>
> ------------ So the ckpt_refcount[MAX_MEMBERS] array is modified on
> the receipt of sync messages,open and close??
>
Yes, ckpt_refcount is modified on open, close, and in some cases on sync,
given the logic above.
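To make that concrete, here is a small allocation-free sketch of the per-checkpoint table. The ckpt_refcount layout follows the struct suggested earlier in the thread, with a uint32_t standing in for struct in_addr; struct checkpoint_refs, the value of PROCESSOR_COUNT_MAX, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PROCESSOR_COUNT_MAX 16 /* assumed value, for illustration only */

struct ckpt_refcount {
    int count;
    uint32_t addr; /* stand-in for struct in_addr */
};

/* Hypothetical holder; in the thread the array lives in saCkptCheckpoint. */
struct checkpoint_refs {
    struct ckpt_refcount ckpt_refcount[PROCESSOR_COUNT_MAX];
};

/* Find the slot for addr, or claim a free one; NULL if the table is full. */
static struct ckpt_refcount *refcount_slot(struct checkpoint_refs *c,
                                           uint32_t addr)
{
    struct ckpt_refcount *free_slot = NULL;
    int i;

    for (i = 0; i < PROCESSOR_COUNT_MAX; i++) {
        struct ckpt_refcount *slot = &c->ckpt_refcount[i];
        if (slot->count > 0 && slot->addr == addr)
            return slot;
        if (slot->count == 0 && free_slot == NULL)
            free_slot = slot;
    }
    if (free_slot != NULL)
        free_slot->addr = addr;
    return free_slot;
}

/* Open: one more reference held by processor addr. */
int refcount_open(struct checkpoint_refs *c, uint32_t addr)
{
    struct ckpt_refcount *slot = refcount_slot(c, addr);
    if (slot == NULL)
        return -1;
    slot->count++;
    return 0;
}

/* Close: one reference fewer; fails if addr holds none. */
int refcount_close(struct checkpoint_refs *c, uint32_t addr)
{
    struct ckpt_refcount *slot = refcount_slot(c, addr);
    if (slot == NULL || slot->count == 0)
        return -1;
    slot->count--;
    return 0;
}

/* Accepted sync message: record the count reported for addr. */
int refcount_sync(struct checkpoint_refs *c, uint32_t addr, int count)
{
    struct ckpt_refcount *slot = refcount_slot(c, addr);
    if (slot == NULL)
        return -1;
    slot->count = count;
    return 0;
}

/* Processor left the configuration: drop all of its references. */
void refcount_leave(struct checkpoint_refs *c, uint32_t addr)
{
    struct ckpt_refcount *slot = refcount_slot(c, addr);
    if (slot != NULL)
        slot->count = 0;
}

/* Per-checkpoint total, the value used for retention duration. */
int refcount_total(const struct checkpoint_refs *c)
{
    int total = 0;
    int i;

    for (i = 0; i < PROCESSOR_COUNT_MAX; i++)
        total += c->ckpt_refcount[i].count;
    return total;
}
```

With the example from earlier in the thread (A opens twice, B three times, C four times, then B leaves and D syncs in with one reference), the total goes 9, then 6, then 7.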
> > 3.) The addition as you enumerated to the checkpoint data structure,
> > did you have any implementation preferences or did you want us to use
> > anything appropriate (cursorily, I was thinking of a list of struct
> > refs)
>
> hmm I have an affinity towards avoiding any sort of memory allocation
> if at all possible (because they can fail, and this can cause us major
> troubles). Maybe something like
>
> struct ckpt_refcount {
> int count;
> struct in_addr addr;
> };
>
> Then something like adding to saCkptCheckpoint
>
> struct ckpt_refcount ckpt_refcount[MAX_MEMBERS];
>
> MAX_MEMBERS should probably be brought out from totemsrp.c into
> totemsrp.h and changed from MAX_MEMBERS to PROCESSOR_COUNT_MAX.
>
> >
> > 4.) The last_known_ring_id. What does that mean to a newly added
> > processor. Explicitly ( incoming_ring_id == last_known_ring_id )
> will
> > always fail on a newly commissioned processor. Am I understanding
> that
> > correctly ?
> >
>
> no not incoming ring id. Instead it is the processor's last ring id
> in the originated synchronization message.
>
> last known ring id should be inited to zero. You understand that the
> sync message will have some value and last_known_ring_id will be zero.
>
> This will force the synchronization message to be accepted which is
> desired behavior.
>
> > Where is the last_known_ring_id stored ?
> >
>
> it must be stored when a configuration change is delivered to the
> ckpt_confchg_fn.
>
> > 5.) Is exec/evt.c the best example for any ideas on implementation
> ??
> >
>
> I don't think evt uses reference counting to track channels, but it is
> necessary for checkpoints because of checkpoint retention. I'd rather
> try to invent a few different approaches here so we can unify them
> later once we have discovered the best design.
>
> Synchronization after a merge or partition is the hardest part of a
> distributed system and I hope we can find a few approaches to test
> out.
>
> >
> > Thanks
> >
> > Muni
> >
> > -----Original Message-----
> > From: Steven Dake [mailto:sdake@mvista.com]
> > Sent: Tuesday, February 15, 2005 1:51 PM
> > To: Smith, Kristen [NGC:B675:EXCH]
> > Cc: markh@osdl.org; openais@lists.osdl.org; Bajpai, Muni
> > [NGC:B670:EXCH]
> > Subject: RE: [Openais] Checkpoint crash in aisexec
> >
> >
> > On Tue, 2005-02-15 at 09:47, Kristen Smith wrote:
> > > Steve,
> > >
> > > Thanks for the response - I hear ya loud and clear - not good
> > without
> > > recovery. So, is there something that we could do to help you with
> > > this recovery coding? If you had some type of design thoughts on
> how
> > > you wanted checkpoint recovery to occur, maybe that is something
> we
> > > could help out with. Just throwing this out there to see what you
> > > think.
> > >
> >
> > Kristen
> > You have done a lot to help us so far but more help is always
> > appreciated :)
> >
> > If someone from your org wanted to get started writing code for
> > checkpoint recovery that would be great! I spent some time in the
> > drive to work this morning thinking about how checkpoint recovery
> > should work:
> >
> > There are 3 main steps that should be done in order:
> > 1. synchronize checkpoint reference counts (so retention timers work
> > properly)
> > 2. synchronize checkpoint metadata contents (sizes, sections, etc)
> > 3. synchronize checkpoint section data contents
> >
> > The place to get started is on the reference count synchronization.
> >
> > The checkpoint must contain a list of active user's processor ids
> > along with their reference count. So if processor A has checkpoint
> 1
> > open twice, and processor B has checkpoint 1 open three times, and
> > processor C has checkpoint 1 open four times each processor would
> > maintain a list for the checkpoint (in the checkpoint data
> structure):
> >
> > p_A:r_2
> > p_B:r_3
> > p_C:r_4
> >
> > Then on a configuration change, the leaving processors would close
> > their reference counts. So in this example, p_B leaves then the
> > processor ref count looks like: p_A:r_2 p_C:r_4
> >
> > During this configuration change, a processor joins p_D. It has
> > checkpoint 1 open 1 time. p_D gets a configuration change {add p_A,
> > p_C} and then sends a synchronization message with its previous ring
> > identifier and current list of checkpoint reference counts (after
> the
> > above leave in the configuration change was processed). The
> > representative of {p_A, p_C} also sends a synchronization message
> with
> > the previous ring identifier and a current list of checkpoint
> > reference counts. If the previous ring identifiers match and the
> > sending processor is not the delivering processor then p_C should
> > ignore p_A's message (ie: p_C receives p_A message, but it already
> > knows about p_A's references).
> >
> > This requires us to add the ring identifier to the configuration
> > change.
> >
> > So now each previous configuration is aware of the new
> configuration.
> > The reference counts look like:
> > p_A:r_2
> > p_C:r_4
> > p_D:r_1
> >
> > The above maintenance of the reference counts, or open checkpoints,
> > must maintain a per-checkpoint variable which is the "reference count
> > for this checkpoint". In the last case, that reference count would be
> > 7.
> >
> > Each time a processor leaves, its reference counts are subtracted
> from
> > this "global ref count". Each time a processor is added, its
> > reference counts are added. This reference count is then what is
> used
> > for retention duration.
> >
> > Any thoughts on the above approach welcome.
> >
> > Thanks!
> > -steve
> >
> > > Thanks,
> > > Kristen
> > >
> > > -----Original Message-----
> > > From: Steven Dake [mailto:sdake@mvista.com]
> > > Sent: Monday, February 14, 2005 2:17 PM
> > > To: Smith, Kristen [NGC:B675:EXCH]; markh@osdl.org;
> > > openais@lists.osdl.org
> > > Cc: Bajpai, Muni [NGC:B670:EXCH]
> > > Subject: RE: [Openais] Checkpoint crash in aisexec
> > >
> > >
> > > On Sat, 2005-02-12 at 08:08, Kristen Smith wrote:
> > > > Steve,
> > > >
> > > > Thanks for the response.
> > > >
> > > > For recovery - what are the ramifications if we don't have
> > recovery
> > > > working 100%? What I see now is that when a node leaves the
> > cluster
> > > > and then rejoins, it receives evt messages, but it can take
> > anywhere
> > > > from 15seconds to minutes for evt messages sent from that node
> to
> > > > reach the other applications. I handle this with some
> > >
> > > Mark have you seen this issue?
> > >
> > > > message retries which is ok in this startup case. However, are
> we
> > in
> > > > jeopardy in other cases that I am not considering? When running
> > > > traffic the past few days and seeing periodic reconfigs, I don't
> > > seem
> > > > to be losing messages when that occurs - I only see the lost
> > > messages
> > > > when I actually kill a node and start it back up to rejoin the
> > > > cluster.
> > > >
> > >
> > > What we have today is totally unacceptable because, at least for
> > > checkpointing, there is no recovery. And Mark is waiting on my base
> > > code for event recovery.
> > >
> > > Definition of 100% working means if there is a failure during
> > > recovery, we are guaranteed a consistent state. I think evt is
> > pretty
> > > close to this goal, although the checkpoint replication after
> merge
> > > has not been developed yet. I can think of a lot of easy ways to
> do
> > > this, but handling a failure during the recovery phase makes it
> more
> > > difficult.
> > >
> > > Definition of almost 100% is that recovery works properly if there
> > are
> > > no faults during recovery (ie: the merge process), but if there is
> a
> > > fault during recovery (ie: reconfig) something could go awry.
> > >
> > > We want consistently replicated data (the 100% case). 100% is
> > > probably past your development window; the other case is within
> > reach.
> > >
> > > Regards
> > > -steve
> > >
> > > > Thanks
> > > > Kristen
> > > >
> > > > -----Original Message-----
> > > > From: Steven Dake [mailto:sdake@mvista.com]
> > > > Sent: Friday, February 11, 2005 5:30 PM
> > > > To: Smith, Kristen [NGC:B675:EXCH]
> > > > Subject: RE: [Openais] Checkpoint crash in aisexec
> > > >
> > > >
> > > > Ok well I doubt with 200 byte checkpoints there is a buffer
> > > overflow.
> > > > :)
> > > >
> > > > Recovery will come after 188 is wrapped up. I think your two
> > weeks
> > > > window looks good for alpha-level recovery (ie: works most of
> the
> > > > time). High quality production recovery will not hit your
> window
> > > for
> > > > development (ie: works 100% of the time no matter what happens).
> > > >
> > > > Thanks
> > > > -steve
> > > >
> > > > On Fri, 2005-02-11 at 15:56, Kristen Smith wrote:
> > > > > Steve,
> > > > >
> > > > > The size of the checkpoints are ~200 bytes.
> > > > >
> > > > > I agree, valgrind is an excellent tool. We will run it through
> > and
> > > > see
> > > > > if that shows anything.
> > > > >
> > > > > I have tried this scenario maybe 30 times today (for various
> > other
> > > > > testing) and it happened maybe 10 times. For a while I could
> > > > reproduce
> > > > > with a given test about 5 times and then it hasn't happened
> > again.
> > > > >
> > > > > Sounds like defect-188 fixing is going well. May I ask how the
> > > > > recovery work is going as well? (Don't mean to be pushy on
> that
> > > > front
> > > > > - we have 2 more weeks of coding for our application left and
> I
> > am
> > > > > really hoping that we are able to put the new recovery code in
> > > > during
> > > > > that time).
> > > > >
> > > > > Thanks a bunch,
> > > > > Kristen
> > > > >
> > > > > -----Original Message-----
> > > > > From: Steven Dake [mailto:sdake@mvista.com]
> > > > > Sent: Friday, February 11, 2005 4:37 PM
> > > > > To: Smith, Kristen [NGC:B675:EXCH]
> > > > > Subject: Re: [Openais] Checkpoint crash in aisexec
> > > > >
> > > > >
> > > > > how large are the read or write requests?
> > > > > just a thought there could be some buffer overrun with larger
> > > > > requests.
> > > > >
> > > > > On Fri, 2005-02-11 at 14:55, Kristen Smith wrote:
> > > > > > Steve,
> > > > > >
> > > > > > We are periodically seeing aisexec crash with the following
> > > trace:
> > > > > >
> > > > > > (gdb) bt
> > > > > > #0 message_handler_req_lib_ckpt_checkpointclose
> > > > > > (conn_info=0x0, message=0xb73fc008) at ckpt.c:1552
> > > > > > #1 0x080494c2 in poll_handler_libais_deliver
> > (handle=0,
> > > > > fd=3,
> > > > > > revent=134633824, data=0x89c2ad8,
> > > > > > prio=0x89b2784) at main.c:578
> > > > > > #2 0x08056e62 in poll_run (handle=0) at
> aispoll.c:386
> > > > > >
> > > > > >
> > > > > > #3 0x080499ac in main (argc=1, argv=0xbfffcb64) at
> > main.c:1003
> > > > > >
> > > > > > We have looked through the code but can't seem to figure out
> > how
> > > > > > conn_info is getting set to 0. Do you have any idea under
> what
> > > > > > circumstances conn_info could be null when this function is
> > > > called?
> > > > > >
> > > > > > This is happening when we have multiple nodes up and we kill
> > one
> > > > of
> > > > > > the active nodes. The standby node (which was reading
> > > checkpoints)
> > > > > > must now become a writer, so it closes the checkpoint and
> this
> > > > > > happens. Unfortunately, I can't reproduce this consistently
> -
> > I
> > > > > > finally got a core dump today. I don't recall ever seeing
> this
> > > > with
> > > > > > the old code.
> > > > > >
> > > > > > Thanks,
> > > > > > Kristen
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> ______________________________________________________________________
> > > > > > _______________________________________________
> > > > > > Openais mailing list
> > > > > > Openais@lists.osdl.org
> > > > > http://lists.osdl.org/mailman/listinfo/openais
<http://lists.osdl.org/mailman/listinfo/openais>
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>
[Attachment #5 (text/html)]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<TITLE>Message</TITLE>
<META content="MSHTML 6.00.2800.1491" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=086332418-17022005><FONT face=Arial size=2>Hey
Steven,</FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=086332418-17022005><FONT face=Arial size=2>So onto phase II.
</FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT face=Arial
size=2><FONT color=#000000>Do you have any preferences to the new</FONT> <FONT
color=#000000>(</FONT></FONT><B><FONT color=#7f0055><FONT face=Arial
size=2>struct</FONT></B></FONT><FONT face=Arial color=#000000 size=2>
req_exec_ckpt_checkpoint<SPAN class=086332418-17022005>synchronize). I know you
did mention having the previous regular configuration ring_id in that message
but what else ??</SPAN></FONT></FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial color=#000000 size=2>I know we have to
send all the </FONT><FONT face=Arial><FONT color=#000000><FONT
size=2>saCkptCheckpoin<SPAN class=086332418-17022005>t stored in the list that
<FONT size=2>checkpointListHead points to, or we could send out multiple synch
for each checkpoint. I prefer sending one message. But we
</FONT></SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005>have to decide on the type of the aggregated
data.</SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005></SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN> </DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005>Also the
standard</SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005></SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN><SPAN
class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005><B><FONT color=#7f0055 size=2>struct</B></FONT><FONT
size=2> req_header
header;</FONT></SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005><B><FONT color=#7f0055 size=2>struct</B></FONT><FONT
size=2> message_source
source;</FONT></SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005><FONT
size=2></FONT></SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN> </DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005><FONT size=2><FONT color=#0000ff><FONT
color=#000000>should be a part of the new struct
too</FONT>.</FONT></DIV></FONT></SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005></SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN> </DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005>I cant think of anything
else.</SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005></SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN> </DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005>Please let me
know,</SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005></SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN> </DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005>Thanks</SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005></SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN> </DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT><SPAN
class=086332418-17022005><FONT face=Arial><FONT color=#000000><FONT size=2><SPAN
class=086332418-17022005>Muni</SPAN></FONT></FONT></FONT></SPAN></FONT></FONT></SPAN></DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT face=Arial
color=#000000 size=2><SPAN
class=086332418-17022005></SPAN></FONT></FONT></SPAN> </DIV>
<DIV><SPAN class=086332418-17022005><FONT color=#0000ff><FONT face=Arial
color=#000000 size=2><SPAN
class=086332418-17022005></SPAN></FONT></FONT></SPAN> </DIV>
<BLOCKQUOTE DEFANGED_style="MARGIN-RIGHT: 0px">
<DIV></DIV>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left><FONT
face=Tahoma size=2>-----Original Message-----<BR><B>From:</B>
openais-bounces@lists.osdl.org [mailto:openais-bounces@lists.osdl.org] <B>On
Behalf Of </B>Bajpai, Muni [NGC:B670:EXCH]<BR><B>Sent:</B> Wednesday, February
16, 2005 1:31 PM<BR><B>To:</B> 'sdake@mvista.com'<BR><B>Cc:</B>
openais@lists.osdl.org; markh@osdl.org; Smith, Kristen
[NGC:B675:EXCH]<BR><B>Subject:</B> RE: [Openais] Checkpoint crash in
aisexec<BR><BR></FONT></DIV>
<P><FONT size=2>Ok steve,</FONT> </P>
<P><FONT size=2>Thanks for the feedback. This is my take on the steps.</FONT>
</P>
<P><FONT size=2>I.) First Patch</FONT>
<BR> <FONT size=2>1.) Move struct
memb_ring_id from totemsrp.c to totemsrp.h</FONT>
<BR> <FONT size=2>2.) Move #define
MAX_MEMBERS from totemsrp.c to totemsrp.h, change the name of the definition
to PROCESSOR_COUNT_MAX.</FONT></P>
<P> <FONT size=2>3.) Make changes to
handlers.h, amf.c, ckpt.c, clm.c, evs.c, totemsrp.c, totempg.c</FONT> </P>
<P><FONT size=2>II.) Second Patch</FONT>
<BR> <FONT size=2>Add support for
sync on the ckpt service.</FONT> </P>
<P><FONT size=2>Thanks</FONT> </P>
<P><FONT size=2>Muni</FONT> <BR><FONT size=2>-----Original Message-----</FONT>
<BR><FONT size=2>From: Steven Dake [<A
href="mailto:sdake@mvista.com">mailto:sdake@mvista.com</A>] </FONT><BR><FONT
size=2>Sent: Wednesday, February 16, 2005 1:02 PM</FONT> <BR><FONT size=2>To:
Bajpai, Muni [NGC:B670:EXCH]</FONT> <BR><FONT size=2>Cc:
openais@lists.osdl.org; Smith, Kristen [NGC:B675:EXCH]; markh@osdl.org</FONT>
<BR><FONT size=2>Subject: RE: [Openais] Checkpoint crash in aisexec</FONT>
</P><BR>
<P><FONT size=2>Muni</FONT> </P>
<P><FONT size=2>I responded inline. I'd suggest if you tackle this
problem to try to break it up into a few patches to work on seperately.
Ie: the configuration change changes required to get the ring id through the
config change system, and then as a seperate patch the syncronization
code.</FONT></P>
<P><FONT size=2>Thanks</FONT> <BR><FONT size=2>-steve</FONT> </P>
<P><FONT size=2>On Wed, 2005-02-16 at 09:38, Muni Bajpai wrote:</FONT>
<BR><FONT size=2>> Thanks for the quick responses last evening. My
Response/Queries are </FONT><BR><FONT size=2>> inline prepended by a
-------------------</FONT> <BR><FONT size=2>> </FONT><BR><FONT size=2>>
Muni</FONT> <BR><FONT size=2>> </FONT><BR><FONT size=2>> -----Original
Message-----</FONT> <BR><FONT size=2>> From: Steven Dake [<A
href="mailto:sdake@mvista.com">mailto:sdake@mvista.com</A>]</FONT> <BR><FONT
size=2>> Sent: Tuesday, February 15, 2005 6:20 PM</FONT> <BR><FONT
size=2>> To: Bajpai, Muni [NGC:B670:EXCH]; openais@lists.osdl.org</FONT>
<BR><FONT size=2>> Cc: Smith, Kristen [NGC:B675:EXCH];
markh@osdl.org</FONT> <BR><FONT size=2>> Subject: RE: [Openais] Checkpoint
crash in aisexec</FONT> <BR><FONT size=2>> </FONT><BR><FONT size=2>>
</FONT><BR><FONT size=2>> Muni</FONT> <BR><FONT size=2>> I hope you dont
mind me copying the openais mailing list so others can </FONT><BR><FONT
size=2>> share in our exchanges.</FONT> <BR><FONT size=2>>
</FONT><BR><FONT size=2>> Thanks for taking a look at this</FONT> <BR><FONT
size=2>> </FONT><BR><FONT size=2>> Responses inline</FONT> <BR><FONT
size=2>> </FONT><BR><FONT size=2>> On Tue, 2005-02-15 at 14:54, Muni
Bajpai wrote:</FONT> <BR><FONT size=2>> > Hey Steve,</FONT> <BR><FONT
size=2>> > </FONT><BR><FONT size=2>> > I work with kristen and
need some more info on the checkpoint</FONT> <BR><FONT size=2>>
recovery</FONT> <BR><FONT size=2>> > ...</FONT> <BR><FONT size=2>>
> </FONT><BR><FONT size=2>> > 1.) So the logic for accepting a
configuration change from a</FONT> <BR><FONT size=2>> processor</FONT>
<BR><FONT size=2>> > is :</FONT> <BR><FONT size=2>>
> if ((incoming_ring_id ==
last_known_ring_id) </FONT><BR><FONT size=2>>
>
&& (source_processor != delivering_processor) {</FONT> <BR><FONT
size=2>> > </FONT><BR><FONT size=2>>
>
//IGNORE Change.</FONT> <BR><FONT size=2>>
> }</FONT> <BR><FONT
size=2>> > </FONT><BR><FONT size=2>>
> So as per my
understanding:</FONT> <BR><FONT size=2>>
> 1.) (Ckpt Executive
Perspective) If the change is from ME</FONT> <BR><FONT size=2>> then</FONT>
<BR><FONT size=2>> > always change</FONT> <BR><FONT size=2>>
</FONT><BR><FONT size=2>> maybe I was wrong with what I said before.
Try this logic out:</FONT> <BR><FONT size=2>> </FONT><BR><FONT size=2>>
If the sync message is from your previous configuration, then the
</FONT><BR><FONT size=2>> reference counts should not be updated because
they would double the </FONT><BR><FONT size=2>> reference counts
incorrectly.</FONT> <BR><FONT size=2>> </FONT><BR><FONT size=2>>
------------- So you mean don't care about the source/dest of the sync
</FONT><BR><FONT size=2>> message for decision making of accepting/ignoring
config_chg, just use </FONT><BR><FONT size=2>> the ring_id ?</FONT>
<BR><FONT size=2>> </FONT></P>
<P><FONT size=2>Its not the decision to accept the config change callback, its
the decision to accept the syncronization message. You should always
accept the configuration change callback. But in some cases, the sync
message should be ignored.</FONT></P>
<P><FONT size=2>A member of the synchronization message should be
"previous_ring_id" which is the ring identifier of the ring previous to the
one that is currently undergoing recovery. Keep in mind that it should
be the last regular configuration, not the transitional
configuration.</FONT></P>
<P><FONT size=2>The previous ring id is sufficient to determine if the
refcount increase request would result in an invalid increase.
</FONT></P>
<P><FONT size=2>If they match, then the processor is already aware of the
synchronization contents and should ignore the request. If they dont
match, then the processor is unaware of the syncronization contents and should
accept the request.</FONT></P>
<P><FONT size=2>> ?</FONT> <BR><FONT size=2>> </FONT><BR><FONT
size=2>> ------------- Is it possible to get sync's from 2 different
processors </FONT><BR><FONT size=2>> with the same ring_id ??</FONT>
<BR><FONT size=2>> </FONT></P>
<P><FONT size=2>No this is not possible. </FONT></P>
<P><FONT size=2>The reason is that when determining to send the sync message,
the old ring id's representative is checked against the local ip. If
they match, then the sync message is sent (because this processor is the
representative). If they don't match, no sync message is sent (because
the representative will take care of requesting the synchronization
message).</FONT></P>
<P><FONT size=2>> The sync message is originated from the representative
processor </FONT><BR><FONT size=2>> containing the ring id prior to the
transitional configuration change.</FONT> <BR><FONT size=2>>
</FONT><BR><FONT size=2>> When the message is delivered, it is compared to
the ring id prior to </FONT><BR><FONT size=2>> the transitional
configuration. If these two match, then the message </FONT><BR><FONT
size=2>> should be ignored because its a sync message from a processor
within </FONT><BR><FONT size=2>> the prior configuration.</FONT> <BR><FONT
size=2>> </FONT><BR><FONT size=2>>
> 2.) if the ring_id's
don't match then always change.</FONT> <BR><FONT size=2>> >
</FONT><BR><FONT size=2>> </FONT><BR><FONT size=2>> Yes if the ring id
in the delivered sync message doesn't match the </FONT><BR><FONT size=2>>
previous ring id, then add the reference count information for that
</FONT><BR><FONT size=2>> processor to the checkpoint.</FONT> <BR><FONT
size=2>> </FONT><BR><FONT size=2>>
> Please confirm.</FONT>
<BR><FONT size=2>> > </FONT><BR><FONT size=2>> > 2.) We must add
support for the new data structure additions in the</FONT> <BR><FONT
size=2>> > Ckpt Executive Opens and Close handlers also.</FONT>
<BR><FONT size=2>> > </FONT><BR><FONT size=2>> </FONT><BR><FONT
size=2>> no data structures are required in the handler prototypes. I
think we </FONT><BR><FONT size=2>> need a new message vs open and
close. The message should be something </FONT><BR><FONT size=2>> like
"synchronizecounts". I don't want to overload open and close too
</FONT><BR><FONT size=2>> much with extra meaning. We could use this
synchronizecounts for some </FONT><BR><FONT size=2>> other purpose later,
like exchanging metadata too.</FONT> <BR><FONT size=2>> </FONT><BR><FONT
size=2>> ------------ So the ckpt_refcount[MAX_MEMBERS] array is modified
on </FONT><BR><FONT size=2>> the receipt of sync messages,open and
close??</FONT> <BR><FONT size=2>> </FONT></P>
<P><FONT size=2>Yes, ckpt_refcount is modified on open, close, and in some
cases on sync, given the logic above.</FONT> </P>
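<P><FONT size=2>To make the bookkeeping concrete, here is a small sketch of a
fixed-size reference count table with no memory allocation, along the lines of
the ckpt_refcnt struct proposed in the quoted text below. The wrapper struct
ckpt_refs, the helper names, the 32-bit address field and the value of
PROCESSOR_COUNT_MAX are all assumptions made for the example, not the openais
implementation.</FONT></P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROCESSOR_COUNT_MAX 16   /* assumed value, for illustration only */

struct ckpt_refcnt {
    int count;       /* opens held by this processor; 0 marks a free slot */
    uint32_t addr;   /* processor address (struct in_addr in the thread) */
};

/* Hypothetical per-checkpoint table plus the "global ref count". */
struct ckpt_refs {
    struct ckpt_refcnt slot[PROCESSOR_COUNT_MAX];
    int total;
};

/* Return the slot used by addr, else the first free slot, else NULL. */
static struct ckpt_refcnt *refs_slot(struct ckpt_refs *refs, uint32_t addr)
{
    struct ckpt_refcnt *free_slot = NULL;
    for (size_t i = 0; i < PROCESSOR_COUNT_MAX; i++) {
        if (refs->slot[i].count > 0 && refs->slot[i].addr == addr)
            return &refs->slot[i];
        if (refs->slot[i].count == 0 && free_slot == NULL)
            free_slot = &refs->slot[i];
    }
    return free_slot;
}

/* Open adds 1; an accepted sync message adds that processor's count. */
static int refs_add(struct ckpt_refs *refs, uint32_t addr, int n)
{
    assert(refs != NULL && n > 0);
    struct ckpt_refcnt *s = refs_slot(refs, addr);
    if (s == NULL)
        return -1;               /* table full */
    s->addr = addr;
    s->count += n;
    refs->total += n;
    return 0;
}

/* Close drops one reference held by addr. */
static int refs_close(struct ckpt_refs *refs, uint32_t addr)
{
    assert(refs != NULL);
    struct ckpt_refcnt *s = refs_slot(refs, addr);
    if (s == NULL || s->count == 0)
        return -1;               /* addr holds no references */
    s->count--;
    refs->total--;
    return 0;
}

/* A processor leaving the configuration releases all of its references. */
static void refs_leave(struct ckpt_refs *refs, uint32_t addr)
{
    assert(refs != NULL);
    struct ckpt_refcnt *s = refs_slot(refs, addr);
    if (s != NULL && s->count > 0) {
        refs->total -= s->count;
        s->count = 0;
    }
}
```

<P><FONT size=2>Replaying the example from the quoted message: p_A adds 2, p_B
adds 3 and p_C adds 4, p_B then leaves, and p_D's sync adds 1, leaving a total
of 7, which is the value that would drive the retention timer.</FONT></P>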
<P><FONT size=2>> > 3.) The addition as you enumerated to the checkpoint
data structure, </FONT><BR><FONT size=2>> > did you have any
implementation preferences or did you want us to</FONT> <BR><FONT size=2>>
use</FONT> <BR><FONT size=2>> > anything appropriate (cursorily, I was
thinking of a list of struct</FONT> <BR><FONT size=2>> > refs)</FONT>
<BR><FONT size=2>> </FONT><BR><FONT size=2>> hmm I have an affinity
towards avoiding any sort of memory allocation </FONT><BR><FONT size=2>> if
at all possible (because they can fail, and this can cause us major
</FONT><BR><FONT size=2>> troubles). Maybe something like struct
ckpt_refcnt {</FONT> <BR><FONT size=2>> </FONT><BR><FONT
size=2>> int count;</FONT>
<BR><FONT size=2>> struct
in_addr addr;</FONT> <BR><FONT size=2>> };</FONT> <BR><FONT size=2>>
</FONT><BR><FONT size=2>> Then something like adding to
saCkptCheckpoint</FONT> <BR><FONT size=2>> </FONT><BR><FONT size=2>>
struct ckpt_refcnt ckpt_refcount[MAX_MEMBERS];</FONT> <BR><FONT size=2>>
</FONT><BR><FONT size=2>> MAX_MEMBERS should probably be brought out from
totemsrp.c into </FONT><BR><FONT size=2>> totemsrp.h and changed from
MAX_MEMBERS to PROCESSOR_COUNT_MAX.</FONT> <BR><FONT size=2>>
</FONT><BR><FONT size=2>> > </FONT><BR><FONT size=2>> > 4.) The
last_known_ring_id. What does that mean to a newly added</FONT> <BR><FONT
size=2>> > processor. Explicitly ( incoming_ring_id ==
last_known_ring_id )</FONT> <BR><FONT size=2>> will</FONT> <BR><FONT
size=2>> > always fail on a newly commissioned processor. Am I
understanding</FONT> <BR><FONT size=2>> that</FONT> <BR><FONT size=2>>
> correctly ?</FONT> <BR><FONT size=2>> > </FONT><BR><FONT
size=2>> </FONT><BR><FONT size=2>> no not incoming ring id.
Instead it is the processor's last ring id </FONT><BR><FONT size=2>> in the
originated synchronization message.</FONT> <BR><FONT size=2>>
</FONT><BR><FONT size=2>> last known ring id should be inited to
zero. You understand that the </FONT><BR><FONT size=2>> sync message
will have some value and last_known_ring_id will be zero.</FONT> <BR><FONT
size=2>> </FONT><BR><FONT size=2>> This will force the synchronization
message to be accepted which is </FONT><BR><FONT size=2>> desired
behavior.</FONT> <BR><FONT size=2>> </FONT><BR><FONT size=2>> > Where
is the last_known_ring_id stored ?</FONT> <BR><FONT size=2>> >
</FONT><BR><FONT size=2>> </FONT><BR><FONT size=2>> it must be stored
when a configuration change is delivered to the </FONT><BR><FONT size=2>>
ckpt_confchg_fn.</FONT> <BR><FONT size=2>> </FONT><BR><FONT size=2>>
> 5.) Is exec/evt.c the best example for any ideas on implementation</FONT>
<BR><FONT size=2>> ??</FONT> <BR><FONT size=2>> > </FONT><BR><FONT
size=2>> </FONT><BR><FONT size=2>> I don't think evt uses reference
counting to track channels, but it is </FONT><BR><FONT size=2>> necessary
for checkpoints because of checkpoint retention. I'd rather
</FONT><BR><FONT size=2>> try to invent a few different approaches here so
we can unify them </FONT><BR><FONT size=2>> later once we have discovered
the best design.</FONT> <BR><FONT size=2>> </FONT><BR><FONT size=2>>
Synchronization after a merge or partition is the hardest part of a
</FONT><BR><FONT size=2>> distributed system and I hope we can find a few
approaches to test </FONT><BR><FONT size=2>> out.</FONT> <BR><FONT
size=2>> </FONT><BR><FONT size=2>> > </FONT><BR><FONT size=2>>
> Thanks</FONT> <BR><FONT size=2>> > </FONT><BR><FONT size=2>>
> Muni</FONT> <BR><FONT size=2>> > </FONT><BR><FONT size=2>> >
-----Original Message-----</FONT> <BR><FONT size=2>> > From: Steven Dake
[<A href="mailto:sdake@mvista.com">mailto:sdake@mvista.com</A>]</FONT>
<BR><FONT size=2>> > Sent: Tuesday, February 15, 2005 1:51 PM</FONT>
<BR><FONT size=2>> > To: Smith, Kristen [NGC:B675:EXCH]</FONT> <BR><FONT
size=2>> > Cc: markh@osdl.org; openais@lists.osdl.org; Bajpai, Muni
</FONT><BR><FONT size=2>> > [NGC:B670:EXCH]</FONT> <BR><FONT size=2>>
> Subject: RE: [Openais] Checkpoint crash in aisexec</FONT> <BR><FONT
size=2>> > </FONT><BR><FONT size=2>> > </FONT><BR><FONT
size=2>> > On Tue, 2005-02-15 at 09:47, Kristen Smith wrote:</FONT>
<BR><FONT size=2>> > > Steve,</FONT> <BR><FONT size=2>> > >
</FONT><BR><FONT size=2>> > > Thanks for the response - I hear ya
loud and clear - not good</FONT> <BR><FONT size=2>> > without</FONT>
<BR><FONT size=2>> > > recovery. So, is there something that we could
do to help you with </FONT><BR><FONT size=2>> > > this recovery
coding? If you had some type of design thoughts on</FONT> <BR><FONT
size=2>> how</FONT> <BR><FONT size=2>> > > you wanted checkpoint
recovery to occur, maybe that is something</FONT> <BR><FONT size=2>>
we</FONT> <BR><FONT size=2>> > > could help out with. Just throwing
this out there to see what you</FONT> <BR><FONT size=2>> > >
think.</FONT> <BR><FONT size=2>> > > </FONT><BR><FONT size=2>>
> </FONT><BR><FONT size=2>> > Kristen</FONT> <BR><FONT size=2>>
> You have done a lot to help us so far but more help is always</FONT>
<BR><FONT size=2>> > appreciated</FONT> <BR><FONT size=2>> >
:)</FONT> <BR><FONT size=2>> > </FONT><BR><FONT size=2>> > If
someone from your org wanted to get started writing code for</FONT> <BR><FONT
size=2>> > checkpoint recovery that would be great! I spent some
time in the </FONT><BR><FONT size=2>> > drive to work this morning
thinking about how checkpoint recovery </FONT><BR><FONT size=2>> >
should work:</FONT> <BR><FONT size=2>> > </FONT><BR><FONT size=2>>
> There are 3 main steps that should be done in order:</FONT> <BR><FONT
size=2>> > 1. synchronize checkpoint reference counts (so retention
timers work</FONT> <BR><FONT size=2>> > properly)</FONT> <BR><FONT
size=2>> > 2. synchronize checkpoint metadata contents (sizes, sections,
etc)</FONT> <BR><FONT size=2>> 3.</FONT> <BR><FONT size=2>> >
synchronize checkpoint section data contents</FONT> <BR><FONT size=2>> >
</FONT><BR><FONT size=2>> > The place to get started is on the reference
count synchronization.</FONT> <BR><FONT size=2>> > </FONT><BR><FONT
size=2>> > The checkpoint must contain a list of active user's processor
ids</FONT> <BR><FONT size=2>> > along with their reference count.
So if processor A has checkpoint</FONT> <BR><FONT size=2>> 1</FONT>
<BR><FONT size=2>> > open twice, and processor B has checkpoint 1 open
three times, and</FONT> <BR><FONT size=2>> > processor C has checkpoint
1 open four times each processor would </FONT><BR><FONT size=2>> >
maintain a list for the checkpoint (in the checkpoint data</FONT> <BR><FONT
size=2>> structure):</FONT> <BR><FONT size=2>> > </FONT><BR><FONT
size=2>> > p_A:r_2</FONT> <BR><FONT size=2>> > p_B:r_3</FONT>
<BR><FONT size=2>> > p_C:r_4</FONT> <BR><FONT size=2>> >
</FONT><BR><FONT size=2>> > Then on a configuration change, the leaving
processors would close</FONT> <BR><FONT size=2>> > their reference
counts. So in this example, p_B leaves then the </FONT><BR><FONT
size=2>> > processor ref count looks like: p_A:r_2 p_C:r_4</FONT>
<BR><FONT size=2>> > </FONT><BR><FONT size=2>> > During this
configuration change, a processor joins p_D. It has</FONT> <BR><FONT
size=2>> > checkpoint 1 open 1 time. p_D gets a configuration
change {add p_A,</FONT> <BR><FONT size=2>> > p_C} and then sends a
synchronization message with its previous ring</FONT> <BR><FONT size=2>>
> identifier and current list of checkpoint reference counts (after</FONT>
<BR><FONT size=2>> the</FONT> <BR><FONT size=2>> > above leave in the
configuration change was processed). The</FONT> <BR><FONT size=2>>
> representative of {p_A, p_C} also sends a synchronization message</FONT>
<BR><FONT size=2>> with</FONT> <BR><FONT size=2>> > the previous ring
identifier and a current list of checkpoint</FONT> <BR><FONT size=2>> >
reference counts. If the previous ring identifiers match and the
</FONT><BR><FONT size=2>> > sending processor is not the delivering
processor then p_C should </FONT><BR><FONT size=2>> > ignore p_A's
message (ie: p_C receives p_A message, but it already </FONT><BR><FONT
size=2>> > knows about p_A's references).</FONT> <BR><FONT size=2>>
> </FONT><BR><FONT size=2>> > This requires us to add the ring
identifier to the configuration</FONT> <BR><FONT size=2>> >
change.</FONT> <BR><FONT size=2>> > </FONT><BR><FONT size=2>> > So
now each previous configuration is aware of the new</FONT> <BR><FONT
size=2>> configuration.</FONT> <BR><FONT size=2>> > The reference
counts look like:</FONT> <BR><FONT size=2>> > p_A:r_2</FONT> <BR><FONT
size=2>> > p_C:r_4</FONT> <BR><FONT size=2>> > p_D:r_1</FONT>
<BR><FONT size=2>> > </FONT><BR><FONT size=2>> > The above
maintenance of the reference counts, or open checkpoints,</FONT> <BR><FONT
size=2>> > must maintain a per-checkpoint variable which is the
"reference</FONT> <BR><FONT size=2>> count</FONT> <BR><FONT size=2>>
> for this checkpoint". In the last case, that reference count
would</FONT> <BR><FONT size=2>> be</FONT> <BR><FONT size=2>> >
7.</FONT> <BR><FONT size=2>> > </FONT><BR><FONT size=2>> > Each
time a processor leaves, its reference counts are subtracted</FONT> <BR><FONT
size=2>> from</FONT> <BR><FONT size=2>> > this "global ref
count". Each time a processor is added, its</FONT> <BR><FONT size=2>>
> reference counts are added. This reference count is then what
is</FONT> <BR><FONT size=2>> used</FONT> <BR><FONT size=2>> > for
retention duration.</FONT> <BR><FONT size=2>> > </FONT><BR><FONT
size=2>> > Any thoughts on the above approach welcome.</FONT> <BR><FONT
size=2>> > </FONT><BR><FONT size=2>> > Thanks!</FONT> <BR><FONT
size=2>> > -steve</FONT> <BR><FONT size=2>> > </FONT><BR><FONT
size=2>> > > Thanks,</FONT> <BR><FONT size=2>> > >
Kristen</FONT> <BR><FONT size=2>> > > </FONT><BR><FONT size=2>>
> > -----Original Message-----</FONT> <BR><FONT size=2>> > >
From: Steven Dake [<A
href="mailto:sdake@mvista.com">mailto:sdake@mvista.com</A>]</FONT> <BR><FONT
size=2>> > > Sent: Monday, February 14, 2005 2:17 PM</FONT> <BR><FONT
size=2>> > > To: Smith, Kristen [NGC:B675:EXCH];
markh@osdl.org;</FONT> <BR><FONT size=2>> > >
openais@lists.osdl.org</FONT> <BR><FONT size=2>> > > Cc: Bajpai, Muni
[NGC:B670:EXCH]</FONT> <BR><FONT size=2>> > > Subject: RE: [Openais]
Checkpoint crash in aisexec</FONT> <BR><FONT size=2>> > >
</FONT><BR><FONT size=2>> > > </FONT><BR><FONT size=2>> > >
On Sat, 2005-02-12 at 08:08, Kristen Smith wrote:</FONT> <BR><FONT size=2>>
> > > Steve,</FONT> <BR><FONT size=2>> > > >
</FONT><BR><FONT size=2>> > > > Thanks for the response.</FONT>
<BR><FONT size=2>> > > > </FONT><BR><FONT size=2>> > >
> For recovery - what are the ramifications if we don't have</FONT>
<BR><FONT size=2>> > recovery</FONT> <BR><FONT size=2>> > >
> working 100%? What I see now is that when a node leaves the</FONT>
<BR><FONT size=2>> > cluster</FONT> <BR><FONT size=2>> > > >
and then rejoins, it receives evt messages, but it can take</FONT> <BR><FONT
size=2>> > anywhere</FONT> <BR><FONT size=2>> > > > from
15seconds to minutes for evt messages sent from that node</FONT> <BR><FONT
size=2>> to</FONT> <BR><FONT size=2>> > > > reach the other
applications. I handle this with some</FONT> <BR><FONT size=2>> > >
</FONT><BR><FONT size=2>> > > Mark have you seen this issue?</FONT>
<BR><FONT size=2>> > > </FONT><BR><FONT size=2>> > > >
message retries which is ok in this startup case. However, are</FONT>
<BR><FONT size=2>> we</FONT> <BR><FONT size=2>> > in</FONT> <BR><FONT
size=2>> > > > jeopardy in other cases that I am not considering?
When running </FONT><BR><FONT size=2>> > > > traffic the past few
days and seeing periodic reconfigs, I don't</FONT> <BR><FONT size=2>> >
> seem</FONT> <BR><FONT size=2>> > > > to be losing messages
when that occurs - I only see the lost</FONT> <BR><FONT size=2>> > >
messages</FONT> <BR><FONT size=2>> > > > when I actually kill a
node and start it back up to rejoin the</FONT> <BR><FONT size=2>> > >
> cluster.</FONT> <BR><FONT size=2>> > > > </FONT><BR><FONT
size=2>> > > </FONT><BR><FONT size=2>> > > What we have
today is totally unacceptable because at least for </FONT><BR><FONT size=2>>
> > checkpointing, there is no recovery. And Mark is waiting on
my</FONT> <BR><FONT size=2>> base</FONT> <BR><FONT size=2>> > >
code for event recovery.</FONT> <BR><FONT size=2>> > >
</FONT><BR><FONT size=2>> > > Definition of 100% working means if
there is a failure during </FONT><BR><FONT size=2>> > > recovery, we
are guaranteed a consistent state. I think evt is</FONT> <BR><FONT
size=2>> > pretty</FONT> <BR><FONT size=2>> > > close to this
goal, although the checkpoint replication after</FONT> <BR><FONT size=2>>
merge</FONT> <BR><FONT size=2>> > > has not been developed yet.
I can think of a lot of easy ways to</FONT> <BR><FONT size=2>> do</FONT>
<BR><FONT size=2>> > > this, but handling a failure during the
recovery phase makes it</FONT> <BR><FONT size=2>> more</FONT> <BR><FONT
size=2>> > > difficult.</FONT> <BR><FONT size=2>> > >
</FONT><BR><FONT size=2>> > > Definition of almost 100% is that
recovery works properly if there</FONT> <BR><FONT size=2>> > are</FONT>
<BR><FONT size=2>> > > no faults during recovery (ie: the merge
process), but if there is</FONT> <BR><FONT size=2>> a</FONT> <BR><FONT
size=2>> > > fault during recovery (ie: reconfig) something could go
awry.</FONT> <BR><FONT size=2>> > > </FONT><BR><FONT size=2>> >
> We want consistently replicated data (the 100% case). 100% is
</FONT><BR><FONT size=2>> > > probably past your development window;
the other case is within</FONT> <BR><FONT size=2>> > reach.</FONT>
<BR><FONT size=2>> > > </FONT><BR><FONT size=2>> > >
Regards</FONT> <BR><FONT size=2>> > > -steve</FONT> <BR><FONT
size=2>> > > </FONT><BR><FONT size=2>> > > >
Thanks</FONT> <BR><FONT size=2>> > > > Kristen</FONT> <BR><FONT
size=2>> > > > </FONT><BR><FONT size=2>> > > >
-----Original Message-----</FONT> <BR><FONT size=2>> > > > From:
Steven Dake [<A
href="mailto:sdake@mvista.com">mailto:sdake@mvista.com</A>]</FONT> <BR><FONT
size=2>> > > > Sent: Friday, February 11, 2005 5:30 PM</FONT>
<BR><FONT size=2>> > > > To: Smith, Kristen [NGC:B675:EXCH]</FONT>
<BR><FONT size=2>> > > > Subject: RE: [Openais] Checkpoint crash
in aisexec</FONT> <BR><FONT size=2>> > > > </FONT><BR><FONT
size=2>> > > > </FONT><BR><FONT size=2>> > > > Ok well
I doubt with 200 byte checkpoints there is a buffer</FONT> <BR><FONT
size=2>> > > overflow.</FONT> <BR><FONT size=2>> > > >
:)</FONT> <BR><FONT size=2>> > > > </FONT><BR><FONT size=2>>
> > > Recovery will come after 188 is wrapped up. I think your
two</FONT> <BR><FONT size=2>> > weeks</FONT> <BR><FONT size=2>> >
> > window looks good for alpha-level recovery (ie: works most of</FONT>
<BR><FONT size=2>> the</FONT> <BR><FONT size=2>> > > >
time). High quality production recovery will not hit your</FONT>
<BR><FONT size=2>> window</FONT> <BR><FONT size=2>> > > for</FONT>
<BR><FONT size=2>> > > > development (ie: works 100% of the time
no matter what happens).</FONT> <BR><FONT size=2>> > > >
</FONT><BR><FONT size=2>> > > > Thanks</FONT> <BR><FONT
size=2>> > > > -steve</FONT> <BR><FONT size=2>> > > >
</FONT><BR><FONT size=2>> > > > On Fri, 2005-02-11 at 15:56,
Kristen Smith wrote:</FONT> <BR><FONT size=2>> > > > >
Steve,</FONT> <BR><FONT size=2>> > > > > </FONT><BR><FONT
size=2>> > > > > The size of the checkpoints is ~200
bytes.</FONT> <BR><FONT size=2>> > > > > </FONT><BR><FONT
size=2>> > > > > I agree, valgrind is an excellent tool. We
will run it through</FONT> <BR><FONT size=2>> > and</FONT> <BR><FONT
size=2>> > > > see</FONT> <BR><FONT size=2>> > > >
> if that shows anything.</FONT> <BR><FONT size=2>> > > > >
</FONT><BR><FONT size=2>> > > > > I have tried this scenario
maybe 30 times today (for various</FONT> <BR><FONT size=2>> >
other</FONT> <BR><FONT size=2>> > > > > testing) and it
happened maybe 10 times. For a while I could</FONT> <BR><FONT size=2>> >
> > reproduce</FONT> <BR><FONT size=2>> > > > > with a
given test about 5 times and then it hasn't happened</FONT> <BR><FONT
size=2>> > again.</FONT> <BR><FONT size=2>> > > > >
</FONT><BR><FONT size=2>> > > > > Sounds like defect-188 fixing
is going well. May I ask how the </FONT><BR><FONT size=2>> > > >
> recovery work is going as well? (Don't mean to be pushy on</FONT>
<BR><FONT size=2>> that</FONT> <BR><FONT size=2>> > > >
front</FONT> <BR><FONT size=2>> > > > > - we have 2 more weeks
of coding for our application left and</FONT> <BR><FONT size=2>> I</FONT>
<BR><FONT size=2>> > am</FONT> <BR><FONT size=2>> > > > >
really hoping that we are able to put the new recovery code in</FONT>
<BR><FONT size=2>> > > > during</FONT> <BR><FONT size=2>> >
> > > that time).</FONT> <BR><FONT size=2>> > > > >
</FONT><BR><FONT size=2>> > > > > Thanks a bunch,</FONT>
<BR><FONT size=2>> > > > > Kristen</FONT> <BR><FONT size=2>>
> > > > </FONT><BR><FONT size=2>> > > > >
-----Original Message-----</FONT> <BR><FONT size=2>> > > > >
From: Steven Dake [<A
href="mailto:sdake@mvista.com">mailto:sdake@mvista.com</A>]</FONT> <BR><FONT
size=2>> > > > > Sent: Friday, February 11, 2005 4:37 PM</FONT>
<BR><FONT size=2>> > > > > To: Smith, Kristen
[NGC:B675:EXCH]</FONT> <BR><FONT size=2>> > > > > Subject: Re:
[Openais] Checkpoint crash in aisexec</FONT> <BR><FONT size=2>> > >
> > </FONT><BR><FONT size=2>> > > > > </FONT><BR><FONT
size=2>> > > > > how large are the read or write
requests?</FONT> <BR><FONT size=2>> > > > > just a thought
there could be some buffer overrun with larger </FONT><BR><FONT size=2>>
> > > > requests.</FONT> <BR><FONT size=2>> > > > >
</FONT><BR><FONT size=2>> > > > > On Fri, 2005-02-11 at 14:55,
Kristen Smith wrote:</FONT> <BR><FONT size=2>> > > > > >
Steve,</FONT> <BR><FONT size=2>> > > > > > </FONT><BR><FONT
size=2>> > > > > > We are periodically seeing aisexec crash
with the following</FONT> <BR><FONT size=2>> > > trace:</FONT>
<BR><FONT size=2>> > > > > > </FONT><BR><FONT size=2>>
> > > > > (gdb)
bt</FONT> <BR><FONT size=2>> > > > >
> #0
message_handler_req_lib_ckpt_checkpointclose</FONT> <BR><FONT size=2>> >
> > > >
(conn_info=0x0, message=0xb73fc008) at ckpt.c:1552</FONT> <BR><FONT
size=2>> > > > >
> #1 0x080494c2 in
poll_handler_libais_deliver</FONT> <BR><FONT size=2>> >
(handle=0,</FONT> <BR><FONT size=2>> > > > > fd=3,</FONT>
<BR><FONT size=2>> > > > >
> revent=134633824,
data=0x89c2ad8,</FONT> <BR><FONT size=2>> > > > >
>
prio=0x89b2784) at main.c:578</FONT> <BR><FONT size=2>> > > > >
> #2 0x08056e62 in
poll_run (handle=0) at</FONT> <BR><FONT size=2>> aispoll.c:386</FONT>
<BR><FONT size=2>> > > > > > </FONT><BR><FONT size=2>>
> > > > > </FONT><BR><FONT size=2>> > > > > >
#3 0x080499ac in main (argc=1, argv=0xbfffcb64) at</FONT> <BR><FONT
size=2>> > main.c:1003</FONT> <BR><FONT size=2>> > > > >
> </FONT><BR><FONT size=2>> > > > > > We have looked
through the code but can't seem to figure out</FONT> <BR><FONT size=2>>
> how</FONT> <BR><FONT size=2>> > > > > > conn_info is
getting set to 0. Do you have any idea under</FONT> <BR><FONT size=2>>
what</FONT> <BR><FONT size=2>> > > > > > circumstances
conn_info could be null when this function is</FONT> <BR><FONT size=2>>
> > > called?</FONT> <BR><FONT size=2>> > > > > >
</FONT><BR><FONT size=2>> > > > > > This is happening when
we have multiple nodes up and we kill</FONT> <BR><FONT size=2>> >
one</FONT> <BR><FONT size=2>> > > > of</FONT> <BR><FONT
size=2>> > > > > > the active nodes. The standby node (which
was reading</FONT> <BR><FONT size=2>> > > checkpoints)</FONT>
<BR><FONT size=2>> > > > > > must now become a writer, so it
closes the checkpoint and</FONT> <BR><FONT size=2>> this</FONT> <BR><FONT
size=2>> > > > > > happens. Unfortunately, I can't reproduce
this consistently</FONT> <BR><FONT size=2>> -</FONT> <BR><FONT size=2>>
> I</FONT> <BR><FONT size=2>> > > > > > finally got a
core dump today. I don't recall ever seeing</FONT> <BR><FONT size=2>>
this</FONT> <BR><FONT size=2>> > > > with</FONT> <BR><FONT
size=2>> > > > > > the old code.</FONT> <BR><FONT
size=2>> > > > > > </FONT><BR><FONT size=2>> > >
> > > Thanks,</FONT> <BR><FONT size=2>> > > > > >
Kristen</FONT> <BR><FONT size=2>> > > > > > </FONT><BR><FONT
size=2>> > > > > > </FONT><BR><FONT size=2>> > >
> > > </FONT><BR><FONT size=2>> > > > > ></FONT>
<BR><FONT size=2>> > > > ></FONT> <BR><FONT size=2>> >
> ></FONT> <BR><FONT size=2>> > ></FONT> <BR><FONT size=2>>
></FONT> <BR><FONT size=2>>
______________________________________________________________________</FONT>
<BR><FONT size=2>> > > > > >
_______________________________________________</FONT> <BR><FONT size=2>>
> > > > > Openais mailing list</FONT> <BR><FONT size=2>>
> > > > > Openais@lists.osdl.org</FONT> <BR><FONT size=2>>
> > > > <A href="http://lists.osdl.org/mailman/listinfo/openais"
target=_blank>http://lists.osdl.org/mailman/listinfo/openais</A></FONT>
<BR><FONT size=2>> > > > > </FONT><BR><FONT size=2>> >
> > > </FONT><BR><FONT size=2>> > > > </FONT><BR><FONT
size=2>> > > > </FONT><BR><FONT size=2>> > >
</FONT><BR><FONT size=2>> > > </FONT><BR><FONT size=2>> >
</FONT><BR><FONT size=2>> > </FONT><BR><FONT size=2>>
</FONT><BR><FONT size=2>> </FONT></P><BR></BLOCKQUOTE></BODY></HTML>