
List:       openais
Subject:    RE: [Openais] muni try this 932 take 8
From:       Steven Dake <scd () broked ! org>
Date:       2005-11-23 22:32:33
Message-ID: 1132785153.13091.40.camel () slickdeal ! broked ! org

Muni
This patch has been committed at revision 850 and ported to picacho at
revision 851.

On Tue, 2005-11-22 at 09:56, Muni Bajpai wrote:
> Hey Steve,
> 
> So after some intense thinking :) I don't think it is possible that the
> index can be out of bounds. 
> 
> I think the real issue is that checkpoint_release can remove an element
> from the list while we are still iterating through that list.
> 
> So I think we should separate the cleanup from this iteration. It adds
> one more cycle of iteration but is safer.
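
The two-pass approach described above can be sketched roughly as follows. This is a minimal illustration, not the committed patch: `struct node`, `list_push`, `drop_reference_all`, `reap_unreferenced` and `list_length` are hypothetical names, and the list is a plain singly linked list rather than the openais list type. The first pass only touches fields; the second pass does all unlinking, so nothing is removed underneath a running iterator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a checkpoint list entry (not the openais struct). */
struct node {
    int id;
    int refcount;
    struct node *next;
};

/* Push a new node at the head; returns the new head (old head if malloc fails). */
struct node *list_push(struct node *head, int id, int refcount)
{
    struct node *n = malloc(sizeof(*n));
    if (n == NULL) {
        return head;
    }
    n->id = id;
    n->refcount = refcount;
    n->next = head;
    return n;
}

/* Pass 1: adjust reference counts only. No node is unlinked or freed here,
 * so the iterator never steps through a node that has been released. */
void drop_reference_all(struct node *head)
{
    for (struct node *n = head; n != NULL; n = n->next) {
        if (n->refcount > 0) {
            n->refcount--;
        }
    }
}

/* Pass 2: unlink and free every node whose refcount reached zero.
 * The pointer-to-pointer walk reads n->next before n is freed. */
struct node *reap_unreferenced(struct node *head)
{
    struct node **link = &head;
    while (*link != NULL) {
        struct node *n = *link;
        if (n->refcount == 0) {
            *link = n->next;
            free(n);
        } else {
            link = &n->next;
        }
    }
    return head;
}

/* Count the nodes currently on the list. */
size_t list_length(const struct node *head)
{
    size_t len = 0;
    for (; head != NULL; head = head->next) {
        len++;
    }
    return len;
}
```

The cost is the extra traversal mentioned above; in exchange, the cleanup logic is the only place that modifies list links.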
> 
> Please review the patch
> 
> Thanks
> 
> Muni
> 
> -----Original Message-----
> From: Steven Dake [mailto:sdake@mvista.com] 
> Sent: Monday, November 21, 2005 12:57 PM
> To: Bajpai, Muni [RICH1:B670:EXCH]
> Cc: Smith, Kristen [RICH1:B670:EXCH]
> Subject: RE: [Openais] muni try this 932 take 8
> 
> Muni,
> ckpt_confchg_fn is indeed called with TOTEM_CONFIGURATION_TRANSITIONAL.
> look at the previous call traces, they all call, and the only way to get
> into ckpt_recovery_process_members_exit is with a transitional
> configuration.
> 
> Then note that ckpt_recovery_process_members_exit is called.  I suspect
> that this function is in some way corrupting the stack including
> left_list and configuration_type from the previous stack frame.
> 
> Is it possible the call
>        memset((char*)&checkpoint->ckpt_refcount[index].addr, 0,
> sizeof(struct in_addr));
> 
> is called with an index either negative or greater than the size of
> ckpt_refcount?  This would explain that other refcounting segfault bug I
> found.  I suggest putting an assert before that memset to see if the
> code is behaving outside your expectations.
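
A guard of the kind suggested above might look like the sketch below. The array size `REFCOUNT_SLOTS`, the struct layouts, and the helper names `index_in_bounds` and `clear_refcount_addr` are assumptions made for illustration; only the `memset` on `ckpt_refcount[index].addr` comes from the thread. Note that an index equal to the array size is also out of bounds, so the check uses `<` rather than `<=`.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical array size; the real value lives in the openais sources. */
#define REFCOUNT_SLOTS 32

/* Simplified stand-ins for the structures discussed in the thread. */
struct refcount_entry {
    struct in_addr addr;
    int count;
};

struct checkpoint_sketch {
    struct refcount_entry ckpt_refcount[REFCOUNT_SLOTS];
};

/* Returns nonzero when index addresses a valid slot of ckpt_refcount. */
int index_in_bounds(int index)
{
    return index >= 0 && index < REFCOUNT_SLOTS;
}

/* Clear the address in one slot, trapping bad indexes before they can
 * scribble over neighbouring memory. The assert is compiled out if the
 * build defines NDEBUG. */
void clear_refcount_addr(struct checkpoint_sketch *checkpoint, int index)
{
    assert(index_in_bounds(index));
    memset(&checkpoint->ckpt_refcount[index].addr, 0, sizeof(struct in_addr));
}
```

If the assert fires during a long traffic run, the core dump points at the caller that computed the bad index instead of at whatever memory was overwritten later.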
> 
> Were you running with RANDOM_DROP set?
> 
> Regards
> -steve
> 
> On Mon, 2005-11-21 at 10:05 -0600, Muni Bajpai wrote:
> > Steve,
> > 
> > This just doesn't make sense. A segfault happened in the ckpt service
> > after about 6 hours of traffic.
> > 
> > Things to note in the following trace:
> > 1.) ckpt_recovery_process_members_exit is called from ckpt_confchg_fn
> > ONLY if configuration_type==TOTEM_CONFIGURATION_TRANSITIONAL, which
> > according to the trace it is NOT. It's a simple if check.
> > 2.) The left_list pointer has changed between frame #1 and frame #0.
> > 
> > See the left_list pointer is clobbered even though there is no
> > manipulation of that pointer in ckpt_confchg_fn. 
> > 
> > The left_list array is initialized in memb_state_operational_enter and
> > has function scope, hence will not change until the call returns.
> > 
> > I dunno how this is possible  ?
> > 
> > Thanks
> > 
> > Muni
> > 
> > #0  ckpt_recovery_process_members_exit (left_list=0x8394,
> >     left_list_entries=3) at ckpt.c:566
> > #1  0x08053795 in ckpt_confchg_fn
> >     (configuration_type=TOTEM_CONFIGURATION_REGULAR,
> >     member_list=0x80bdea0, member_list_entries=1, left_list=0xbfffd360,
> >     left_list_entries=3, joined_list=0x0, joined_list_entries=0,
> >     ring_id=0xd9bf682f) at ckpt.c:1127
> > #2  0x0804ab2e in confchg_fn
> >     (configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL,
> >     member_list=0x80bdea0, member_list_entries=1, left_list=0xbfffd360,
> >     left_list_entries=3, joined_list=0x0, joined_list_entries=0,
> >     ring_id=0x80bdf78) at main.c:903
> > #3  0x08066397 in totempg_confchg_fn
> >     (configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL,
> >     member_list=0x80bdea0, member_list_entries=1, left_list=0xbfffd360,
> >     left_list_entries=3, joined_list=0x0, joined_list_entries=0,
> >     ring_id=0x80bdf78) at totempg.c:239
> > #4  0x0806856d in totemmrp_confchg_fn
> >     (configuration_type=TOTEM_CONFIGURATION_TRANSITIONAL,
> >     member_list=0x80bdea0, member_list_entries=1, left_list=0xbfffd360,
> >     left_list_entries=3, joined_list=0x0, joined_list_entries=0,
> >     ring_id=0x80bdf78) at totemmrp.c:94
> > #5  0x08063b40 in memb_state_operational_enter (instance=0x80bdd48)
> >     at totemsrp.c:1392
> > #6  0x0805ef7f in message_handler_orf_token (instance=0x80bdd48,
> >     system_from=0xbfffe174, msg=0x80d787c, msg_len=42,
> >     endian_conversion_needed=0) at totemsrp.c:2971
> > #7  0x0806144a in main_deliver_fn (context=0x80bdd48,
> >     system_from=0xbfffe174, msg=0x80d787c, msg_len=34024)
> >     at totemsrp.c:3653
> > #8  0x08067e49 in active_token_recv (instance=0x80bd158,
> >     interface_no=0, context=0x80bdd48, system_from=0xbfffe174,
> >     msg=0x80d787c, msg_len=42, token_seqid=0) at totemrrp.c:482
> > #9  0x08067f78 in rrp_deliver_fn (context=0x80bd220,
> >     system_from=0xbfffe174, msg=0x80d787c, msg_len=42)
> >     at totemrrp.c:542
> > #10 0x08069b04 in net_deliver_fn (handle=0, fd=4, revents=1,
> >     data=0x80d7250, prio=0x0) at totemnet.c:688
> > #11 0x0805de25 in poll_run (handle=0) at aispoll.c:433
> > #12 0x0804a243 in main (argc=1, argv=0xbfffe3f4) at main.c:1200
> > 
> > -----Original Message-----
> > From: Steven Dake [mailto:sdake@mvista.com] 
> > Sent: Sunday, November 20, 2005 9:48 PM
> > To: Bajpai, Muni [RICH1:B670:EXCH]
> > Cc: Smith, Kristen [RICH1:B670:EXCH]
> > Subject: RE: [Openais] muni try this 932 take 6
> > 
> > The protocol code in picacho and trunk are the same so testing with
> > either should be fine.  The reason I couldn't reproduce your issues is
> > that I wasn't trying hard enough :)
> > 
> > Regards
> > -steve
> > On Sun, 2005-11-20 at 01:24 -0600, Muni Bajpai wrote:
> > > Steve Thanks for the effort
> > > 
> > > The reason I brought up the picacho vs trunk issue is so that we
> > > are on the same page, as we saw last week that there were some
> > > asserts you couldn't reproduce readily with you being on trunk and
> > > me being on picacho.
> > > 
> > > The problem is that I can't start testing with one and switch to
> > > another, as time unfortunately is getting thinner with our release
> > > deadline approaching.
> > > 
> > > So that's the spill.
> > > 
> > > I just saw that you posted release 8. Does this include the patch
> > > for 969?
> > > 
> > > P.S. I will update the man pages further for defect 968.
> > > 
> > > Thanks
> > > 
> > > Muni
> > > -----Original Message-----
> > > From: Steven Dake [mailto:sdake@mvista.com] 
> > > Sent: Saturday, November 19, 2005 1:14 PM
> > > To: Bajpai, Muni [RICH1:B670:EXCH]
> > > Cc: Smith, Kristen [RICH1:B670:EXCH]
> > > Subject: RE: [Openais] muni try this 932 take 6
> > > 
> > > Muni
> > > I should have a new patch coming soon.  The one I sent has a couple
> > > of bugs I found last night.
> > > 
> > > All patches are against trunk, but should apply to picacho without
> > > much trouble.
> > > 
> > > I've been running for about 4 hours now without an assertion, with
> > > random drop on.
> > > 
> > > Seems positive at least :)
> > > 
> > > On Sat, 2005-11-19 at 12:38 -0600, Muni Bajpai wrote:
> > > > Steve,
> > > > 
> > > > So should I apply/test this patch on trunk or picacho? (I need to
> > > > know which one to start testing with.)
> > > > Should I also apply 969, as that seems relevant?
> > > > 
> > > > Thanks
> > > > 
> > > > Muni
> > > > 
> > > > -----Original Message-----
> > > > From: openais-bounces@lists.osdl.org
> > > > [mailto:openais-bounces@lists.osdl.org] On Behalf Of Steven Dake
> > > > Sent: Friday, November 18, 2005 2:48 PM
> > > > To: openais@lists.osdl.org
> > > > Subject: [Openais] muni try this 932 take 6
> > > > 
> > > > 
> > > > Muni
> > > > I've worked on the 932 patch and found another bug.  It seems to
> > > > be working better now for 3 nodes.
> > > > 
> > > > I have noticed one bug which I'm not sure how I will fix.
> > > > Basically, during the RECOVERY phase, too many messages are in the
> > > > recovery queue, overflowing it.  These messages should be
> > > > processed at one time.
> > > > 
> > > > Please let me know if you have asserts and the asserts you see.
> > > > 
> > > > I'll run this all weekend and work to get any problems sorted out
> > > > over the weekend.
> > > > 
> > > > Regards
> > > > -steve
> > > 
> > > 
> > 
> > 
> 
> 
> 
> ______________________________________________________________________
> _______________________________________________
> Openais mailing list
> Openais@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/openais




