
List:       linux-ha-dev
Subject:    RE: [Linux-ha-dev] Membership instance ID went backwards
From:       "Junko IKEDA" <ikedaj () intellilink ! co ! jp>
Date:       2007-08-28 10:13:42
Message-ID: 008c01c7e95c$1c8cefd0$251a1eac () intellilink ! co ! jp

Hi,

I'm sorry for the misunderstanding.
In this case, the instance ID is expected to go backwards, because a split-brain occurred.
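For context, the fatal assert quoted further down (`current_ccm_membership_id <= membership->m_instance` at callbacks.c:526) encodes the assumption that membership instance IDs never decrease. The following is a minimal Python model of that check, not crmd's actual code. The names `MembershipTracker`, `on_ccm_event` and `MembershipWentBackwards` are hypothetical. It shows why a split-brain that takes the ID from 3 back to 1 makes crmd abort:

```python
class MembershipWentBackwards(Exception):
    """Hypothetical stand-in for the point where crmd logs and asserts."""


class MembershipTracker:
    """Hypothetical model of crmd's monotonic membership-instance check."""

    def __init__(self) -> None:
        # Last CCM membership instance seen; 0 means none yet.
        self.current_ccm_membership_id = 0

    def on_ccm_event(self, m_instance: int) -> None:
        # Mirrors the asserted condition current_ccm_membership_id <= m_instance:
        # an equal or higher ID is accepted, a lower one is treated as fatal.
        if m_instance < self.current_ccm_membership_id:
            raise MembershipWentBackwards(
                f"Membership instance ID went backwards! "
                f"{self.current_ccm_membership_id}->{m_instance}"
            )
        self.current_ccm_membership_id = m_instance


tracker = MembershipTracker()
for instance in (1, 2, 3):
    tracker.on_ccm_event(instance)
try:
    tracker.on_ccm_event(1)  # a partition reports an older instance ID
except MembershipWentBackwards as err:
    print(err)  # Membership instance ID went backwards! 3->1
```

In the real daemon the failed assert dumps core instead of raising a catchable exception.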

What concerns me is the CCM behavior, which seems to have been discussed in Bug #1138.
After the interconnect LAN went down, CCM appears to wait for a while before
deciding its next state (about 25 seconds in the log below, from 11:26:29 to 11:26:54).
During this wait, the resource runs on both nodes.
Is this the expected behavior with quorumd?

heartbeat[12731]: 2007/08/17_11:26:29 WARN: node prec370e: is dead
heartbeat[12731]: 2007/08/17_11:26:29 info: Link prec370e:eth2 dead.
ccm[12739]: 2007/08/17_11:26:29 debug: recv msg status from prec370e, status:dead
ccm[12739]: 2007/08/17_11:26:29 debug: status of node prec370e: active -> dead
ccm[12739]: 2007/08/17_11:26:29 debug: recv msg CCM_TYPE_LEAVE from prec370e, status:[null ptr]
ccm[12739]: 2007/08/17_11:26:29 debug: send msg CCM_TYPE_JOIN to cluster, status:[null]
ccm[12739]: 2007/08/17_11:26:29 debug: node state CCM_STATE_JOINED -> CCM_STATE_JOINING
cib[12740]: 2007/08/17_11:26:29 info: mem_handle_event: Got an event OC_EV_MS_NOT_PRIMARY from ccm
crmd[12744]: 2007/08/17_11:26:29 notice: crmd_ha_status_callback: Status update: Node prec370e now has status [dead]
cib[12740]: 2007/08/17_11:26:29 info: mem_handle_event: instance=2, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
crmd[12744]: 2007/08/17_11:26:29 info: mem_handle_event: Got an event OC_EV_MS_NOT_PRIMARY from ccm
cib[12740]: 2007/08/17_11:26:29 debug: cib_ccm_msg_callback: Process CCM event=NOT PRIMARY (id=2)
crmd[12744]: 2007/08/17_11:26:29 info: mem_handle_event: instance=2, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
crmd[12744]: 2007/08/17_11:26:29 info: crmd_ccm_msg_callback: Quorum lost after event=NOT PRIMARY (id=2)
mgmtd[12745]: 2007/08/17_11:26:29 debug: update cib finished
ccm[12739]: 2007/08/17_11:26:30 debug: recv msg CCM_TYPE_JOIN from prec370d, status:[null ptr]
ccm[12739]: 2007/08/17_11:26:30 debug: send msg CCM_TYPE_REQ_MEMLIST to cluster, status:[null]
ccm[12739]: 2007/08/17_11:26:30 debug: node state CCM_STATE_JOINING -> CCM_STATE_SENT_MEMLISTREQ
ccm[12739]: 2007/08/17_11:26:30 debug: recv msg CCM_TYPE_REQ_MEMLIST from prec370d, status:[null ptr]


Dummy[12898][12904]: 2007/08/17_11:26:33 DEBUG: prmDummy monitor : 0
lrmd[12741]: 2007/08/17_11:26:33 info: Exiting prmDummy:monitor process 12898 returned rc 0.
Dummy[12905][12911]: 2007/08/17_11:26:43 DEBUG: prmDummy monitor : 0
lrmd[12741]: 2007/08/17_11:26:43 info: Exiting prmDummy:monitor process 12905 returned rc 0.
Dummy[12912][12918]: 2007/08/17_11:26:53 DEBUG: prmDummy monitor : 0
lrmd[12741]: 2007/08/17_11:26:53 info: Exiting prmDummy:monitor process 12912 returned rc 0.

*** waiting for some timeout? ***

ccm[12739]: 2007/08/17_11:26:54 debug: quorum plugin: quorumd, quorumd_init()
ccm[12739]: 2007/08/17_11:26:54 debug: quorum plugin: cluster:xxx, quorum_server:sl000237
ccm[12739]: 2007/08/17_11:26:54 debug: quorum plugin: quorumd
ccm[12739]: 2007/08/17_11:26:54 debug: cluster:xxx, member_count=1, member_quorum_votes=100
ccm[12739]: 2007/08/17_11:26:54 debug: total_node_count=2, total_quorum_votes=200
ccm[12739]: 2007/08/17_11:26:54 debug: quorum plugin: quorumd, connect_quorum_server
ccm[12739]: 2007/08/17_11:26:54 debug: zhenh: return cur_quorum  0
ccm[12739]: 2007/08/17_11:26:54 debug: send msg CCM_TYPE_FINAL_MEMLIST to cluster, status:[null]
cib[12740]: 2007/08/17_11:26:54 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
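The last lines of the log show the vote arithmetic involved: this node holds member_quorum_votes=100 out of total_quorum_votes=200, exactly half. The following is a minimal sketch, assuming a majority rule in which strictly more than half of the votes grants quorum and exactly half is a tie handed to the quorum server. This rule is an assumption, not taken from the heartbeat source. The names `majority_quorum`, `decide` and `quorum_server_grants` are hypothetical, and reading `cur_quorum  0` as "not granted" is also an assumption:

```python
from enum import Enum


class Quorum(Enum):
    YES = "yes"
    NO = "no"
    TIE = "tie"  # neither side has a majority; an external arbiter decides


def majority_quorum(member_votes: int, total_votes: int) -> Quorum:
    """Assumed majority rule: more than half wins, exactly half is a tie."""
    if member_votes * 2 > total_votes:
        return Quorum.YES
    if member_votes * 2 == total_votes:
        return Quorum.TIE
    return Quorum.NO


def decide(member_votes: int, total_votes: int, quorum_server_grants: bool) -> bool:
    """Consult the (hypothetical) quorum server only when the votes tie."""
    result = majority_quorum(member_votes, total_votes)
    if result is Quorum.TIE:
        return quorum_server_grants
    return result is Quorum.YES


# Values from the log: member_count=1, member_quorum_votes=100,
# total_node_count=2, total_quorum_votes=200.
print(majority_quorum(100, 200))  # Quorum.TIE
print(decide(100, 200, quorum_server_grants=False))  # False
```

With two equally weighted nodes, any one-node-per-side partition is a tie under this rule, so the outcome rests entirely on the quorum server's answer.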

Thanks,
Junko

> -----Original Message-----
> From: linux-ha-dev-bounces@lists.linux-ha.org
> [mailto:linux-ha-dev-bounces@lists.linux-ha.org] On Behalf Of Dejan Muhamedagic
> Sent: Monday, August 27, 2007 8:56 PM
> To: High-Availability Linux Development List
> Subject: Re: [Linux-ha-dev] Membership instance ID went backwards
> 
> On Mon, Aug 27, 2007 at 11:40:27AM +0900, Junko IKEDA wrote:
> > Hi,
> >
> > I tried running the quorumd server in order to rewrite SF-EX as a quorum module.
> > quorumd seemed to work well, but something went wrong with the instance ID.
> > To make matters worse, it led to a brief split-brain.
> 
> What seems to have led to the split-brain is this:
> 
> heartbeat[19616]: 2007/08/17_11:26:29 WARN: node prec370d: is dead
> heartbeat[19616]: 2007/08/17_11:26:29 info: Link prec370d:eth2 dead.
> 
> Is that what you meant?
> 
> > ha-debug said:
> > crmd[19630]: 2007/08/17_11:28:16 ERROR: crmd_ccm_msg_callback: Membership instance ID went backwards! 3->1
> > crmd[19630]: 2007/08/17_11:28:16 ERROR: crm_abort: crmd_ccm_msg_callback: Triggered fatal assert at callbacks.c:526 : current_ccm_membership_id <= membership->m_instance
> >
> >
> > Is it the same issue as bug #1546?
> 
> Yes. crmd dumps core because the instance goes backwards.
> 
> > Best Regards,
> > Junko Ikeda
> >
> > NTT DATA INTELLILINK CORPORATION
> >
> > Toyosu Center Building Annex, 3-3-9, Toyosu,
> > Koto-ku, Tokyo 135-0061, Japan
> > TEL : +81-3-3534-4811
> > FAX : +81-3-3534-4814
> > mailto:ikedaj@intellilink.co.jp
> > http://www.intellilink.co.jp/
> >
> 
> 
> > _______________________________________________________
> > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > Home Page: http://linux-ha.org/
> 

["ha-debug" (application/octet-stream)]


