List:       drbd-user
Subject:    Re: [DRBD-user] drbd fencing stops promotion to master even when network connection is up
From:       "Auer, Jens" <jens.auer@cgi.com>
Date:       2016-09-20 12:12:02
Message-ID: E47848702ADBE04EA7ECF45E05F60B6D0117F6A133@SE-EX019.groupinfra.com

Hi,

I've updated all drbd packages to the latest versions:
MDA1PFP-S01 11:52:35 2551 0 ~ # yum list "*drbd*"
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
drbd.x86_64                     8.9.8-1.el7                    @/drbd-8.9.8-1.el7.x86_64
drbd-bash-completion.x86_64     8.9.8-1.el7                    @/drbd-bash-completion-8.9.8-1.el7.x86_64
drbd-heartbeat.x86_64           8.9.8-1.el7                    @/drbd-heartbeat-8.9.8-1.el7.x86_64
drbd-pacemaker.x86_64           8.9.8-1.el7                    @/drbd-pacemaker-8.9.8-1.el7.x86_64
drbd-udev.x86_64                8.9.8-1.el7                    @/drbd-udev-8.9.8-1.el7.x86_64
drbd-utils.x86_64               8.9.8-1.el7                    installed
drbd-xen.x86_64                 8.9.8-1.el7                    @/drbd-xen-8.9.8-1.el7.x86_64
kmod-drbd.x86_64                9.0.4_3.10.0_327.28.3-1.el7    @/kmod-drbd-9.0.4_3.10.0_327.28.3-1.el7.x86_64

but this did not fix the problem. The cluster starts fine, but when I stop the node
running the DRBD master, the resource is not promoted on the other node. Here is the
test I am conducting:

1. Start the cluster:
MDA1PFP-S01 12:07:00 2566 0 ~ # pcs status
Cluster name: MDA1PFP
Last updated: Tue Sep 20 12:07:24 2016		Last change: Tue Sep 20 12:06:49 2016 by root via cibadmin on MDA1PFP-PCS02
Stack: corosync
Current DC: MDA1PFP-PCS01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 6 resources configured

Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]

Full list of resources:

 mda-ip	(ocf::heartbeat:IPaddr2):	Started MDA1PFP-PCS01
 Clone Set: ping-clone [ping]
     Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
 ACTIVE	(ocf::heartbeat:Dummy):	Started MDA1PFP-PCS01
 Master/Slave Set: drbd1_sync [drbd1]
     Masters: [ MDA1PFP-PCS01 ]
     Slaves: [ MDA1PFP-PCS02 ]

PCSD Status:
  MDA1PFP-PCS01: Online
  MDA1PFP-PCS02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

MDA1PFP-S01 12:06:31 2565 0 ~ # drbd-overview 
 1:shared_fs/0  Connected Primary/Secondary UpToDate/UpToDate 

2. Stop the active cluster node:
MDA1PFP-S02 12:08:00 1295 0 ~ # pcs status
Cluster name: MDA1PFP
Last updated: Tue Sep 20 12:08:17 2016		Last change: Tue Sep 20 12:08:04 2016 by root via cibadmin on MDA1PFP-PCS02
Stack: corosync
Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 6 resources configured

Online: [ MDA1PFP-PCS02 ]
OFFLINE: [ MDA1PFP-PCS01 ]

Full list of resources:

 mda-ip	(ocf::heartbeat:IPaddr2):	Started MDA1PFP-PCS02
 Clone Set: ping-clone [ping]
     Started: [ MDA1PFP-PCS02 ]
     Stopped: [ MDA1PFP-PCS01 ]
 ACTIVE	(ocf::heartbeat:Dummy):	Started MDA1PFP-PCS02
 Master/Slave Set: drbd1_sync [drbd1]
     Slaves: [ MDA1PFP-PCS02 ]
     Stopped: [ MDA1PFP-PCS01 ]

PCSD Status:
  MDA1PFP-PCS01: Online
  MDA1PFP-PCS02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

In the log files I can see that the node actually gets promoted to master, but then
gets demoted immediately; I don't see the reason for this:
Sep 20 12:08:00 MDA1PFP-S02 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="3224" x-info="http://www.rsyslog.com"] start
Sep 20 12:08:00 MDA1PFP-S02 rsyslogd-2221: module 'imuxsock' already in this config, cannot be added [try http://www.rsyslog.com/e/2221 ]
Sep 20 12:08:00 MDA1PFP-S02 systemd: Stopping System Logging Service...
Sep 20 12:08:00 MDA1PFP-S02 systemd: Starting System Logging Service...
Sep 20 12:08:00 MDA1PFP-S02 systemd: Started System Logging Service.
Sep 20 12:08:03 MDA1PFP-S02 crmd[2354]:  notice: Operation ACTIVE_start_0: ok (node=MDA1PFP-PCS02, call=29, rc=0, cib-update=21, confirmed=true)
Sep 20 12:08:03 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=28, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: peer( Primary -> Secondary )
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: Adding inet address 192.168.120.20/32 with broadcast address 192.168.120.255 to device bond0
Sep 20 12:08:04 MDA1PFP-S02 avahi-daemon[1084]: Registering new address record for 192.168.120.20 on bond0.IPv4.
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: Bringing device bond0 up
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.120.20 bond0 192.168.120.20 auto not_used not_used
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation mda-ip_start_0: ok (node=MDA1PFP-PCS02, call=31, rc=0, cib-update=23, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=32, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=34, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: ack_receiver terminated
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Terminating drbd_a_shared_f
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Connection closed
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: conn( TearDown -> Unconnected )
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: receiver terminated
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Restarting receiver thread
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: receiver (re)started
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: conn( Unconnected -> WFConnection )
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=35, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=36, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: helper command: /sbin/drbdadm fence-peer shared_fs
Sep 20 12:08:04 MDA1PFP-S02 crm-fence-peer.sh[3779]: invoked for shared_fs
Sep 20 12:08:04 MDA1PFP-S02 crm-fence-peer.sh[3779]: INFO peer is not reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-shared_fs-drbd1_sync'
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: helper command: /sbin/drbdadm fence-peer shared_fs exit code 5 (0x500)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: fence-peer helper returned 5 (peer is unreachable, assumed to be dead)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: pdsk( DUnknown -> Outdated )
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: role( Secondary -> Primary )
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: new current UUID 098EF9936C4F4D27:5157BB476E60F5AA:6BC19D97CF96E5D2:6BC09D97CF96E5D2
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:   error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_promote_0: ok (node=MDA1PFP-PCS02, call=37, rc=0, cib-update=25, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=38, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Our peer on the DC (MDA1PFP-PCS01) is dead
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK origin=peer_update_callback ]
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]:  notice: crm_update_peer_proc: Node MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]:  notice: Removing all MDA1PFP-PCS01 attributes for attrd_peer_change_cb
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]:  notice: Lost attribute writer MDA1PFP-PCS01
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]:  notice: Removing MDA1PFP-PCS01/1 from the membership list
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]:  notice: Purged 1 peers with id=1 and/or uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]:  notice: crm_update_peer_proc: Node MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]:  notice: Removing MDA1PFP-PCS01/1 from the membership list
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]:  notice: Purged 1 peers with id=1 and/or uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]:  notice: crm_update_peer_proc: Node MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]:  notice: Removing MDA1PFP-PCS01/1 from the membership list
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]:  notice: Purged 1 peers with id=1 and/or uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: warning: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Notifications disabled
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:   error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]:  notice: On loss of CCM Quorum: Ignore
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]:  notice: Demote  drbd1:0	(Master -> Slave MDA1PFP-PCS02)
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]:  notice: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-1813.bz2
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Initiating action 55: notify drbd1_pre_notify_demote_0 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=39, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Initiating action 18: demote drbd1_demote_0 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: role( Primary -> Secondary )
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: bitmap WRITE of 0 pages took 0 jiffies
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Sep 20 12:08:04 MDA1PFP-S02 systemd-udevd: error: /dev/drbd1: Wrong medium type
Sep 20 12:08:04 MDA1PFP-S02 systemd-udevd: error: /dev/drbd1: Wrong medium type
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:   error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_demote_0: ok (node=MDA1PFP-PCS02, call=40, rc=0, cib-update=48, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Initiating action 56: notify drbd1_post_notify_demote_0 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok (node=MDA1PFP-PCS02, call=41, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Initiating action 20: monitor drbd1_monitor_60000 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:   error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Transition 0 (Complete=10, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1813.bz2): Complete
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]:  notice: crm_reap_unseen_nodes: Node MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:05 MDA1PFP-S02 pacemakerd[2335]:  notice: crm_reap_unseen_nodes: Node MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]: warning: No match for shutdown action on 1
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]:  notice: Stonith/shutdown of MDA1PFP-PCS01 not matched
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]:  notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 20 12:08:05 MDA1PFP-S02 corosync[2244]: [TOTEM ] A new membership (192.168.121.11:1452) was formed. Members left: 1
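
For reference, the sequence in the log (fence-peer helper invoked on connection loss, constraint 'drbd-fence-by-handler-shared_fs-drbd1_sync' placed, peer disk marked Outdated) matches the usual resource-level fencing setup from the DRBD user guide. A sketch of that configuration follows; my actual resource section may differ in detail:

```
resource shared_fs {
  disk {
    # Call the fence-peer handler when the peer becomes unreachable
    # while our local disk is still UpToDate
    fencing resource-only;
  }
  handlers {
    # Places a location constraint in the CIB that keeps the
    # (possibly outdated) peer from being promoted to master
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes that constraint again once the peer has resynced
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

With this setup the constraint is meant to block promotion on the disconnected peer only, so the immediate demote of the surviving node shown above is what looks wrong to me.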

Best wishes,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.auer@cgi.com
Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can be found at de.cgi.com/pflichtangaben.


