List: drbd-user
Subject: Re: [DRBD-user] drbd fencing stops promotion to master even when network connection is up
From: "Auer, Jens" <jens.auer () cgi ! com>
Date: 2016-09-20 12:12:02
Message-ID: E47848702ADBE04EA7ECF45E05F60B6D0117F6A133 () SE-EX019 ! groupinfra ! com
Hi,
I've updated all drbd packages to the latest versions:
MDA1PFP-S01 11:52:35 2551 0 ~ # yum list "*drbd*"
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
drbd.x86_64                  8.9.8-1.el7                  @/drbd-8.9.8-1.el7.x86_64
drbd-bash-completion.x86_64  8.9.8-1.el7                  @/drbd-bash-completion-8.9.8-1.el7.x86_64
drbd-heartbeat.x86_64        8.9.8-1.el7                  @/drbd-heartbeat-8.9.8-1.el7.x86_64
drbd-pacemaker.x86_64        8.9.8-1.el7                  @/drbd-pacemaker-8.9.8-1.el7.x86_64
drbd-udev.x86_64             8.9.8-1.el7                  @/drbd-udev-8.9.8-1.el7.x86_64
drbd-utils.x86_64            8.9.8-1.el7                  installed
drbd-xen.x86_64              8.9.8-1.el7                  @/drbd-xen-8.9.8-1.el7.x86_64
kmod-drbd.x86_64             9.0.4_3.10.0_327.28.3-1.el7  @/kmod-drbd-9.0.4_3.10.0_327.28.3-1.el7.x86_64
However, this did not fix the problem. The cluster starts fine, but when I stop the
node that holds the DRBD master role, the resource is not promoted on the other node.
Here is the test I am conducting:

1. start cluster
MDA1PFP-S01 12:07:00 2566 0 ~ # pcs status
Cluster name: MDA1PFP
Last updated: Tue Sep 20 12:07:24 2016 Last change: Tue Sep 20 12:06:49 2016 by root \
via cibadmin on MDA1PFP-PCS02
Stack: corosync
Current DC: MDA1PFP-PCS01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 6 resources configured
Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
Full list of resources:
mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS01
Clone Set: ping-clone [ping]
Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS01
Master/Slave Set: drbd1_sync [drbd1]
Masters: [ MDA1PFP-PCS01 ]
Slaves: [ MDA1PFP-PCS02 ]
PCSD Status:
MDA1PFP-PCS01: Online
MDA1PFP-PCS02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
MDA1PFP-S01 12:06:31 2565 0 ~ # drbd-overview
1:shared_fs/0 Connected Primary/Secondary UpToDate/UpToDate
2. stop active cluster node
MDA1PFP-S02 12:08:00 1295 0 ~ # pcs status
Cluster name: MDA1PFP
Last updated: Tue Sep 20 12:08:17 2016 Last change: Tue Sep 20 12:08:04 2016 by root \
via cibadmin on MDA1PFP-PCS02
Stack: corosync
Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 6 resources configured
Online: [ MDA1PFP-PCS02 ]
OFFLINE: [ MDA1PFP-PCS01 ]
Full list of resources:
mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
Clone Set: ping-clone [ping]
Started: [ MDA1PFP-PCS02 ]
Stopped: [ MDA1PFP-PCS01 ]
ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
Master/Slave Set: drbd1_sync [drbd1]
Slaves: [ MDA1PFP-PCS02 ]
Stopped: [ MDA1PFP-PCS01 ]
PCSD Status:
MDA1PFP-PCS01: Online
MDA1PFP-PCS02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
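
The before/after comparison of the two `pcs status` outputs can also be done mechanically. The following is only a minimal sketch, assuming the Pacemaker 1.1-style text layout pasted above; the function name `master_slave_roles` and the shortened `BEFORE`/`AFTER` samples are illustrative and not part of any pcs or Pacemaker API.

```python
import re

# Header of a master/slave set, e.g. "Master/Slave Set: drbd1_sync [drbd1]"
MS_SET_RE = re.compile(r"^\s*Master/Slave Set: (\S+) \[(\S+)\]")
# Role lines that follow the header, e.g. "Masters: [ MDA1PFP-PCS01 ]"
MS_ROLE_RE = re.compile(r"^\s*(Masters|Slaves|Stopped): \[ (.*?) \]")


def master_slave_roles(status_text):
    """Map each master/slave set to {role: [node, ...]}."""
    result = {}
    current = None
    for line in status_text.splitlines():
        m = MS_SET_RE.match(line)
        if m:
            current = result.setdefault(m.group(1), {})
            continue
        m = MS_ROLE_RE.match(line)
        if m and current is not None:
            current[m.group(1)] = m.group(2).split()
        else:
            # Any other line ends the current set; role lines that belong
            # to a plain clone set are ignored because current is None.
            current = None
    return result


# Shortened samples taken from the two status outputs above.
BEFORE = """\
Clone Set: ping-clone [ping]
Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
Master/Slave Set: drbd1_sync [drbd1]
Masters: [ MDA1PFP-PCS01 ]
Slaves: [ MDA1PFP-PCS02 ]
PCSD Status:
"""

AFTER = """\
Clone Set: ping-clone [ping]
Started: [ MDA1PFP-PCS02 ]
Stopped: [ MDA1PFP-PCS01 ]
Master/Slave Set: drbd1_sync [drbd1]
Slaves: [ MDA1PFP-PCS02 ]
Stopped: [ MDA1PFP-PCS01 ]
PCSD Status:
"""

if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        for name, roles in master_slave_roles(text).items():
            print(label, name, "masters:", roles.get("Masters", []))
```

A set that has no "Masters" entry after the failover test is the symptom described here: the surviving node stays in the slave role.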
In the log files I can see that the node actually gets promoted to master but is then
demoted again immediately. I cannot see the reason for the demotion:
Sep 20 12:08:00 MDA1PFP-S02 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" \
x-pid="3224" x-info="http://www.rsyslog.com"] start
Sep 20 12:08:00 MDA1PFP-S02 rsyslogd-2221: module 'imuxsock' already in this config, \
cannot be added [try http://www.rsyslog.com/e/2221 ]
Sep 20 12:08:00 MDA1PFP-S02 systemd: Stopping System Logging Service...
Sep 20 12:08:00 MDA1PFP-S02 systemd: Starting System Logging Service...
Sep 20 12:08:00 MDA1PFP-S02 systemd: Started System Logging Service.
Sep 20 12:08:03 MDA1PFP-S02 crmd[2354]: notice: Operation ACTIVE_start_0: ok \
(node=MDA1PFP-PCS02, call=29, rc=0, cib-update=21, confirmed=true)
Sep 20 12:08:03 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok \
(node=MDA1PFP-PCS02, call=28, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: peer( Primary -> Secondary )
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: Adding inet address \
192.168.120.20/32 with broadcast address 192.168.120.255 to device \
bond0
Sep 20 12:08:04 MDA1PFP-S02 avahi-daemon[1084]: Registering new address record for \
192.168.120.20 on bond0.IPv4.
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: Bringing device bond0 up
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: \
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p \
/var/run/resource-agents/send_arp-192.168.120.20 bond0 192.168.120.20 auto not_used \
not_used
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation mda-ip_start_0: ok \
(node=MDA1PFP-PCS02, call=31, rc=0, cib-update=23, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok \
(node=MDA1PFP-PCS02, call=32, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok \
(node=MDA1PFP-PCS02, call=34, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: peer( Secondary -> Unknown ) \
conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: ack_receiver terminated
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Terminating drbd_a_shared_f
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Connection closed
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: conn( TearDown -> Unconnected )
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: receiver terminated
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Restarting receiver thread
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: receiver (re)started
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: conn( Unconnected -> WFConnection )
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok \
(node=MDA1PFP-PCS02, call=35, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok \
(node=MDA1PFP-PCS02, call=36, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: helper command: /sbin/drbdadm \
fence-peer shared_fs
Sep 20 12:08:04 MDA1PFP-S02 crm-fence-peer.sh[3779]: invoked for shared_fs
Sep 20 12:08:04 MDA1PFP-S02 crm-fence-peer.sh[3779]: INFO peer is not reachable, my \
disk is UpToDate: placed constraint \
'drbd-fence-by-handler-shared_fs-drbd1_sync'
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: helper command: /sbin/drbdadm \
fence-peer shared_fs exit code 5 (0x500)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: fence-peer helper returned 5 \
(peer is unreachable, assumed to be dead)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: pdsk( DUnknown -> Outdated )
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: role( Secondary -> Primary )
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: new current UUID \
098EF9936C4F4D27:5157BB476E60F5AA:6BC19D97CF96E5D2:6BC09D97CF96E5D2
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: error: pcmkRegisterNode: Triggered assert \
at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_promote_0: ok \
(node=MDA1PFP-PCS02, call=37, rc=0, cib-update=25, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok \
(node=MDA1PFP-PCS02, call=38, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Our peer on the DC (MDA1PFP-PCS01) \
is dead
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: State transition S_NOT_DC -> \
S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK \
origin=peer_update_callback ]
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: State transition S_ELECTION -> \
S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED \
origin=election_timeout_popped ]
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]: notice: crm_update_peer_proc: Node \
MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]: notice: Removing all MDA1PFP-PCS01 \
attributes for attrd_peer_change_cb
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]: notice: Lost attribute writer MDA1PFP-PCS01
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]: notice: Removing MDA1PFP-PCS01/1 from the \
membership list
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]: notice: Purged 1 peers with id=1 and/or \
uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]: notice: crm_update_peer_proc: Node \
MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]: notice: Removing MDA1PFP-PCS01/1 from \
the membership list
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]: notice: Purged 1 peers with id=1 \
and/or uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]: notice: crm_update_peer_proc: Node \
MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]: notice: Removing MDA1PFP-PCS01/1 from the \
membership list
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]: notice: Purged 1 peers with id=1 and/or \
uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: warning: FSA: Input I_ELECTION_DC from \
do_election_check() received in state S_INTEGRATION
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Notifications disabled
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: error: pcmkRegisterNode: Triggered assert \
at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]: notice: On loss of CCM Quorum: Ignore
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]: notice: Demote drbd1:0 (Master -> Slave \
MDA1PFP-PCS02)
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]: notice: Calculated Transition 0: \
/var/lib/pacemaker/pengine/pe-input-1813.bz2
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Initiating action 55: notify \
drbd1_pre_notify_demote_0 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok \
(node=MDA1PFP-PCS02, call=39, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Initiating action 18: demote \
drbd1_demote_0 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: role( Primary -> Secondary )
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: bitmap WRITE of 0 pages took 0 \
jiffies
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: 0 KB (0 bits) marked out-of-sync by \
on disk bit-map.
Sep 20 12:08:04 MDA1PFP-S02 systemd-udevd: error: /dev/drbd1: Wrong medium type
Sep 20 12:08:04 MDA1PFP-S02 systemd-udevd: error: /dev/drbd1: Wrong medium type
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: error: pcmkRegisterNode: Triggered assert \
at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_demote_0: ok \
(node=MDA1PFP-PCS02, call=40, rc=0, cib-update=48, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Initiating action 56: notify \
drbd1_post_notify_demote_0 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_notify_0: ok \
(node=MDA1PFP-PCS02, call=41, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Initiating action 20: monitor \
drbd1_monitor_60000 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: error: pcmkRegisterNode: Triggered assert \
at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Transition 0 (Complete=10, \
Pending=0, Fired=0, Skipped=0, Incomplete=0, \
Source=/var/lib/pacemaker/pengine/pe-input-1813.bz2): Complete
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: State transition S_TRANSITION_ENGINE \
-> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL \
origin=notify_crmd ]
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]: notice: crm_reap_unseen_nodes: Node \
MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:05 MDA1PFP-S02 pacemakerd[2335]: notice: crm_reap_unseen_nodes: Node \
MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]: warning: No match for shutdown action on 1
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]: notice: Stonith/shutdown of MDA1PFP-PCS01 \
not matched
Sep 20 12:08:05 MDA1PFP-S02 crmd[2354]: notice: State transition S_IDLE -> \
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 20 12:08:05 MDA1PFP-S02 corosync[2244]: [TOTEM ] A new membership \
(192.168.121.11:1452) was formed. Members left: 1
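
To make the promote/demote sequence easier to follow, the relevant events can be filtered out of the log. This is only a sketch under stated assumptions: it expects the syslog format shown above, including the trailing-backslash line wraps added by the list archive, and the names `unwrap`, `timeline` and `SAMPLE` are made up for illustration. It only lists events in order; it does not determine why the demotion happens.

```python
import re

# Kernel role changes, e.g. "kernel: block drbd1: role( Secondary -> Primary )"
KERNEL_ROLE_RE = re.compile(r"kernel: block (\S+): role\( (\w+) -> (\w+) \)")
# Completed promote/demote operations reported by crmd
CRMD_OP_RE = re.compile(
    r"crmd\[\d+\]:\s+notice:\s+Operation (\S+)_(promote|demote)_0: (\w+)")
# Promote/demote decisions made by the policy engine
PENGINE_RE = re.compile(
    r"pengine\[\d+\]:\s+notice:\s+(Promote|Demote)\s+(\S+)\s+\((\w+) -> (\w+)")
# Constraint placed by the crm-fence-peer.sh handler
FENCE_RE = re.compile(
    r"crm-fence-peer\.sh\[\d+\]: .*placed constraint\s+'([^']+)'")


def unwrap(text):
    """Rejoin lines that the archive wrapped with a trailing backslash."""
    return re.sub(r"[ \t]*\\\n[ \t]*", " ", text)


def timeline(text):
    """Return (kind, description) tuples in log order."""
    events = []
    for line in unwrap(text).splitlines():
        m = KERNEL_ROLE_RE.search(line)
        if m:
            events.append(("kernel-role", f"{m.group(1)}: {m.group(2)} -> {m.group(3)}"))
            continue
        m = CRMD_OP_RE.search(line)
        if m:
            events.append(("crmd-op", f"{m.group(1)} {m.group(2)}: {m.group(3)}"))
            continue
        m = PENGINE_RE.search(line)
        if m:
            events.append(("pengine", f"{m.group(1)} {m.group(2)} ({m.group(3)} -> {m.group(4)})"))
            continue
        m = FENCE_RE.search(line)
        if m:
            events.append(("fence-constraint", m.group(1)))
    return events


# A few lines copied from the log above, wraps included.
SAMPLE = r"""Sep 20 12:08:04 MDA1PFP-S02 crm-fence-peer.sh[3779]: INFO peer is not reachable, my \
disk is UpToDate: placed constraint \
'drbd-fence-by-handler-shared_fs-drbd1_sync'
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: role( Secondary -> Primary )
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: notice: Operation drbd1_promote_0: ok \
(node=MDA1PFP-PCS02, call=37, rc=0, cib-update=25, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]: notice: Demote drbd1:0 (Master -> Slave \
MDA1PFP-PCS02)
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: role( Primary -> Secondary )
"""

if __name__ == "__main__":
    for kind, description in timeline(SAMPLE):
        print(f"{kind:17s} {description}")
```

Run against the full log, this reduces the excerpt to the fence-peer constraint, the kernel role changes, and the crmd and pengine promote/demote messages, in the order they were logged.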
Best wishes,
Jens
--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.auer@cgi.com
Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can be found at \
de.cgi.com/pflichtangaben.