
List:       drbd-user
Subject:    [DRBD-user] Is it normal that we can't directly remove the secondary node when fencing is set?
From:       <mzlld1988 () 163 ! com>
Date:       2016-09-10 6:46:01
Message-ID: 7976b4af.38c6.15712d88101.Coremail.mzlld1988 () 163 ! com


Hi everyone,
I have a question about removing a secondary node in DRBD 9.
When fencing is set, is it normal that we cannot take down a secondary node in DRBD 9, while the same operation succeeds in DRBD 8.4.6?

The DRBD kernel source is the newest version (9.0.4-1). The DRBD utils version is 8.9.6.

Description:
    3 nodes; one of the nodes is primary, disk state is UpToDate. Fencing is set.
    I get the error message 'State change failed: (-7) State change was refused by peer node' when executing the command 'drbdadm down <res-name>' on any of the secondary nodes.

Analysis:
    When the down command is executed on one of the secondary nodes, that node runs the function 'change_cluster_wide_state' in drbd_state.c:

    change_cluster_wide_state()
    {
        ...
        if (have_peers) {
                /* (1) Wait for the peer node to reply; the thread sleeps
                 *     until the peer node replies. */
                if (wait_event_timeout(resource->state_wait,
                               cluster_wide_reply_ready(resource),
                               twopc_timeout(resource))) {
                    /* (2) Get the reply info. */
                    rv = get_cluster_wide_reply(resource);
                } else {
                }
        }
        ...
    }

    Process (1)
        The primary node executes the following call chain:
            ... -> try_state_change -> is_valid_soft_transition -> __is_valid_soft_transition

        In the end, __is_valid_soft_transition returns the error code SS_PRIMARY_NOP:

            if (peer_device->connection->fencing_policy >= FP_RESOURCE &&
                !(role[OLD] == R_PRIMARY && repl_state[OLD] < L_ESTABLISHED &&
                  !(peer_disk_state[OLD] <= D_OUTDATED)) &&
                 (role[NEW] == R_PRIMARY && repl_state[NEW] < L_ESTABLISHED &&
                  !(peer_disk_state[NEW] <= D_OUTDATED)))
                    return SS_PRIMARY_NOP;

        The primary node sets the drbd_packet to P_TWOPC_NO; the secondary node receives this reply and sets the connection status to TWOPC_NO. At this point, process (1) finishes.
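The condition quoted above can be tried out in user space. The following is only a minimal sketch, not DRBD code: the enums are reduced stand-ins that keep the relative order used by the comparisons, and struct peer_view and both function names are made up for illustration.

```c
#include <stdbool.h>

/* Reduced stand-ins for the DRBD enums (assumption: only the relative
 * order of the values matters for the comparisons below). */
enum role { R_SECONDARY, R_PRIMARY };
enum repl_state { L_OFF, L_ESTABLISHED };
enum disk_state { D_DISKLESS, D_INCONSISTENT, D_OUTDATED, D_UNKNOWN, D_UP_TO_DATE };
enum fencing_policy { FP_DONT_CARE, FP_RESOURCE, FP_STONITH };

/* Hypothetical snapshot of the three fields the quoted check reads. */
struct peer_view {
    enum role role;                  /* our own role */
    enum repl_state repl_state;      /* replication state towards the peer */
    enum disk_state peer_disk_state; /* what we believe about the peer's disk */
};

/* True when we are Primary, replication is not established, and the
 * peer's disk is not known to be Outdated or worse. */
static bool primary_without_outdated_peer(const struct peer_view *s)
{
    return s->role == R_PRIMARY &&
           s->repl_state < L_ESTABLISHED &&
           !(s->peer_disk_state <= D_OUTDATED);
}

/* Same shape as the quoted check: with fencing enabled, refuse only a
 * transition that newly enters the situation above. */
static bool refused_with_primary_nop(enum fencing_policy fp,
                                     const struct peer_view *old_s,
                                     const struct peer_view *new_s)
{
    return fp >= FP_RESOURCE &&
           !primary_without_outdated_peer(old_s) &&
           primary_without_outdated_peer(new_s);
}
```

Assuming the down request would leave the primary with an unestablished connection and a peer disk of D_UNKNOWN, the sketch refuses the change under FP_RESOURCE; with the peer disk set to D_OUTDATED instead, or with FP_DONT_CARE, it does not.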


    Process (2)
        rv will be set to SS_CW_FAILED_BY_PEER.
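To make steps (1) and (2) concrete, here is a rough user-space model of how the per-connection replies could be combined into rv. It is a sketch under assumptions, not the kernel implementation: TWOPC_NO and SS_CW_FAILED_BY_PEER are the names used in this mail, while TWOPC_PENDING, TWOPC_YES, SKETCH_SUCCESS, SKETCH_TIMEOUT and both function names are invented for the example, and the numeric values are not DRBD's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reply recorded for each peer connection (illustrative values). */
enum twopc_reply { TWOPC_PENDING, TWOPC_YES, TWOPC_NO };

/* Result of the cluster-wide request (illustrative values). */
enum state_rv { SKETCH_SUCCESS, SKETCH_TIMEOUT, SS_CW_FAILED_BY_PEER };

/* Step (1): the wait can end once no connection is still pending. */
static bool all_replies_in(const enum twopc_reply *replies, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (replies[i] == TWOPC_PENDING)
            return false;
    return true;
}

/* Step (2): a single "no" from any peer fails the whole state change. */
static enum state_rv collect_reply(const enum twopc_reply *replies, size_t n)
{
    if (!all_replies_in(replies, n))
        return SKETCH_TIMEOUT; /* stand-in for the empty else branch */

    for (size_t i = 0; i < n; i++)
        if (replies[i] == TWOPC_NO)
            return SS_CW_FAILED_BY_PEER;

    return SKETCH_SUCCESS;
}
```

In the 3-node setup described above, the secondary that runs 'drbdadm down' has two peers. If the other secondary agrees and the primary answers no, the model yields SS_CW_FAILED_BY_PEER, which matches the error that was observed.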
        
    ==== Version 8.4.6 ====
        One node is primary, the other one is secondary.
        When 'drbdadm down <res-name>' is executed on the secondary node, the same error message is recorded in the log file on the first attempt, which tries to change the peer disk to D_UNKNOWN.
        But the command then succeeds on the second attempt, which changes the peer disk to D_OUTDATED.

        The following code reports the error:
        is_valid_state()
        {
            ...
            if (fp >= FP_RESOURCE &&
                ns.role == R_PRIMARY && ns.conn < C_CONNECTED && ns.pdsk >= D_UNKNOWN) {
                    rv = SS_PRIMARY_NOP;
            }
            ...
        }
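The two attempts seen on 8.4.6 can be modelled the same way. This is a hedged sketch rather than the 8.4.6 source: the enums are reduced stand-ins that preserve the ordering the check relies on, and struct state, primary_nop() and attempts_until_accepted() are names chosen only for this example.

```c
#include <stdbool.h>

/* Reduced stand-ins for the 8.4 enums (assumption: relative order only). */
enum role { R_SECONDARY, R_PRIMARY };
enum conn_state { C_STANDALONE, C_DISCONNECTING, C_CONNECTED };
enum disk_state { D_DISKLESS, D_INCONSISTENT, D_OUTDATED, D_UNKNOWN, D_UP_TO_DATE };
enum fencing_policy { FP_DONT_CARE, FP_RESOURCE, FP_STONITH };

/* Hypothetical subset of the new state (ns) as seen by the primary. */
struct state {
    enum role role;
    enum conn_state conn;
    enum disk_state pdsk; /* peer disk state */
};

/* Same shape as the quoted is_valid_state() check. */
static bool primary_nop(enum fencing_policy fp, struct state ns)
{
    return fp >= FP_RESOURCE &&
           ns.role == R_PRIMARY && ns.conn < C_CONNECTED && ns.pdsk >= D_UNKNOWN;
}

/* Try the disconnect with peer disk D_UNKNOWN first, then with D_OUTDATED.
 * Returns the attempt that was accepted (1 or 2), or 0 if both were refused. */
static int attempts_until_accepted(enum fencing_policy fp)
{
    struct state ns = { R_PRIMARY, C_DISCONNECTING, D_UNKNOWN };

    if (!primary_nop(fp, ns))
        return 1;

    ns.pdsk = D_OUTDATED; /* second attempt: peer marks itself Outdated */
    if (!primary_nop(fp, ns))
        return 2;

    return 0;
}
```

With FP_RESOURCE the first attempt is refused and the second is accepted, which is consistent with the single error line in the log followed by a successful down.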
       
        After executing the command 'drbdadm down <res-name>' on the secondary node, the status of the primary node is:

        [root@drbd846 drbd-8.4.6]# cat /proc/drbd
        version: 8.4.6 (api:1/proto:86-101)
        GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by root@drbd846.node1, 2016-09-08 08:51:45
         0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/Outdated   r-----
            ns:1048508 nr:0 dw:0 dr:1049236 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

        The peer disk state is Outdated, not DUnknown.




_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

