
List:       linux-ha-dev
Subject:    Re: [Linux-ha-dev] In xen0, the way to check whether STONITH
From:       Serge Dubrouski <sergeyfd () gmail ! com>
Date:       2009-04-14 22:16:37
Message-ID: 868cbbaa0904141516p29fb28d6q854d690a4842df82 () mail ! gmail ! com

Attached is a patch that checks that DomU disappears from the "xm
list" on Dom0 after running destroy.
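
For illustration, here is a minimal sketch of the kind of check involved, assuming the usual "xm list" layout (one header line, then the domain name in the first column). The helper name domu_listed is hypothetical and is not part of the xen0 plugin or the attached patch. Unlike a plain "grep ^name", it compares the whole first column, so a domain called node1 would not be confused with node10.

```shell
#!/bin/sh
# Hypothetical helper (not in the xen0 plugin): reads "xm list" output on
# stdin and succeeds only if a domain named exactly "$1" is listed.
domu_listed() {
    # NR > 1 skips the header line; $1 is the Name column.
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Possible use inside the plugin (assumes $SSH_COMMAND, $dom0 and
# $kill_node are set as in the xen0 script):
#   if $SSH_COMMAND $dom0 "xm list" | domu_listed "$kill_node"; then
#       echo "xm destroy failed. $kill_node is still active"
#       exit 1
#   fi
```

The function only parses text, so it can be tried against saved "xm list" output without touching a Dom0.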

On Mon, Apr 13, 2009 at 10:03 PM, Serge Dubrouski <sergeyfd@gmail.com> wrote:
> Hello -
>
> This makes sense and I'll think about how to implement that. Thanks for the
> suggestion.
>
> 2009/4/13 Yoshihiko SATO <satoyoshi@intellilink.co.jp>:
>> Hi Serge,
>>
>> I am considering the case where two or more plugins are set in cib.xml,
>> for example xen0 (the STONITH plugin for DomU) and ibmrsa-telnet (the one
>> for Dom0) or something similar.
>> The purpose of this setup is to STONITH Dom0 when xen0 fails to STONITH DomU.
>> I found the following problem with xen0's fence (off|reset) action.
>>
>> xen0 doesn't check the return code of xm destroy.
>> Instead, it checks whether the target DomU is dead or alive with the ping
>> command in CheckIfDead(), right?
>> However, ping receives no reply packets at all not only when DomU has been
>> STONITH'ed normally but also when a kernel panic or kernel hang occurs on
>> Dom0.
>> So when the failure is on Dom0, xen0 mistakenly judges that "the fence
>> action succeeded".
>> Then the STONITH plugin that is able to STONITH Dom0 (like ibmrsa-telnet
>> etc.) is not executed.
>> So I think xen0 should confirm whether xm destroy via ssh succeeded or not,
>> and check whether the target is dead with ping only when the command
>> succeeded.
>> If xm destroy fails, xen0 should return "fence action failed", I think.
>> What do you think about this?
>> I would welcome any opinions.
>>
>> Best regards,
>> Yoshihiko SATO
>> _______________________________________________________
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>>
>
>
>
> --
> Serge Dubrouski.
>
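
The order of checks suggested in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions: fence_result is a hypothetical function that does not exist in the xen0 plugin, and it takes the two exit statuses as arguments so the decision logic can be read on its own. It relies on the fact that ssh returns the exit status of the remote command, or 255 if ssh itself failed, so an unreachable or hung Dom0 shows up as a non-zero status rather than as a "dead" DomU.

```shell
#!/bin/sh
# Hypothetical sketch, not the plugin's code.
#   $1 = exit status of "xm destroy" run on Dom0 via ssh
#        (255 if ssh itself could not run the command)
#   $2 = exit status of a ping to the DomU (0 means it still answers)
fence_result() {
    destroy_rc=$1
    ping_rc=$2
    if [ "$destroy_rc" -ne 0 ]; then
        # Dom0 did not confirm the destroy; do not trust a silent ping.
        echo "fence failed: xm destroy returned $destroy_rc"
        return 1
    fi
    if [ "$ping_rc" -eq 0 ]; then
        echo "fence failed: target still answers ping"
        return 1
    fi
    echo "fence succeeded"
    return 0
}

# Possible wiring (assumes the xen0 variables and a synchronous destroy):
#   $SSH_COMMAND $dom0 "$STOP_COMMAND $kill_node >/dev/null 2>&1"
#   destroy_rc=$?
```

Returning failure in the first branch is what would let a second plugin, such as one that can fence Dom0, get its turn.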



-- 
Serge Dubrouski.

["xen0.patch" (application/octet-stream)]

--- xen0        2009-03-04 10:08:39.000000000 -0700
+++ xen0.new    2009-04-14 16:12:36.000000000 -0600
@@ -16,6 +16,7 @@
 STOP_COMMAND="xm destroy"
 START_COMMAND="xm create"
 DUMP_COMMAND="xm dump-core"
+CHECK_COMMAND="xm list | grep"
 DEFAULT_XEN_DIR="/etc/xen"
 SSH_COMMAND="/usr/bin/ssh -q -x -n"

@@ -82,20 +83,26 @@

         case $2 in
             stop)
-                 kill_node=`$SSH_COMMAND $dom0 "grep ^[[:space:]]*name $cfg" | cut -f 2 -d '=' |  sed -e 's,",,g'`
-                 if [ "x" = "x$kill_node" ]
-                 then
-                     echo "Couldn't find a node name to stop"
-                     exit 1
-                 fi
-
-                 if [ "x$run_dump" != "x" ]
-                 then
-                     #Need to run core dump
-                     $SSH_COMMAND $dom0 "$DUMP_COMMAND $kill_node >/dev/null 2>&1"
-                 fi
-
-                 $SSH_COMMAND $dom0 "(sleep 2; $STOP_COMMAND $kill_node) >/dev/null 2>&1 &"
+                kill_node=`$SSH_COMMAND $dom0 "grep ^[[:space:]]*name $cfg" | cut -f 2 -d '=' |  sed -e 's,",,g'`
+                if [ "x" = "x$kill_node" ]
+                then
+                    echo "Couldn't find a node name to stop"
+                    exit 1
+                fi
+
+                if [ "x$run_dump" != "x" ]
+                then
+                    #Need to run core dump
+                    $SSH_COMMAND $dom0 "$DUMP_COMMAND $kill_node >/dev/null 2>&1"
+                fi
+
+                $SSH_COMMAND $dom0 "(sleep 2; $STOP_COMMAND $kill_node) >/dev/null 2>&1 &"
+                if $SSH_COMMAND $dom0 "(sleep 2; ${CHECK_COMMAND} ^$kill_node) >/dev/null 2>&1"
+                then
+                   #Dom0 wasn't able to destroy DomU
+                   echo "xm destroy failed. $kill_node is still active"
+                   exit 1
+                fi
                 break;;
             start)
                 $SSH_COMMAND $dom0 "(sleep 2; $START_COMMAND $cfg) >/dev/null 2>&1 &"




