List: linux-ha-dev
Subject: Re: [Linux-ha-dev] In xen0, the way to check whether STONITH
From: Serge Dubrouski <sergeyfd () gmail ! com>
Date: 2009-04-14 22:16:37
Message-ID: 868cbbaa0904141516p29fb28d6q854d690a4842df82 () mail ! gmail ! com
Attached is a patch that checks that DomU disappears from the "xm
list" on Dom0 after running destroy.
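As a rough illustration of the check described above (not the code in the attached patch), the sketch below tests whether a domain appears in captured "xm list" output. The helper name domain_listed and the sample output are made up for this example. It compares the first column exactly, which is an assumption about how one might want to match; a plain prefix match would also report node1 as present when only node10 is listed.

```shell
# Hypothetical helper, not part of xen0: succeed only if a domain with
# exactly this name appears in the first column of "xm list" output.
#   $1: domain name
#   $2: captured output of "xm list"
domain_listed() {
    # NR > 1 skips the header line; exit status 0 means "found".
    printf '%s\n' "$2" |
        awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Sample output, made up for illustration.
sample='Name                ID   Mem VCPUs      State   Time(s)
Domain-0             0  1024     2     r-----    512.3
node10               4   512     1     -b----     40.1'

domain_listed node10 "$sample" && echo "node10 is listed"
domain_listed node1 "$sample" || echo "node1 is not listed"
```

On a real Dom0 the second argument would come from something like the output of the ssh call that runs "xm list"; that wiring is left out here.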
On Mon, Apr 13, 2009 at 10:03 PM, Serge Dubrouski <sergeyfd@gmail.com> wrote:
> Hello -
>
> This makes sense and I'll think about how to implement that. Thanks for the
> suggestion.
>
> 2009/4/13 Yoshihiko SATO <satoyoshi@intellilink.co.jp>:
>> Hi Serge,
>>
>> I am considering the case where two or more plugins are configured in
>> cib.xml, for example xen0 (the STONITH plugin for DomU) together with
>> ibmrsa-telnet (the one for Dom0).
>> The purpose of this setup is to STONITH Dom0 when xen0 fails to STONITH DomU.
>> I found the following problem with xen0's fence (off|reset) action.
>>
>> xen0 doesn't check the return code of xm destroy.
>> Instead, it checks whether the target DomU is dead or alive with the ping
>> command in CheckIfDead(), right?
>> However, ping receives no reply packets not only when DomU has been
>> STONITH'ed normally, but also when a kernel panic or kernel hang occurs
>> on Dom0.
>> When such a failure occurs on Dom0, xen0 mistakenly judges that the fence
>> action succeeded.
>> As a result, the STONITH plugin that is able to STONITH Dom0 (such as
>> ibmrsa-telnet) is never executed.
>> So I think xen0 should confirm whether xm destroy via ssh succeeded,
>> and check whether the target is dead with ping only when that command
>> succeeded.
>> If xm destroy fails, xen0 should report that the fence action failed.
>> What do you think about this?
>> I would like to hear your opinions.
>>
>> Best regards,
>> Yoshihiko SATO
>> _______________________________________________________
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>>
>
>
>
> --
> Serge Dubrouski.
>
--
Serge Dubrouski.
["xen0.patch" (application/octet-stream)]
--- xen0 2009-03-04 10:08:39.000000000 -0700
+++ xen0.new 2009-04-14 16:12:36.000000000 -0600
@@ -16,6 +16,7 @@
STOP_COMMAND="xm destroy"
START_COMMAND="xm create"
DUMP_COMMAND="xm dump-core"
+CHECK_COMMAND="xm list | grep"
DEFAULT_XEN_DIR="/etc/xen"
SSH_COMMAND="/usr/bin/ssh -q -x -n"
@@ -82,20 +83,26 @@
case $2 in
stop)
- kill_node=`$SSH_COMMAND $dom0 "grep ^[[:space:]]*name $cfg" | cut -f 2 -d '=' | sed -e 's,",,g'`
- if [ "x" = "x$kill_node" ]
- then
- echo "Couldn't find a node name to stop"
- exit 1
- fi
-
- if [ "x$run_dump" != "x" ]
- then
- #Need to run core dump
- $SSH_COMMAND $dom0 "$DUMP_COMMAND $kill_node >/dev/null 2>&1"
- fi
-
- $SSH_COMMAND $dom0 "(sleep 2; $STOP_COMMAND $kill_node) >/dev/null 2>&1 &"
+ kill_node=`$SSH_COMMAND $dom0 "grep ^[[:space:]]*name $cfg" | cut -f 2 -d '=' | sed -e 's,",,g'`
+ if [ "x" = "x$kill_node" ]
+ then
+ echo "Couldn't find a node name to stop"
+ exit 1
+ fi
+
+ if [ "x$run_dump" != "x" ]
+ then
+ #Need to run core dump
+ $SSH_COMMAND $dom0 "$DUMP_COMMAND $kill_node >/dev/null 2>&1"
+ fi
+
+ $SSH_COMMAND $dom0 "(sleep 2; $STOP_COMMAND $kill_node) >/dev/null 2>&1 &"
+ if $SSH_COMMAND $dom0 "(sleep 2; ${CHECK_COMMAND} ^$kill_node) >/dev/null 2>&1"
+ then
+ #Dom0 wasn't able to destroy DomU
+ echo "xm destroy failed. $kill_node is still active"
+ exit 1
+ fi
break;;
start)
$SSH_COMMAND $dom0 "(sleep 2; $START_COMMAND $cfg) >/dev/null 2>&1 &"
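
For discussion only, here is a minimal sketch of the decision proposed earlier in the thread: report the fence as failed unless xm destroy succeeded and Dom0 could confirm that DomU is gone. The function fence_outcome is hypothetical and not part of xen0. It assumes the destroy command is run in the foreground so that its exit status is available, which differs from the patch above, and it relies on ssh returning 255 when ssh itself fails and on grep returning 1 when nothing matches.

```shell
# Hypothetical helper, not part of xen0: decide the fence result from
# two exit statuses collected by the caller.
#   $1: exit status of the ssh call that ran "xm destroy" on Dom0
#   $2: exit status of the ssh call that ran "xm list | grep" on Dom0
#       (0 = DomU still listed, 1 = not listed, anything else = the
#        query itself failed, e.g. 255 when ssh could not reach Dom0)
fence_outcome() {
    destroy_rc=$1
    check_rc=$2
    if [ "$destroy_rc" -ne 0 ]; then
        echo "failed: xm destroy returned $destroy_rc"
        return 1
    fi
    case $check_rc in
        1) echo "ok: DomU is no longer listed"; return 0 ;;
        0) echo "failed: DomU is still listed"; return 1 ;;
        *) echo "failed: could not query Dom0 (status $check_rc)"; return 1 ;;
    esac
}

fence_outcome 0 1
fence_outcome 0 255 || echo "fence reported as failed"
```

The point of the third case is that an unreachable or hung Dom0 is never mistaken for a successful stop, so a second plugin configured for Dom0 would still get its chance.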
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/