[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-ha-dev
Subject:    [Linux-ha-dev] New STONITH plug-in to be usable by Xen and KVM
From:       Yasumasa OZAKI <ozakiyas () intellilink ! co ! jp>
Date:       2010-09-30 10:38:43
Message-ID: 4CA468B3.2030004 () intellilink ! co ! jp
[Download RAW message or body]

Hi Dejan,

Thank you for reply.

I correct it based on your opinion because I think that your opinion is 
correct. wait a little, please.

- The stop of the domain is judged from the virsh returns, and 
CheckIfDead function is removed.

- RunCommand function does, and shortens the refactoring.

- Other points are corrected based on your opinion.

Thanks,
Yasumasa OZAKI

On Thu, 23 Sep 2010 17:08:12 +0200, Dejan Muhamedagic wrote:
> Hi Yasumasa-san,
> 
> On Fri, Sep 10, 2010 at 03:17:56PM +0900, Yasumasa OZAKI wrote:
> > Hi,
> > 
> > I made the STONITH plug-in that can be used by both Xen and KVM.
> 
> Thanks! Comments below.
> 
> > I would like to hear any opinion.
> > 
> > Best regards,
> > Yasumasa OZAKI
> > 
> > 
> > 
> 
> > #!/bin/sh
> > #
> > # External STONITH module for Xen/KVM hypervisor through ssh.
> > # Uses Xen/KVM hypervisor as a STONITH device to control guest.
> > #
> > # Copyright (c) 2010 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> > #
> > 
> > STOP_COMMAND="virsh destroy"
> > START_COMMAND="virsh start"
> > DUMP_COMMAND="virsh dump"
> > SSH_COMMAND="/usr/bin/ssh -q -x -n"
> > 
> > # Rewrite the hostlist to accept "," as a delimeter for hostnames too.
> > hostlist=`echo $hostlist | tr ',' ' '`
> > 
> > CheckIfDead() {
> > for j in 1 2 3 4 5
> > do
> > if ! ping -w1 -c1 "$1" >/dev/null 2>&1
> > then
> > return 0
> > fi
> > sleep 1
> > done
> > 
> > return 1
> > }
> 
> I'd rather have this one removed. If virsh returns a meaningful
> exit code, then we can rely on that, right? Or use domstate?
> 
> > CheckHostList() {
> > if [ "x" = "x$hostlist" ]
> > then
> > ha_log.sh err "hostlist isn't set"
> > exit 1
> > fi
> > }
> > 
> > CheckHypervisor() {
> > if [ "x" = "x$hypervisor" ]
> > then
> > ha_log.sh err "hypervisor isn't set"
> > exit 1
> > fi
> > }
> > 
> > RunCommand() {
> > CheckHostList
> > CheckHypervisor
> > 
> > for node in $hostlist
> > do
> > if [ "$node" != "$1" ]
> > then
> > continue
> > fi
> 
> You can abbreviate code like this into a more compact
> 
> 	[ "$node" != "$1" ] && continue
> 
> > case $2 in
> > stop)
> > if [ "x$run_dump" != "x" ]
> > then
> > #Need to run core dump
> > if [ "x$dump_dir" != "x" ]
> > then
> > TIMESTAMP=`date +%Y-%m%d-%H%M.%S`
> > DOMAINNAME=`printf "%s" $node`
> > COREFILE=$dump_dir/$TIMESTAMP-$DOMAINNAME.core
> > #Run core dump
> > command_result="$($SSH_COMMAND $hypervisor " \
> > pgrep -f ^virsh_$node >/dev/null 2>&1; \
> > if [ \$? = 0 ]; then echo RUNNING; exit; fi; \
> > mkdir -p $dump_dir; \
> > if [ \$? != 0 ]; then echo MKDIR_FAILED; exit; fi; \
> > (exec -a virsh_$node $DUMP_COMMAND $node $COREFILE >/dev/null 2>&1); \
> > if [ \$? != 0 ]; then echo DUMP_FAILED; exit; fi; \
> > echo OK")"
> > ssh_result=$?
> > if [ $ssh_result = 0 ]
> > then
> > case "$command_result" in
> > RUNNING)
> > ha_log.sh info "Dump is already running"
> > exit 0
> > ;;
> > SUSPEND_FAILED)
> > ha_log.sh err "Failed to suspend domain $node"
> > ;;
> > MKDIR_FAILED)
> > ha_log.sh err "Failed to create directory $dump_dir"
> > ;;
> > DUMP_FAILED)
> > ha_log.sh err "Failed to core dump domain $node to $COREFILE"
> > ;;
> > OK)
> > ha_log.sh notice "Domain $node dumped to $COREFILE"
> > ;;
> > esac
> > else
> > ha_log.sh err "Couldn't connect to hypervisor $hypervisor"
> > fi
> > else
> > ha_log.sh err "dump_dir isn't set"
> > fi
> > fi
> > 
> > command_result=$($SSH_COMMAND $hypervisor "((sleep 2; $STOP_COMMAND $node) \
> > >/dev/null 2>&1 &); echo \$?") ssh_result=$?
> > if [ $ssh_result = 0 ]
> > then
> > if [ $command_result = 0 ]
> > then
> > ha_log.sh notice "Domain $node is stoped"
> > else
> > ha_log.sh err "Failed to stop domain $node"
> > fi
> > else
> > ha_log.sh err "Couldn't connect to hypervisor $hypervisor"
> > fi
> > break;;
> > start)
> > command_result=$($SSH_COMMAND $hypervisor "((sleep 2; $START_COMMAND $node) \
> > >/dev/null 2>&1 &); echo \$?") ssh_result=$?
> > if [ $ssh_result = 0 ]
> > then
> > if [ $command_result = 0 ]
> > then
> > ha_log.sh notice "Domain $node is started"
> > else
> > ha_log.sh err "Failed to start domain $node"
> > fi
> > else
> > ha_log.sh err "Couldn't connect to hypervisor $hypervisor"
> > fi
> > break;;
> > esac
> > exit 0
> > done
> > }
> 
> This function is too long. Can you try to refactor and split it
> into several. Also, the stop and start good candidates to be
> folded into one function.
> 
> > # Main code
> > 
> > case $1 in
> > gethosts)
> > CheckHostList
> > 
> > for node in $hostlist ; do
> > echo $node
> > done
> > exit 0
> > ;;
> > on)
> > RunCommand $2 start
> > exit $?
> > ;;
> > off)
> > if RunCommand $2 stop
> > then
> > if CheckIfDead $2
> > then 
> > exit 0
> > fi
> > fi
> > 
> > exit 1
> > ;;
> > reset)
> > RunCommand $2 stop
> > 
> > if CheckIfDead $2
> > then
> > RunCommand $2 start 
> > exit 0
> > fi
> > 
> > exit 1
> > ;;
> > status)
> > exit 0
> > ;;
> > getconfignames)
> > echo "hostlist hypervisor"
> > exit 0
> > ;;
> > getinfo-devid)
> > echo "virsh STONITH device"
> > exit 0
> > ;;
> > getinfo-devname)
> > echo "virsh STONITH external device"
> > exit 0
> > ;;
> > getinfo-devdescr)
> > echo "ssh-based Linux host reset for Xen/KVM guest domain trough hypervisor"
> > echo "Fine for testing, but not really suitable for production!"
> 
> Well, it could be used in production too, if you replace the
> ping part.
> 
> > exit 0
> > ;;
> > getinfo-devurl)
> > echo "http://openssh.org http://www.xensource.com/ http://linux-ha.org/wiki"
> 
> I think that this should be reduced to just xensource.
> 
> > exit 0
> > ;;
> > getinfo-xml)
> > cat << SSHXML
> > <parameters>
> > <parameter name="hostlist" unique="1" required="1">
> > <content type="string" />
> > <shortdesc lang="en">
> > Hostlist
> > </shortdesc>
> > <longdesc lang="en">
> > The list of controlled nodes.
> > For example: "node1 node2"
> > </longdesc>
> > </parameter>
> > <parameter name="hypervisor" unique="1" required="1">
> > <content type="string" />
> > <shortdesc lang="en">
> > Hypervisor hostname
> > </shortdesc>
> > <longdesc lang="en">
> > Host name to execute hypervisor. Root user shall be able to ssh to that node.
> 
> Using public key authentication. This may also be a security
> issue, but let's leave that to users to decide.
> 
> > </longdesc>
> > </parameter>
> > <parameter name="run_dump" unique="0" required="0">
> > <content type="string" />
> > <shortdesc lang="en">
> > Run core dump
> > </shortdesc>
> > <longdesc lang="en">
> > If set plugin will call "virsh dump" before killing guest domain
> > </longdesc>
> > </parameter>
> > <parameter name="dump_dir" unique="1" required="0">
> > <content type="string" />
> > <shortdesc lang="en">
> > Run dump core with the specified directory 
> > When the "run_dump" parameter is set, this parameter is indispensable 
> > </shortdesc>
> > <longdesc lang="en">
> > This parameter can indicate the dump destination.
> > Should be set as a full path format, ex.) "/var/log/dump"
> > The above example would dump the core, like;
> > /var/log/dump/2009-0316-1403.37-GuestDomain.core
> > </longdesc>
> > </parameter>
> > </parameters>
> > SSHXML
> > exit 0
> > ;;
> > *)
> > exit 1
> > ;;
> > esac
> 
> Cheers,
> 
> Dejan

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic