
List:       lustre-announce
Subject:    RE: [Lustre-discuss] Failover
From:       "Mc Carthy, Fergal" <Fergal.McCarthy () hp ! com>
Date:       2006-01-27 19:32:11
Message-ID: 575D0CDD99F591478ADD639EF4E36E7E04361621 () iloexc01 ! emea ! cpqcorp ! net

For reference, the --failover flag has no meaning to lconf unless it is used
when stopping devices. Please also note that you should only attempt to
start any given device in one place at a time.

Lustre doesn't provide any sort of automatic failover mechanism itself;
it just provides hooks that allow external failover management systems,
e.g. Cluster Manager, Heartbeat, etc., to manage failover/failback.

You should plan for only one node running a given service at a time, and
use some external mechanism to track which node that is, e.g. use LDAP
to hold the config and use the lactive tool to update the preferred
server setting. Then you start the service on that node. If you want to
manually fail over a service, e.g. the MDS, you must run

# lconf <config opts> --group mds01 --cleanup --failover

on the node where it is currently active, then update the external
active service node tracking mechanism, and then run the normal startup
command on the alternate node to start the service up again.
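
Putting that together, a manual MDS failover sequence looks roughly like
the following sketch (nodeA/nodeB and the config file name are just
placeholders for whatever your setup uses). First, on the currently active
node (nodeA), failover-stop the service:

# lconf --node nodeA --group mds01 --cleanup --failover config.xml

Then update the external active-node record (for example with the lactive
tool if the config is held in LDAP; check lactive --help for the exact
options in your version), and finally start the service on the alternate
node, where it will enter recovery and wait for the old clients to
reconnect:

# lconf --node nodeB --group mds01 config.xml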

When you start on the alternate node it should enter recovery mode for a
while to allow the previous clients to reconnect. If no clients reconnect
it will stall there forever. If some, but not all, of the clients
reconnect, then after 2.5 * the Lustre timeout (2.5 * 100 seconds by
default) the recovery period will finish and the Lustre service should
become available again.
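
For reference, the timeout driving that recovery window can be checked at
runtime; on 1.4.x it is normally exposed via /proc (treat the exact path
as an assumption for your version):

# cat /proc/sys/lustre/timeout
100

i.e. recovery should end roughly 2.5 * that many seconds after the
restarted service comes up, once at least one of the old clients has
reconnected.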

Trying to start the Lustre service on more than one node at the same
time will very likely lead to significant, and potentially
unrecoverable, corruption of the backing disk device.

You haven't said which version of Lustre you are working with; 1.4.5
should be better than previous versions as it has a number of fixes for
issues seen when failover-stopping active devices. However, be aware
that if you failover-stop and immediately restart on the same node, you
may on rare occasions find that the background unload of the previous
instance of the service is not yet complete. Use the lctl device_list
command (or cat /proc/fs/lustre/devices) to check whether the previous
instance is still around (it will be in the ST state).
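
If you want to script around that, something along these lines (assuming
the state shows up as a separate ST column in the device list output, as
described above) will wait for the old instance to finish unloading:

while lctl device_list | grep -q ' ST '; do
    sleep 5          # previous instance still unloading in the background
done

Only once that loop exits is it safe to start the same service again on
that node.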

Finally, if you are running an MDS and an OST from the same Lustre file
system on the same node at the same time and that node dies, then you
run a small risk of not being able to seamlessly recover all pending
client operations.

Fergal.


--

Fergal.McCarthy@HP.com


-----Original Message-----
From: lustre-discuss-admin@lists.clusterfs.com
[mailto:lustre-discuss-admin@lists.clusterfs.com] On Behalf Of Gregory
Golin
Sent: 27 January 2006 17:22
To: lustre-discuss@lists.clusterfs.com
Subject: [Lustre-discuss] Failover

Hi all,

I am trying to failover between two nodes that run both mds and ost. Is
this possible at all?
I used the howto pdf and a thread from this list as reference. Here's my
conf.sh:


config="config.xml"
LMC="${LMC:-/usr/sbin/lmc}"
TMP=${TMP:-/tmp}
hosta='speedster'
hostb='cobalt'
MOUNT=${MOUNT:-/mnt/lustre}
MOUNT2=${MOUNT2:-${MOUNT}2}
NETTYPE=${NETTYPE:-tcp}
JSIZE=${JSIZE:-0}
STRIPE_BYTES=${STRIPE_BYTES:-1048576}
STRIPES_PER_OBJ=0 # 0 means stripe over all OSTs
# create nodes
echo ${LMC} -o $config --add node --node $hosta
${LMC} -o $config --add node --node $hosta || exit 10
${LMC} -m $config --add net --node $hosta --nid $hosta --nettype \
$NETTYPE || exit 10
${LMC} -m $config --add node --node $hostb || exit 10
${LMC} -m $config --add net --node $hostb --nid $hostb --nettype \
$NETTYPE || exit 11
${LMC} -m $config --add net --node client --nid '*' --nettype \
$NETTYPE || exit 12

# configure mds server
${LMC} -m $config --add mds --node $hosta --mds mds1 --fstype ldiskfs \
      --dev /dev/sdc1 --failover --group mds01 || exit 20
${LMC} -m $config --add mds --node $hostb --mds mds1 --fstype ldiskfs \
      --dev /dev/sdc1 --failover || exit 21
# configure ost
${LMC} -m $config --add lov --lov lov1 --mds mds1 --stripe_sz $STRIPE_BYTES \
      --stripe_cnt $STRIPES_PER_OBJ --stripe_pattern 0 $LOVOPT || exit 20

#${LMC} -m $config --add lov --lov lov1 --mds mds1 --stripe_sz $STRIPE_BYTES \
#      --stripe_cnt $STRIPES_PER_OBJ --stripe_pattern 0 $LOVOPT || exit 20

${LMC} -m $config --add ost --node $hosta --lov lov1 \
      --fstype ldiskfs --dev /dev/sdc2 --ost ost1 --failover --group mds01 || exit 30
${LMC} -m $config --add ost --node $hostb --lov lov1 \
      --fstype ldiskfs --dev /dev/sdc2 --ost ost1 --failover || exit 30

# create client config
${LMC} -m $config --add mtpt --node $hosta --path $MOUNT --mds mds1 \
      --lov lov1 $CLIENTOPT || exit 40


Basically what happens is that when I issue an lconf --failover on
speedster, it mounts the volume and all is good. When I do that on
cobalt, it stops at the NETWORK: message.

Any help is appreciated.

Thanks,
Greg

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.clusterfs.com
https://lists.clusterfs.com/mailman/listinfo/lustre-discuss
