List:       lustre-announce
Subject:    RE: [Lustre-discuss] Lustre OST Failover Problem
From:       "Mc Carthy, Fergal" <fergal.mccarthy@hp.com>
Date:       2005-03-08 10:19:55
Message-ID: 575D0CDD99F591478ADD639EF4E36E7E017D2E2B@iloexc01.emea.cpqcorp.net

I have a number of comments, so please read the entire message. However,
the primary problem you are likely suffering from is discussed in the
last section, under the topic of Upcalls.


Shared Device Required for Failover
===================================
For failover you need to use a shared device. In the config below you
have used /tmp/ost1 as your ost1 device, but at first glance that
suggests it is node local, unless your /tmp is network shared?

I would recommend that you locate the loopback devices that you want
to use for failover testing on a network-shared file system, such as
an NFS mount.
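
For example, assuming a network-shared mount point such as /nfs/lustre
that is visible from both osta and ostb (the path is only an
illustration), the two ost1 definitions from your config would both
point at the same shared backing file:

lmc -m group.xml --add ost --node osta --lov lov1 --ost ost1 --failover \
    --fstype ext3 --dev /nfs/lustre/ost1 --size 500000
lmc -m group.xml --add ost --node ostb --lov lov1 --ost ost1 --failover \
    --fstype ext3 --dev /nfs/lustre/ost1 --size 500000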


Reformat Option
===============
You also only need to run the format command for the ost1 loopback
device once, so long as you have located it on a shared file system.
The client lconf mount doesn't need the reformat flag at all, and you
should only use the reformat flag the first time you start the file
system.
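
So, sketching your startup sequence under that assumption (ostb
deliberately does not start ost1 at all until a failover is actually
needed, as discussed in the next section):

lconf --reformat --node osta group.xml    # first start only - formats the shared ost1
lconf --reformat --node mds group.xml     # first start only - formats mds1
lconf --node client group.xml             # client mount - never needs the reformat flag

On all subsequent starts you would drop --reformat everywhere.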


Failover Testing Methodology
============================
You should also be aware of the fact that Lustre failover does not run
in a hot standby mode, i.e. you can't have two servers simultaneously
providing the same device instance; this would be a recipe for instant
device corruption. Instead you need to have a detection method to
determine that one server has died and, upon detecting that event, to
start the failed device up on the other node. When the original
server is restored to operation you then need to stop the failed-over
device on the backup server and start it again on the original
server.
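
In terms of your four-node setup, that sequence might look roughly
like the following sketch (the failure-detection step itself is
whatever monitoring you have in place and is not shown; check the
exact --select usage against your lconf version):

# 1. osta has been detected as dead; start ost1 on the backup node:
lconf --node ostb --select ost1=ostb group.xml
# 2. Later, when osta is back in service, stop the failed-over instance on ostb:
lconf --cleanup --failover --node ostb --select ost1=ostb group.xml
# 3. ...then start ost1 again on the original server:
lconf --node osta group.xml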


Loopback Devices and Power Off Testing
======================================
If you are using loopback devices then the power off method may not
be a safe testing method, since the latest data on the loopback device
may not actually get flushed to disk, leaving the loopback device in a
corrupted state as far as Lustre is concerned.

Doing a failover stop of the device, using lconf --cleanup --failover,
is a better approach, but it also has its problems, though these are
currently being fixed in later versions of Lustre.
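
So, instead of powering osta off, a safer way to simulate its failure
in your test would be a failover-style stop on osta itself, along the
lines of:

# Simulate the failure of osta without risking the loopback file:
lconf --cleanup --failover --node osta group.xml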


Lustre Timeout
==============
Lustre uses a 100 second default communications timeout. This means
that only after 100 seconds of no response from a server does a Lustre
client decide that there is a problem with the communications link to
that server and attempt to do something about it. It does this using
upcalls, if they are configured.
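
For failover testing you may want failures to be noticed more quickly.
A sketch of lowering the timeout, assuming the obd timeout is exposed
via /proc/sys/lustre/timeout on your nodes (verify the exact path on
1.2.4 before relying on it):

# Check the current Lustre communications timeout (default 100 seconds):
cat /proc/sys/lustre/timeout
# Lower it to 30 seconds on the clients and servers for testing:
echo 30 > /proc/sys/lustre/timeout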


Upcalls
=======
If a comms error is detected then Lustre can invoke an external script,
known as the upcall script, to handle the connection recovery. You don't
say how the lconf --recover operation is being invoked in your testing,
but it is important to understand that it should only be invoked in
response to a FAILED_IMPORT upcall.

Since I don't see you specifying an upcall on the mtpt definition line,
the default action (which can be explicitly specified by the upcall
setting of DEFAULT) is for Lustre to attempt to reconnect to the failed
device using the information that it already has, and not to generate an
upcall script FAILED_IMPORT call.

If you attempt to invoke lconf --recover manually you need to wait until
such an upcall event has occurred, otherwise it will fail because Lustre
has not yet detected the failed import. You should also note that the
argument information specified to lconf --recover should come from the
arguments passed to the upcall script invocation. These arguments allow
lconf --recover to determine which connection needs to be restored, and
it then does this using the information specified in the Lustre config
(and the --select flag in your test case).
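
To make that concrete, a minimal upcall script might look something
like the sketch below. The argument positions after $1 are assumptions
for illustration only; you must take the real ordering from whatever
Lustre 1.2.4 actually passes to the script, exactly as described above.
The log file path and the hard-coded --select ost1=ostb are likewise
just placeholders for your test case:

#!/bin/sh
# Hypothetical upcall script: log every invocation and run lconf --recover
# only for FAILED_IMPORT events. Argument order after $1 is an assumption.
echo "upcall: $*" >> /var/log/lustre-upcall.log
if [ "$1" = "FAILED_IMPORT" ]; then
    # Assumed ordering: $2=target UUID, $3=client UUID, $4=connection UUID
    # (use the full path to group.xml here)
    lconf --recover --select ost1=ostb --tgt_uuid "$2" \
          --client_uuid "$3" --conn_uuid "$4" group.xml
fi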


Fergal.

--

Fergal.McCarthy@HP.com

(The contents of this message and any attachments to it are confidential
and may be legally privileged. If you have received this message in
error you should delete it from your system immediately and advise the
sender. To any recipient of this message within HP, unless otherwise
stated, you should consider this message and attachments as "HP
CONFIDENTIAL".)


-----Original Message-----
From: lustre-discuss-admin@lists.clusterfs.com
[mailto:lustre-discuss-admin@lists.clusterfs.com] On Behalf Of raymondyu
Sent: 08 March 2005 02:23
To: lustre-discuss@lists.clusterfs.com
Subject: [Lustre-discuss] Lustre OST Failover Problem

Hi, I am trying to use failover of an OST.
My Lustre version is 1.2.4. This is what I do for this target. I have
four nodes: osta, ostb, client, mds.


First, I create a config file using lmc:
lmc -o group.xml --add net --node mds --nid 192.168.1.179 --nettype tcp
lmc -m group.xml --add net --node osta --nid 192.168.1.176 --nettype tcp
lmc -m group.xml --add net --node ostb --nid 192.168.1.177 --nettype tcp
lmc -m group.xml --add net --node client --nid 192.168.1.175 --nettype tcp


lmc -m group.xml --add mds --node Lustre --mds mds1 --fstype ext3 --dev /tmp/mds1 --size 500000

lmc -m group.xml --add lov --lov lov1 --mds mds1 --stripe_sz 5536 --stripe_cnt 0 --stripe_pattern 0
lmc -m group.xml --add ost --node osta --lov lov1 --ost ost1 --failover --fstype ext3 --dev /tmp/ost1 --size 500000
lmc -m group.xml --add ost --node ostb --lov lov1 --ost ost1 --failover --fstype ext3 --dev /tmp/ost1 --size 500000

lmc -m group.xml --add mtpt --node client --path /var/share --mds mds1 --lov lov1


Then, I configure every node:

lconf --reformat --node osta group.xml
lconf --reformat --node ostb group.xml
lconf --reformat --node mds group.xml
lconf --reformat --node client group.xml

Next, I shut down osta using poweroff.
On ostb, I start ost1:

lconf --node node1 --select ost1=otsb group.xml

And on the client I try to recover the OST connection:

lconf --recover --select ost1=ostb --tgt_uuid ost1_UUID --conn_uuid \
NID_192.168.1.176_UUID --client_uuid NID_192.168.1.175_UUID group.xml


Unfortunately, the OST is not recovered.

Is there anything I am doing wrong?

Is this the correct setup to test failover?

Thank you


_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.clusterfs.com
https://lists.clusterfs.com/mailman/listinfo/lustre-discuss
