
List:       lustre-discuss
Subject:    [lustre-discuss] Appropriate Umount Ordering
From:       Ellis Wilson via lustre-discuss <lustre-discuss () lists ! lustre ! org>
Date:       2022-02-17 16:07:56
Message-ID: MN2PR21MB143988BA5FAB6DF96907CBA5BE369 () MN2PR21MB1439 ! namprd21 ! prod ! outlook ! com

Hi all,

Two (hopefully) simple questions this time around.  This is for Lustre 2.14.0,
and my cluster is set up with no failover for MDTs or OSTs.  OBD timeouts have
not been altered from the defaults.

Question 1:

I read on the Lustre Wiki that the appropriate ordering to umount the various
components of a Lustre filesystem is:

1. Clients
2. MDT(s)
3. OSTs
4. MGS
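
For concreteness, the ordering above can be sketched as a small dry-run
planner.  This is only an illustration: the mount points are hypothetical, the
script just prints the umount commands rather than running them, and it ignores
which host each target actually lives on.

```python
# Shutdown order as listed above; a lower rank is unmounted first.
UMOUNT_RANK = {"client": 0, "mdt": 1, "ost": 2, "mgs": 3}


def umount_plan(mounts):
    """Sort (role, mount_point) pairs into shutdown order.

    Returns one ["umount", path] command per mount.  sorted() is stable,
    so targets with the same role keep their input order.
    """
    ordered = sorted(mounts, key=lambda m: UMOUNT_RANK[m[0]])
    return [["umount", path] for _, path in ordered]


# Hypothetical mount points, for illustration only.
mounts = [
    ("mgs", "/mnt/mgs"),
    ("ost", "/mnt/ost0"),
    ("client", "/mnt/lustre"),
    ("mdt", "/mnt/mdt0"),
    ("ost", "/mnt/ost1"),
]

for cmd in umount_plan(mounts):
    print(" ".join(cmd))  # dry run: print only, nothing is unmounted
```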

However, if I do it this way, each OST umount always hangs for 4 minutes and
25 seconds (04:25) before completing.  Dmesg reports:

[88944.272233] Lustre: 30178:0:(client.c:2282:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1645111309/real 1645111309]  req@00000000cc9c1aeb x1724931853622016/t0(0) o39->lustrefs-MDT0000-lwp-OST0000@10.1.98.8@tcp:12/10 lens 224/224 e 0 to 1 dl 1645111574 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:''
[88944.275884] Lustre: Failing over lustrefs-OST0000
[88944.429622] Lustre: server umount lustrefs-OST0000 complete

For reference, if I swap the OSTs and the MDT (umounting the OSTs second and
the MDT third), all of the OST umounts are fast, but the MDT takes a whopping
8 minutes and 50 seconds to umount.
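
As a sanity check on those numbers, here is a minimal Python sketch that pulls
the "sent" time and the "dl" value out of the dmesg line quoted above.  It
assumes "dl" is the request deadline in epoch seconds, which is my reading of
the log format rather than something I have confirmed.

```python
import re

# Excerpt of the dmesg line quoted above (only the fields used here).
LOG_LINE = (
    "Request sent has timed out for slow reply: "
    "[sent 1645111309/real 1645111309]  req@00000000cc9c1aeb "
    "x1724931853622016/t0(0) "
    "o39->lustrefs-MDT0000-lwp-OST0000@10.1.98.8@tcp:12/10 "
    "lens 224/224 e 0 to 1 dl 1645111574 ref 2"
)


def request_wait_seconds(line: str) -> int:
    """Return the gap between the 'sent' time and the 'dl' value, in seconds."""
    sent = int(re.search(r"\[sent (\d+)/", line).group(1))
    deadline = int(re.search(r"\bdl (\d+)\b", line).group(1))
    return deadline - sent


wait = request_wait_seconds(LOG_LINE)
minutes, seconds = divmod(wait, 60)
print(f"{wait} s = {minutes:02d}:{seconds:02d}")
```

The gap comes out at 265 seconds, which matches the 04:25 OST delay.  The
8 minutes and 50 seconds seen on the MDT is 530 seconds, exactly twice that.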

Why is the canonical shutdown ordering delaying so long (and so specifically) for me?

Question 2:

In all cases (OSTs or MDTs) of umount, whether they are fast or not, I see
messages like the following in dmesg:

[88944.275884] Lustre: Failing over lustrefs-OST0000

or

[78406.007678] Lustre: Failing over lustrefs-MDT0000

There is no failover configured in my setup.  The MGS is up the entire time in
all cases.  What is Lustre doing here?  How do I explicitly disable this
failover attempt, since it seems to be at best misleading and at worst directly
related to the lengthy delays?  FWIW, I have tried umount with '-f' to make the
MDT go into failout rather than failover, to no avail.

Thanks in advance for any help folks can offer on this,

ellis
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

