
List:       ceph-users
Subject:    [ceph-users] MDS stuck in replay
From:       Magnus HAGDORN <Magnus.Hagdorn () ed ! ac ! uk>
Date:       2022-05-31 7:41:42
Message-ID: de012869584d1a0b2b75dd8543f12d46f0462cdc.camel () ed ! ac ! uk

Hi all,
it seems to be the time for stuck MDSs. Our ceph filesystem is also
degraded: one MDS has been stuck in replay for about 20 hours now.

We run a nautilus ceph cluster with about 300TB of data and many
millions of files. We run two MDSs with a particularly large directory
pinned to one of them. Both MDSs have standby MDSs.

We are in the process of migrating to a new pacific cluster and have
been syncing files daily. Over the weekend something happened and we
ended up with slow MDS responses, and some directories became very slow
(as we'd expect). We restarted the second MDS. It came back within a
minute and the problem disappeared for a little while. Then the slow MDS
operations came back and we restarted the other MDS. This one has been
in replay state since yesterday.

Apart from that, the cluster is healthy.

So we are wondering what it is up to, how long it might take, and
whether there is something we can do to speed up the replay phase.

Regards
magnus

