List:       ceph-users
Subject:    [ceph-users] Re: monitor sst files continue growing
From:       Wido den Hollander <wido@42on.com>
Date:       2020-10-30 8:39:31
Message-ID: 3b08c6b1-e993-e127-260d-dfc2ba153317@42on.com

On 29/10/2020 19:29, Zhenshi Zhou wrote:
> Hi Alex,
> 
> We found a huge number of keys in the "logm" and "osdmap" tables while
> using ceph-monstore-tool. I think that could be the root cause.
> 
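
(As a rough way to reproduce that observation: with the mon stopped,
ceph-monstore-tool can dump the store keys, and counting them per prefix
shows where the space goes. A sketch, assuming the default store path and
a mon ID equal to the short hostname:)

    systemctl stop ceph-mon@$(hostname -s)    # the tool needs exclusive access
    ceph-monstore-tool /var/lib/ceph/mon/ceph-$(hostname -s) dump-keys \
        | awk '{print $1}' | sort | uniq -c   # key count per prefix (logm, osdmap, ...)
    systemctl start ceph-mon@$(hostname -s)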

But that is exactly how Ceph works. The monitors might need that very old
OSDMap to get all the PGs clean again: an OSD which has been gone for a
very long time needs those old maps to catch up before its PGs can become
clean.

If not all PGs are active+clean you can see the MON databases grow
rapidly, because the old maps cannot be trimmed until recovery finishes.
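
Watching both numbers side by side makes this visible; for example, on a
monitor host:

    ceph pg stat                           # how many PGs are active+clean
    du -sh /var/lib/ceph/mon/*/store.db    # current size of the mon store

Once the store grows past mon_data_size_warn (15 GiB by default) the
cluster also raises a MON_DISK_BIG health warning.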

Therefore I always deploy 1TB SSDs in all Monitors. They are not
expensive anymore and they give breathing room.

I always deploy physical and dedicated machines for Monitors just to 
prevent these cases.

Wido

> Well, some pages also say that disabling the 'insight' module can resolve
> this issue, but I checked our cluster and we didn't enable this module.
> Check this page
> <https://tracker.ceph.com/issues/39955>.
> 
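
(Whether an mgr module is active can be checked quickly; note that the
module is called "insights" in recent releases:)

    ceph mgr module ls    # check whether "insights" is listed under enabled_modules
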
> Anyway, our cluster is unhealthy though; it just needs time to keep
> recovering data :)
> 
> Thanks
> 
> On Thu, Oct 29, 2020 at 10:57 PM, Alex Gracie <alexandergracie17@gmail.com> wrote:
> 
>> We hit this issue over the weekend on our HDD-backed EC Nautilus cluster
>> while removing a single OSD. We also did not have any luck using
>> compaction. The mon logs filled up our entire root disk on the mon servers
>> and we were running on a single monitor for hours while we tried to finish
>> recovery and reclaim space. The past couple of weeks we also noticed "pg
>> not scrubbed in time" errors but are unsure if they are related. I'm still
>> unsure of the exact cause of this (other than the general misplaced/degraded
>> objects) and what kind of growth is acceptable for these store.db files.
>>
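
(For completeness, the two usual compaction paths, with a hypothetical mon
ID "a"; as noted in the thread, neither reclaims much space as long as the
cluster state prevents the old maps from being trimmed:)

    ceph tell mon.a compact                   # online compaction
    systemctl stop ceph-mon@a                 # offline variant below
    ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-a/store.db compact
    systemctl start ceph-mon@a
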
>> In order to get our downed mons restarted, we ended up backing up and
>> copying the /var/lib/ceph/mon/* contents to a remote host with large NVMe
>> and SSD drives, setting up an sshfs mount to that new host, ensuring the
>> mount paths were owned by ceph, then clearing up enough space on the
>> monitor host to start the service. This allowed our store.db directory to
>> grow freely until the misplaced/degraded objects could recover, and the
>> monitors all rejoined eventually.
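
(A minimal sketch of that workaround, assuming a hypothetical remote host
"bighost" with the free space, a mon ID "a", and sshfs installed; verify
the remote copy before deleting anything locally:)

    systemctl stop ceph-mon@a
    rsync -a /var/lib/ceph/mon/ root@bighost:/srv/ceph-mon/   # copy the store out
    rm -rf /var/lib/ceph/mon/*                                # reclaim the local disk
    sshfs -o allow_other root@bighost:/srv/ceph-mon /var/lib/ceph/mon
    chown -R ceph:ceph /var/lib/ceph/mon      # the mon runs as the ceph user
    systemctl start ceph-mon@a
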
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io
