List:       ceph-users
Subject:    [ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?
From:       Frank Schilder <frans@dtu.dk>
Date:       2021-08-31 14:27:10
Message-ID: 0f5dee5fb0844ab887fd8383c9709248@dtu.dk

Hi Dan,

Unfortunately, the file/directory names were generated like one would for temporary files. No clue about their location. I would need to find such a file while it exists. Of course, I could execute a find on the snapshot ...

Just kidding. The large omap count is already going down; the first 4 have probably been purged from the snapshots.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dan@vanderster.com>
Sent: 31 August 2021 15:44:41
To: Frank Schilder
Cc: Patrick Donnelly; ceph-users
Subject: Re: [ceph-users] LARGE_OMAP_OBJECTS: any proper action possible?

Hi,

I don't know how to find a full path from a dir object.
But perhaps you can make an educated guess based on what you see in:

rados listomapkeys --pool=con-fs2-meta1 1000eec35f5.01000000 | head -n 100

Those should be the directory entries. (s/_head//)
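
For example, spelling that substitution out (a sketch; the _head suffix marks the head version of each dentry in the dirfrag object):

rados listomapkeys --pool=con-fs2-meta1 1000eec35f5.01000000 | sed 's/_head$//' | head -n 100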

-- Dan

On Tue, Aug 31, 2021 at 2:31 PM Frank Schilder <frans@dtu.dk> wrote:
> 
> Dear Dan and Patrick,
> 
> The find didn't return anything. With this and the info below, am I right to assume that these were temporary working directories that got caught in a snapshot (we use rolling snapshots)?
> 
> I would really appreciate any ideas on how to find out the original file system path of these large directories. I would like to advise the user(s) that we have a special high-performance file system for temporary data.
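> 
> One idea would be to repeat the find inside the snapshots themselves. A sketch (assuming the rolling snapshots are exposed under /cephfs/.snap and that find can match the inode number there):
> 
> for s in /cephfs/.snap/*; do find "$s" -type d -inum 1099738108263 2>/dev/null; done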
> 
> I can't find indications of performance problems with the meta-data pool. After re-deploying the OSDs and quadrupling the OSD count, the meta-data pool seems to perform very well. The find ran over a 1.3PB file system in under 18 hours.
> 
> However, running this find on the root got me caught in another problem:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/HKEBXXRMX5WA5Y6JFM34WFPMWTCMPFCG/#EMHNSHZIPFZZ5QYS6B4VW3LUGL6HDTOP
>  
> Apparently, the meta-data performance is now so high that a single client can crash an MDS daemon and even take the whole MDS cluster down with it.
> 
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> ________________________________________
> From: Frank Schilder
> Sent: 30 August 2021 16:18:02
> To: ceph-users
> Cc: Dan van der Ster; Patrick Donnelly
> Subject: Re: [ceph-users] LARGE_OMAP_OBJECTS: any proper action possible?
> 
> Dear Dan and Patrick,
> 
> I have the suspicion that I'm looking at large directories in the snapshots that no longer exist on the file system. Hence, the omap objects are not fragmented as explained in the tracker issue. Here is the info you asked me to pull out:
> 
> > find /cephfs -type d -inum 1099738108263
> 
> The find hasn't returned yet. It would be great to find out which user is doing that. Unfortunately, I don't believe the directory still exists.
> 
> > rados -p cephfs_metadata listomapkeys 1000d7fd167.02800000
> 
> I did this on a different object:
> 
> # rados listomapkeys --pool=con-fs2-meta1 1000eec35f5.01000000 | wc -l
> 216000
> 
> This matches the log message. I guess these keys are file/dir names? Then yes, it's a huge directory.
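> 
> For reference, the cross-check against the cluster log that reported the object (a sketch; the log path is an assumption and depends on the deployment):
> 
> # grep 'Large omap object found' /var/log/ceph/ceph.log
> 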
> > Please try the resolutions suggested in: https://tracker.ceph.com/issues/45333
> 
> If I understand correctly, the INODE.00000000 objects contain the path information:
> 
> [root@gnosis ~]# rados listxattr --pool=con-fs2-meta1 1000eec35f5.01000000
> [root@gnosis ~]# rados listxattr --pool=con-fs2-meta1 1000eec35f5.00000000
> layout
> parent
> 
> Decoding the meta info in the parent attribute gives:
> 
> [root@gnosis ~]# rados getxattr --pool=con-fs2-meta1 1000eec35f5.00000000 parent | ceph-dencoder type inode_backtrace_t import - decode dump_json
> {
>     "ino": 1099761989109,
>     "ancestors": [
>         {
>             "dirino": 1552,
>             "dname": "1000eec35f5",
>             "version": 882614706
>         },
>         {
>             "dirino": 257,
>             "dname": "stray6",
>             "version": 563853824
>         }
>     ],
>     "pool": 12,
>     "old_pools": []
> }
> 
> This smells a lot like a deleted directory in a snapshot, moved to one of the stray object buckets. The result is essentially the same for all large omap objects except for the stray number. Is it possible to figure out the original location in the file system path?
> 
> I guess I have to increase the warning threshold or live with the warning message, neither of which is preferred. It would be great if you could help me find the original path so I can identify the user and advise him/her on how to organise his/her files.
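> 
> (If it comes to raising the threshold, this is the option I have in mind; a sketch, the new value is arbitrary. The default key threshold of 200000 would also explain why 216000 keys just trip the warning:)
> 
> # ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 300000
> 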
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> ________________________________________
> From: Patrick Donnelly <pdonnell@redhat.com>
> Sent: 27 August 2021 19:16:16
> To: Frank Schilder
> Cc: ceph-users
> Subject: Re: [ceph-users] LARGE_OMAP_OBJECTS: any proper action possible?
> 
> Hi Frank,
> 
> On Wed, Aug 25, 2021 at 6:27 AM Frank Schilder <frans@dtu.dk> wrote:
> > 
> > Hi all,
> > 
> > I have the notorious "LARGE_OMAP_OBJECTS: 4 large omap objects" warning and am again wondering if there is any proper action one can take except "wait it out and deep-scrub" (numerous ceph-users threads) or "ignore" (https://docs.ceph.com/en/latest/rados/operations/health-checks/#large-omap-objects). Only for RGWs is a proper action described, but mine come from MDSes. Is there any way to ask an MDS to clean up or split the objects?
> > 
> > The disks with the meta-data pool can easily deal with objects of this size. My question is more along the lines of: if I can't do anything anyway, why the warning? If there is a warning, I would assume that one can do something proper to prevent large omap objects from being born by an MDS. What is it?
> 
> Please try the resolutions suggested in: https://tracker.ceph.com/issues/45333
> 
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat Sunnyvale, CA
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
> 
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io

