List:       ceph-users
Subject:    [ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?
From:       Igor Fedotov <ifedotov@suse.de>
Date:       2020-11-26 19:21:11
Message-ID: 64185604-5fd3-8b7e-7187-a872cbcab458@suse.de

OK, cool!

Will try to reproduce this locally tomorrow...


Thanks,

Igor

On 11/26/2020 10:19 PM, Dan van der Ster wrote:
> Those osds are intentionally out, yes. (They were drained to be replaced).
>
> I have fixed 2 clusters' stats already with this method ... both had
> up but out osds, and stopping the up/out osd fixed the stats.
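>
> In case it's useful, roughly what I ran to find and stop them (just a
> sketch -- the jq filter is an assumption based on the "ceph osd dump"
> json layout, and systemd-managed OSDs are assumed):
>
> # find OSDs that are up but out:
> # ceph osd dump -f json | jq '.osds[] | select(.up == 1 and .in == 0) | .osd'
> # then on the relevant host, e.g. for osd.100 and osd.177:
> # systemctl stop ceph-osd@100
> # systemctl stop ceph-osd@177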
>
> I opened a tracker for this: https://tracker.ceph.com/issues/48385
>
> -- dan
>
> On Thu, Nov 26, 2020 at 8:14 PM Igor Fedotov <ifedotov@suse.de> wrote:
>> Also wondering whether you have the same "gap" OSDs on the other
>> cluster(s) which show stats improperly?
>>
>>
>> On 11/26/2020 10:08 PM, Dan van der Ster wrote:
>>> Hey that's it!
>>>
>>> I stopped the up but out OSDs (100 and 177), and now the stats are correct!
>>>
>>> # ceph df
>>> RAW STORAGE:
>>>       CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
>>>       hdd       5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.62
>>>       TOTAL     5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.62
>>>
>>> POOLS:
>>>       POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>>>       public     68     2.9 PiB     143.56M     4.3 PiB     84.55       538 TiB
>>>       test       71      29 MiB       6.56k     1.2 GiB         0       269 TiB
>>>       foo        72     1.2 GiB         308     3.6 GiB         0       269 TiB
>>>
>>>
>>>
>>> On Thu, Nov 26, 2020 at 8:02 PM Dan van der Ster <dan@vanderster.com> wrote:
>>>> There are a couple gaps, yes: https://termbin.com/9mx1
>>>>
>>>> What should I do?
>>>>
>>>> -- dan
>>>>
>>>> On Thu, Nov 26, 2020 at 7:52 PM Igor Fedotov <ifedotov@suse.de> wrote:
>>>>> Does "ceph osd df tree" show stats properly (I mean there are no evident
>>>>> gaps like unexpected zero values) for all the daemons?
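>>>>>
>>>>> E.g. something like this to list OSDs reporting zero usage (just a
>>>>> sketch -- the json field names are from memory, adjust as needed):
>>>>>
>>>>> # ceph osd df tree -f json | jq '.nodes[] | select(.type == "osd" and .kb_used == 0)'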
>>>>>
>>>>>
>>>>>> 1. Anyway, I found something weird...
>>>>>>
>>>>>> I created a new 1-PG pool "foo" on a different cluster and wrote some
>>>>>> data to it.
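>>>>>>
>>>>>> Roughly like this (a sketch of what I ran; the pool name and bench
>>>>>> parameters are just my test values):
>>>>>>
>>>>>> # ceph osd pool create foo 1 1
>>>>>> # rados -p foo bench 30 write --no-cleanup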
>>>>>>
>>>>>> The stored and used are equal.
>>>>>>
>>>>>> Thu 26 Nov 19:26:58 CET 2020
>>>>>> RAW STORAGE:
>>>>>>        CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
>>>>>>        hdd       5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.31
>>>>>>        TOTAL     5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.31
>>>>>>
>>>>>> POOLS:
>>>>>>        POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>>>>>>        public     68     2.9 PiB     143.54M     2.9 PiB     78.49       538 TiB
>>>>>>        test       71      29 MiB       6.56k      29 MiB         0       269 TiB
>>>>>>        foo        72     1.2 GiB         308     1.2 GiB         0       269 TiB
>>>>>>
>>>>>> But when I restarted the three relevant OSDs, the bytes_used was
>>>>>> temporarily reported correctly:
>>>>>>
>>>>>> Thu 26 Nov 19:27:00 CET 2020
>>>>>> RAW STORAGE:
>>>>>>        CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
>>>>>>        hdd       5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.62
>>>>>>        TOTAL     5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.62
>>>>>>
>>>>>> POOLS:
>>>>>>        POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>>>>>>        public     68     2.9 PiB     143.54M     4.3 PiB     84.55       538 TiB
>>>>>>        test       71      29 MiB       6.56k     1.2 GiB         0       269 TiB
>>>>>>        foo        72     1.2 GiB         308     3.6 GiB         0       269 TiB
>>>>>>
>>>>>> But then a few seconds later it's back to used == stored:
>>>>>>
>>>>>> Thu 26 Nov 19:27:03 CET 2020
>>>>>> RAW STORAGE:
>>>>>>        CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
>>>>>>        hdd       5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.47
>>>>>>        TOTAL     5.5 PiB     1.2 PiB     4.3 PiB      4.3 PiB         78.47
>>>>>>
>>>>>> POOLS:
>>>>>>        POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>>>>>>        public     68     2.9 PiB     143.54M     2.9 PiB     78.49       538 TiB
>>>>>>        test       71      29 MiB       6.56k      29 MiB         0       269 TiB
>>>>>>        foo        72     1.2 GiB         308     1.2 GiB         0       269 TiB
>>>>>>
>>>>>> It seems to report the correct stats only when the PG is peering (or in
>>>>>> some other transitional state).
>>>>>> I've restarted all three relevant OSDs now -- the stats are reported
>>>>>> as stored == used.
>>>>>>
>>>>>> 2. Another data point -- I found another old cluster that reports
>>>>>> stored/used correctly. I have no idea what might be different about
>>>>>> that cluster -- we updated it just like the others.
>>>>>>
>>>>>> Cheers, Dan
>>>>>>
>>>>>> On Thu, Nov 26, 2020 at 6:22 PM Igor Fedotov <ifedotov@suse.de> wrote:
>>>>>>> For a specific BlueStore instance you can learn the relevant statfs output by
>>>>>>>
>>>>>>> setting debug_bluestore to 20 and leaving the OSD for 5-10 seconds (or maybe
>>>>>>> a couple of minutes -- I don't remember the exact statfs poll period).
>>>>>>>
>>>>>>> Then grep the OSD log for "statfs" and/or "pool_statfs"; the output is
>>>>>>> formatted as per the following operator (taken from src/osd/osd_types.cc):
>>>>>>>
>>>>>>> ostream& operator<<(ostream& out, const store_statfs_t &s)
>>>>>>> {
>>>>>>>       out << std::hex
>>>>>>>           << "store_statfs(0x" << s.available
>>>>>>>           << "/0x"  << s.internally_reserved
>>>>>>>           << "/0x"  << s.total
>>>>>>>           << ", data 0x" << s.data_stored
>>>>>>>           << "/0x"  << s.allocated
>>>>>>>           << ", compress 0x" << s.data_compressed
>>>>>>>           << "/0x"  << s.data_compressed_allocated
>>>>>>>           << "/0x"  << s.data_compressed_original
>>>>>>>           << ", omap 0x" << s.omap_allocated
>>>>>>>           << ", meta 0x" << s.internal_metadata
>>>>>>>           << std::dec
>>>>>>>           << ")";
>>>>>>>       return out;
>>>>>>> }
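>>>>>>>
>>>>>>> For example (just a sketch -- osd.123 and the log path are
>>>>>>> placeholders for your deployment; 1/5 is the default debug level):
>>>>>>>
>>>>>>> # ceph tell osd.123 config set debug_bluestore 20
>>>>>>> # ... wait for the next statfs poll ...
>>>>>>> # grep -E 'statfs|pool_statfs' /var/log/ceph/ceph-osd.123.log | tail
>>>>>>> # ceph tell osd.123 config set debug_bluestore 1/5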
>>>>>>>
>>>>>>> But honestly I doubt it is BlueStore that reports incorrectly, since
>>>>>>> it doesn't care about replication.
>>>>>>>
>>>>>>> It looks rather like a lack of stats from some replicas, or improper pg
>>>>>>> replica factor processing...
>>>>>>>
>>>>>>> Perhaps legacy vs. new pool is what matters... Can you try to create a new
>>>>>>> pool on the old cluster, fill it with some data (e.g. just a single 64K
>>>>>>> object) and check the stats?
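>>>>>>>
>>>>>>> E.g. something like (a sketch -- pool and object names are arbitrary):
>>>>>>>
>>>>>>> # ceph osd pool create statfs-test 1 1
>>>>>>> # dd if=/dev/urandom of=/tmp/obj bs=64K count=1
>>>>>>> # rados -p statfs-test put obj0 /tmp/obj
>>>>>>> # ceph df -f json | jq '.pools[] | select(.name == "statfs-test") | .stats'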
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Igor
>>>>>>>
>>>>>>> On 11/26/2020 8:00 PM, Dan van der Ster wrote:
>>>>>>>> Hi Igor,
>>>>>>>>
>>>>>>>> No BLUESTORE_LEGACY_STATFS warning, and
>>>>>>>> bluestore_warn_on_legacy_statfs is the default true on this (and all)
>>>>>>>> clusters.
>>>>>>>> I'm quite sure we did the statfs conversion during one of the recent
>>>>>>>> upgrades (I forget which one exactly).
>>>>>>>>
>>>>>>>> # ceph tell osd.* config get bluestore_warn_on_legacy_statfs | grep -v true
>>>>>>>> #
>>>>>>>>
>>>>>>>> Is there a command to see the statfs reported by an individual OSD?
>>>>>>>> We have a mix of ~year old and recently recreated OSDs, so I could try
>>>>>>>> to see if they differ.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 26, 2020 at 5:50 PM Igor Fedotov <ifedotov@suse.de> wrote:
>>>>>>>>> Hi Dan
>>>>>>>>>
>>>>>>>>> don't you have the BLUESTORE_LEGACY_STATFS alert raised (it might be silenced
>>>>>>>>> by the bluestore_warn_on_legacy_statfs param) on the older cluster?
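>>>>>>>>>
>>>>>>>>> To check, and to convert a given OSD if it is still on legacy statfs
>>>>>>>>> (a sketch -- <id> is a placeholder, and the OSD must be stopped for
>>>>>>>>> the repair):
>>>>>>>>>
>>>>>>>>> # ceph health detail | grep -i statfs
>>>>>>>>> # ceph config get osd bluestore_warn_on_legacy_statfs
>>>>>>>>> # systemctl stop ceph-osd@<id>
>>>>>>>>> # ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>
>>>>>>>>> # systemctl start ceph-osd@<id>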
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Igor
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/26/2020 7:29 PM, Dan van der Ster wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Depending on which cluster I look at (all running v14.2.11), the pool
>>>>>>>>>> bytes_used variably reports raw space or stored bytes.
>>>>>>>>>>
>>>>>>>>>> Here's a 7 year old cluster:
>>>>>>>>>>
>>>>>>>>>> # ceph df -f json | jq .pools[0]
>>>>>>>>>> {
>>>>>>>>>>        "name": "volumes",
>>>>>>>>>>        "id": 4,
>>>>>>>>>>        "stats": {
>>>>>>>>>>          "stored": 1229308190855881,
>>>>>>>>>>          "objects": 294401604,
>>>>>>>>>>          "kb_used": 1200496280133,
>>>>>>>>>>          "bytes_used": 1229308190855881,
>>>>>>>>>>          "percent_used": 0.4401889145374298,
>>>>>>>>>>          "max_avail": 521125025021952
>>>>>>>>>>        }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Note that stored == bytes_used for that pool (this is a 3x replica pool).
>>>>>>>>>>
>>>>>>>>>> But here's a newer cluster (installed recently with nautilus):
>>>>>>>>>>
>>>>>>>>>> # ceph df -f json  | jq .pools[0]
>>>>>>>>>> {
>>>>>>>>>>        "name": "volumes",
>>>>>>>>>>        "id": 1,
>>>>>>>>>>        "stats": {
>>>>>>>>>>          "stored": 680977600893041,
>>>>>>>>>>          "objects": 163155803,
>>>>>>>>>>          "kb_used": 1995736271829,
>>>>>>>>>>          "bytes_used": 2043633942351985,
>>>>>>>>>>          "percent_used": 0.23379847407341003,
>>>>>>>>>>          "max_avail": 2232457428467712
>>>>>>>>>>        }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> In the second cluster, bytes_used is 3x stored.
>>>>>>>>>>
>>>>>>>>>> Does anyone know why these are not reported consistently?
>>>>>>>>>> Noticing this just now, I'll update our monitoring to plot stored
>>>>>>>>>> rather than bytes_used from now on.
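>>>>>>>>>>
>>>>>>>>>> E.g. the jq side of it would be something like (a sketch):
>>>>>>>>>>
>>>>>>>>>> # ceph df -f json | jq '.pools[] | {name: .name, stored: .stats.stored}'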
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> Dan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io