List:       ceph-users
Subject:    [ceph-users] Re: Difficulty with fixing an inconsistent PG/object
From:       Konstantin Shalygin <k0ste@k0ste.ru>
Date:       2022-06-29 10:34:42
Message-ID: 44AA46FC-D307-4BF1-A44A-37D0CAAD6217@k0ste.ru

Hi!

Just try Googling "data_digest_mismatch_oi"; the old mailing list archives have
a couple of threads about the same problem.
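
For what it's worth: data_digest_mismatch_oi means the digest computed from the
on-disk data (0x50007bd9 on both of your shards) disagrees with the digest
recorded in the object info ("dd 885fabcc" in the oi). Your two replicas agree
with each other; it is the stored metadata that is stale, which is also why the
md5sums match. The "failed to pick suitable auth object" line means "ceph pg
repair" cannot fix this by itself, since no shard matches the recorded digest.

The fix those old threads usually converge on is to rewrite the object through
librados so the object info digest is recalculated. A rough sketch, not a
verified procedure; the pool name is a placeholder (look up pool id 37 with
"ceph osd pool ls detail"):

```
# Read the object out and write it back; the put refreshes the oi digest.
rados -p <pool-of-id-37> get isqPpJMKYY4.000000000000001e /tmp/obj
rados -p <pool-of-id-37> put isqPpJMKYY4.000000000000001e /tmp/obj

# Then check that a fresh deep scrub comes back clean.
ceph pg deep-scrub 37.189
```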


k
Sent from my iPhone

> On 29 Jun 2022, at 13:54, Lennart van Gijtenbeek | Routz <lennart.vangijtenbeek@routz.nl> wrote:
> Hello Ceph community,
> 
> 
> I hope you could help me with an issue we are experiencing on our backup cluster.
> 
> The Ceph version we are running here is 10.2.10 (Jewel), and we are using
> Filestore. The PG is part of a replicated pool with size=2.
> 
> 
> We are getting the following error:
> ```
> root@cephmon0:~# ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
> pg 37.189 is active+clean+inconsistent, acting [144,170]
> 2 scrub errors
> ```
> 
> ```
> root@cephmon0:~# grep 37.189 /var/log/ceph/ceph.log
> 2022-06-29 11:11:27.782920 osd.144 10.129.160.22:6800/2810 7598 : cluster [INF] osd.144 pg 37.189 Deep scrub errors, upgrading scrub to deep-scrub
> 2022-06-29 11:11:27.884628 osd.144 10.129.160.22:6800/2810 7599 : cluster [INF] 37.189 deep-scrub starts
> 2022-06-29 11:13:07.124841 osd.144 10.129.160.22:6800/2810 7600 : cluster [ERR] 37.189 shard 144: soid 37:9193d307:::isqPpJMKYY4.000000000000001e:head data_digest 0x50007bd9 != data_digest 0x885fabcc from auth oi 37:9193d307:::isqPpJMKYY4.000000000000001e:head(7211'173457 osd.71.0:397191 dirty|data_digest|omap_digest s 4194304 uv 39699 dd 885fabcc od ffffffff alloc_hint [0 0])
> 2022-06-29 11:13:07.124849 osd.144 10.129.160.22:6800/2810 7601 : cluster [ERR] 37.189 shard 170: soid 37:9193d307:::isqPpJMKYY4.000000000000001e:head data_digest 0x50007bd9 != data_digest 0x885fabcc from auth oi 37:9193d307:::isqPpJMKYY4.000000000000001e:head(7211'173457 osd.71.0:397191 dirty|data_digest|omap_digest s 4194304 uv 39699 dd 885fabcc od ffffffff alloc_hint [0 0])
> 2022-06-29 11:13:07.124853 osd.144 10.129.160.22:6800/2810 7602 : cluster [ERR] 37.189 soid 37:9193d307:::isqPpJMKYY4.000000000000001e:head: failed to pick suitable auth object
> 2022-06-29 11:20:46.459906 osd.144 10.129.160.22:6800/2810 7603 : cluster [ERR] 37.189 deep-scrub 2 errors
> ```
> 
> The PG has already been moved from two other OSDs; that is, the same error
> occurred when the PG was stored on two different OSDs, so it does not appear
> to be a disk issue. There seems to be something wrong with the object
> "isqPpJMKYY4.000000000000001e". However, the md5sum of the object is the same
> on both OSDs:
> 
> ```
> root@ceph12:/var/lib/ceph/osd/ceph-144/current/37.189_head/DIR_9/DIR_8/DIR_9/DIR_C# ls -l isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> -rw-r--r-- 1 ceph ceph 4194304 Jun  3 09:56 isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> root@ceph12:/var/lib/ceph/osd/ceph-144/current/37.189_head/DIR_9/DIR_8/DIR_9/DIR_C# md5sum isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> 96d702072cd441f2d0af60783e8db248  isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> ```
> 
> ```
> root@ceph15:/var/lib/ceph/osd/ceph-170/current/37.189_head/DIR_9/DIR_8/DIR_9/DIR_C# ls -l isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> -rw-r--r-- 1 ceph ceph 4194304 Jun 23 16:41 isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> root@ceph15:/var/lib/ceph/osd/ceph-170/current/37.189_head/DIR_9/DIR_8/DIR_9/DIR_C# md5sum isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> 96d702072cd441f2d0af60783e8db248  isqPpJMKYY4.000000000000001e__head_E0CBC989__25
> ```
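> 
> If it helps: I believe the digest recorded in the object info can also be
> read straight from the Filestore xattr and decoded. A sketch, assuming the
> oi is kept in the "user.ceph._" xattr and ceph-dencoder is installed:
> 
> ```
> # Dump the object_info_t stored next to the data; its data_digest field
> # should show the stale value (885fabcc) if only the metadata is wrong.
> getfattr -n user.ceph._ --only-values \
>     isqPpJMKYY4.000000000000001e__head_E0CBC989__25 > /tmp/oi.bin
> ceph-dencoder type object_info_t import /tmp/oi.bin decode dump_json
> ```
> 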
> ```
> root@cephmon0:~# rados list-inconsistent-obj 37.189 --format=json-pretty
> {
>     "epoch": 167653,
>     "inconsistents": [
>         {
>             "object": {
>                 "name": "isqPpJMKYY4.000000000000001e",
>                 "nspace": "",
>                 "locator": "",
>                 "snap": "head",
>                 "version": 39699
>             },
>             "errors": [],
>             "union_shard_errors": [
>                 "data_digest_mismatch_oi"
>             ],
>             "selected_object_info": "37:9193d307:::isqPpJMKYY4.000000000000001e:head(7211'173457 osd.71.0:397191 dirty|data_digest|omap_digest s 4194304 uv 39699 dd 885fabcc od ffffffff alloc_hint [0 0])",
>             "shards": [
>                 {
>                     "osd": 144,
>                     "errors": [
>                         "data_digest_mismatch_oi"
>                     ],
>                     "size": 4194304,
>                     "omap_digest": "0xffffffff",
>                     "data_digest": "0x50007bd9"
>                 },
>                 {
>                     "osd": 170,
>                     "errors": [
>                         "data_digest_mismatch_oi"
>                     ],
>                     "size": 4194304,
>                     "omap_digest": "0xffffffff",
>                     "data_digest": "0x50007bd9"
>                 }
>             ]
>         }
>     ]
> }
> ```
> 
> I don't understand why there is a "data_digest_mismatch_oi" error, since the
> checksums seem to match.
> Does anyone have any idea on how to fix this?
> Your input would be very much appreciated. Please let me know if you need
> additional info.
> Thank you.
> 
> Best regards,
> Lennart van Gijtenbeek
> 
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io

