
List:       ceph-users
Subject:    [ceph-users] Re: [nautilus] ceph tell hanging
From:       Nico Schottelius <nico.schottelius () ungleich ! ch>
Date:       2020-09-22 22:32:13
Message-ID: 87r1qto32q.fsf () ungleich ! ch


Follow-up on the tell hanging: iterating over all osds and trying to
raise the max-backfills leaves hanging ceph tell processes like these:

root     1007846 15.3  1.2 918388 50972 pts/5    Sl   00:03   0:48 /usr/bin/python3 /usr/bin/ceph tell osd.4 injectargs --osd-max-backfill
root     1007890  0.4  0.9 850664 37596 pts/5    Sl   00:03   0:01 /usr/bin/python3 /usr/bin/ceph tell osd.7 injectargs --osd-max-backfill
root     1007930  0.3  0.9 842472 37484 pts/5    Sl   00:03   0:01 /usr/bin/python3 /usr/bin/ceph tell osd.11 injectargs --osd-max-backfil
root     1007987  0.3  0.9 850668 37540 pts/5    Sl   00:03   0:01 /usr/bin/python3 /usr/bin/ceph tell osd.18 injectargs --osd-max-backfil
root     1008054  0.4  0.9 850664 37600 pts/5    Sl   00:03   0:01 /usr/bin/python3 /usr/bin/ceph tell osd.29 injectargs --osd-max-backfil
root     1008147 14.7  1.2 910192 50648 pts/5    Sl   00:03   0:42 /usr/bin/python3 /usr/bin/ceph tell osd.33 injectargs --osd-max-backfil
root     1008205  0.3  0.9 842468 37524 pts/5    Sl   00:03   0:01 /usr/bin/python3 /usr/bin/ceph tell osd.45 injectargs --osd-max-backfil
root     1008246  0.3  0.9 850664 37828 pts/5    Sl   00:04   0:01 /usr/bin/python3 /usr/bin/ceph tell osd.48 injectargs --osd-max-backfil
...
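
For what it's worth, wrapping each tell in timeout(1) keeps one stuck
osd from blocking a whole loop like the one above. A minimal sketch
(not the loop I actually ran): "sleep 1000" stands in for a hanging
"ceph tell osd.N injectargs ..." call, and the 2-second limit is only
for illustration:

```shell
# Sketch: timeout(1) turns a hanging command into a bounded failure.
# "sleep 1000" stands in here for a hung "ceph tell osd.N injectargs ..." call.
if timeout 2 sleep 1000; then
    echo "tell completed"
else
    # timeout(1) exits with status 124 when it kills the command
    echo "tell timed out with status $?"
fi
```

Applied to the real loop this would look something like
`for id in $(ceph osd ls); do timeout 10 ceph tell "osd.$id" injectargs '--osd-max-backfills=4'; done`
(the value 4 and the 10-second limit are just examples).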

Additionally, many of the tell processes get stuck in an infinite
loop, printing this error over and over again:

2020-09-23 00:09:48.766 7f07e5f99700  0 --1- [2a0a:e5c0:2:1:20d:b9ff:fe48:3bd4]:0/2338294673 >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6858/12824 conn(0x7f07c8055680 0x7f07c8053740 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2020-09-23 00:09:48.774 7f07e5f99700  0 --1- [2a0a:e5c0:2:1:20d:b9ff:fe48:3bd4]:0/2338294673 >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6858/12824 conn(0x7f07c804f590 0x7f07c80505c0 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2020-09-23 00:09:48.786 7f07e5f99700  0 --1- [2a0a:e5c0:2:1:20d:b9ff:fe48:3bd4]:0/2338294673 >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6858/12824 conn(0x7f07c8055680 0x7f07c8053740 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2020-09-23 00:09:48.790 7f07e5f99700  0 --1- [2a0a:e5c0:2:1:20d:b9ff:fe48:3bd4]:0/2338294673 >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6858/12824 conn(0x7f07c804f590 0x7f07c80505c0 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2020-09-23 00:09:48.798 7f07e5f99700  0 --1- [2a0a:e5c0:2:1:20d:b9ff:fe48:3bd4]:0/2338294673 >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6858/12824 conn(0x7f07c8055680 0x7f07c8053740 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER




Nico Schottelius <nico.schottelius@ungleich.ch> writes:

> So the same problem happens with pgs which are in "unknown" state,
> 
> [19:31:08] black2.place6:~# ceph pg 2.5b2 query | tee query_2.5b2
> 
> hangs until the pg actually becomes active again. I assume that this
> should not be the case, should it?
> 
> 
> Nico Schottelius <nico.schottelius@ungleich.ch> writes:
> 
> > Update to the update: currently debugging why pgs are stuck in the
> > peering state:
> > 
> > [18:57:49] black2.place6:~# ceph pg dump all | grep 2.7d1
> > dumped all
> > 2.7d1     16666                  0        0         0       0 69698617344          0          0 3002     3002   peering 2020-09-22 18:49:28.587859   80407'8126117   80915:35142541    [22,84]    22    [22,84]             22   80407'8126117 2020-09-22 17:23:11.860334   79594'8122364 2020-09-21 13:27:16.376009             0
> > 
> > The problem is that
> > 
> > ceph pg 2.7d1 query
> > 
> > hangs and does not output information. Does anyone know what could be
> > the cause for this?


--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io

