List: ceph-users
Subject: [ceph-users] hung rbd requests for one pool
From: lacroute () skyportsystems ! com (Phil Lacroute)
Date: 2017-04-26 17:31:15
Message-ID: 4B7E7D9F-E2B3-4691-A1BF-BD2B699FB06E () skyportsystems ! com
A quick update just to close out this thread:
After investigating with netstat I found one ceph-osd process had three TCP connections in established state but with no connection state on the peer system (the client node that previously had been using the RBD image). The qemu process on the client had terminated and all connection state had been cleaned up. On the osd node, two of these TCP connections had data in their send queues, and the retransmit timer had reached zero, but for some reason the retransmissions were not happening (confirmed with tcpdump) and the connections were not timing out. The osd node remained in this state for over 24 hours. At this point I'm unable to explain why TCP did not time out the connection, given that the peer had closed the connection. This is a Debian jessie system with a stock 4.9.0 kernel, so nothing non-standard about the networking stack.
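For anyone who wants to repeat the diagnosis, this is roughly what I ran (a sketch; port 6804 is the OSD port from this cluster, so substitute your own):

```shell
# Established TCP sockets on the OSD port, with the kernel's internal
# TCP state (-i: rto/retrans counters, -o: timers). Sockets sitting in
# ESTABLISHED with a non-zero Send-Q but no packets visible in tcpdump
# are the suspicious ones. Port 6804 is this cluster's OSD port.
ss -tino state established '( sport = :6804 )'

# netstat equivalent: any TCP connection with data queued to send.
# Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
netstat -ant | awk '$1 == "tcp" && $3 > 0 && $6 == "ESTABLISHED"'
```

The `-o` flag is what showed the retransmit timer sitting at zero while data was still queued.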
The ceph-osd process seems to have gotten stuck because of this. There were no active operations (according to "ceph daemon OSD ops") but perhaps that is expected once data is already in the send queue. In addition to the three connections in established state, there were over 100 connections in CLOSE_WAIT state, which indicates that it was holding these descriptors open even though the TCP connections had terminated, so the reaping thread was perhaps blocked waiting for the pending I/O to finish. Also, the osd would not accept any new requests associated with the same RBD image. I'm not sure if there is any problem in the ceph code given the misbehaving TCP connection. Better error handling to prevent getting stuck might be appropriate, but I'm not sure until I understand what caused the TCP problem.
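A rough sketch of how I tallied per-state socket counts for the osd ("22934/ceph-osd" is the pid/program name from my netstat output and is only an example):

```shell
# Tally TCP socket states held by a given process, from netstat -p
# output. A large CLOSE_WAIT count means the peer closed its end but
# the process never called close() on those descriptors.
# "22934/ceph-osd" is the pid/program from this cluster; adjust it.
netstat -antp 2>/dev/null \
    | awk '/22934\/ceph-osd/ { count[$6]++ }
           END { for (s in count) print s, count[s] }' \
    | sort
```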
Finally, the only thing slightly non-standard about our test environment is that we have IPsec enabled, but that should be independent of the TCP layer. There are no firewalls and ping was working fine. The periodic IKE traffic for IPsec renegotiation was also working (observed with tcpdump).
I will be rerunning the same tests, and if I can reproduce this and make more progress on the cause I'll report back.
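Since descriptor limits came up earlier in the thread, for completeness this is how to check them on the osd node (a sketch; I use the shell's own pid here only so it runs as-is, so substitute the ceph-osd pid, e.g. from pgrep):

```shell
# Effective open-file limits for a running process, and how many
# descriptors it currently holds. Replace $$ (the shell's own pid,
# used here only as a runnable stand-in) with the ceph-osd pid.
pid=$$
grep 'Max open files' /proc/$pid/limits
ls /proc/$pid/fd | wc -l
```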
Thanks,
Phil
> On Apr 24, 2017, at 5:16 PM, Jason Dillaman <jdillama at redhat.com> wrote:
>
> I would double-check your file descriptor limits on both sides -- OSDs
> and the client. 131 sockets shouldn't make a difference. Port is open
> on any possible firewalls you have running?
>
> On Mon, Apr 24, 2017 at 8:14 PM, Phil Lacroute
> <lacroute at skyportsystems.com> wrote:
> > Yes it is the correct IP and port:
> >
> > ceph3:~$ netstat -anp | fgrep 192.168.206.13:6804
> > tcp    0    0    192.168.206.13:6804    0.0.0.0:*    LISTEN    22934/ceph-osd
> >
> > I turned up the logging on the osd and I don't think it received the
> > request. However I also noticed a large number of TCP connections to that
> > specific osd from the client (192.168.206.17) in CLOSE_WAIT state (131 to be
> > exact). I think there may be a bug causing the osd not to close file
> > descriptors. Prior to the hang I had been running tests continuously for
> > several days so the osd process may have been accumulating open sockets.
> >
> > I'm still gathering information, but based on that is there anything
> > specific that would be helpful to find the problem?
> >
> > Thanks,
> > Phil
> >
> > On Apr 24, 2017, at 5:01 PM, Jason Dillaman <jdillama at redhat.com> wrote:
> >
> > Just to cover all the bases, is 192.168.206.13:6804 really associated
> > with a running daemon for OSD 11?
> >
> > On Mon, Apr 24, 2017 at 4:23 PM, Phil Lacroute
> > <lacroute at skyportsystems.com> wrote:
> >
> > Jason,
> >
> > Thanks for the suggestion. That seems to show it is not the OSD that got
> > stuck:
> >
> > ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
> > ...
> > 2017-04-24 13:13:49.761076 7f739aefc700 1 -- 192.168.206.17:0/1250293899
> > --> 192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38
> > rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
> > 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con
> > 0x7f737c0064e0
> > ...
> > 2017-04-24 13:14:04.756328 7f73a2880700 1 -- 192.168.206.17:0/1250293899
> > --> 192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con
> > 0x7f737c0064e0
> >
> > ceph0:~$ sudo ceph pg map 1.af6f1e38
> > osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2]
> >
> > ceph3:~$ sudo ceph daemon osd.11 ops
> > {
> > "ops": [],
> > "num_ops": 0
> > }
> >
> > I repeated this a few times and it's always the same command and same
> > placement group that hangs, but OSD11 has no ops (and neither do OSD16 and
> > OSD2, although I think that's expected).
> >
> > Is there other tracing I should do on the OSD or something more to look at
> > on the client?
> >
> > Thanks,
> > Phil
> >
> > On Apr 24, 2017, at 12:39 PM, Jason Dillaman <jdillama at redhat.com> wrote:
> >
> > On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute
> > <lacroute at skyportsystems.com> wrote:
> >
> > 2017-04-24 11:30:57.058233 7f5512ffd700 1 -- 192.168.206.17:0/3282647735
> > --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38
> > rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
> > 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con
> > 0x7f54f40064e0
> >
> >
> >
> > You can attempt to run "ceph daemon osd.XYZ ops" against the
> > potentially stuck OSD to figure out what it's stuck doing.
> >
> > --
> > Jason
> >
> >
> >
> >
> >
> > --
> > Jason
> >
> >
>
>
>
> --
> Jason