List:       ceph-users
Subject:    [ceph-users] hung rbd requests for one pool
From:       Phil Lacroute <lacroute at skyportsystems.com>
Date:       2017-04-26 17:31:15
Message-ID: 4B7E7D9F-E2B3-4691-A1BF-BD2B699FB06E at skyportsystems.com

A quick update just to close out this thread:

After investigating with netstat I found one ceph-osd process with three TCP connections in the ESTABLISHED state but with no corresponding connection state on the peer system (the client node that had previously been using the RBD image).  The qemu process on the client had terminated and all of its connection state had been cleaned up.  On the osd node, two of these TCP connections had data in their send queues and the retransmit timer had reached zero, but for some reason the retransmissions were not happening (confirmed with tcpdump) and the connections were not timing out.  The osd node remained in this state for over 24 hours.  At this point I'm unable to explain why TCP did not time out the connections, given that the peer had closed them.  This is a Debian jessie system with a stock 4.9.0 kernel, so there is nothing non-standard about the networking stack.
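
In case it helps anyone check for the same condition, commands along these lines should show the send queue and retransmit state (6804 is the osd port from our setup, and <iface> is just a placeholder):

  # per-socket send queue, owning PID and timer state for the osd's connections
  sudo netstat -tnope | grep :6804
  # ss adds internal TCP counters such as the retransmit count
  sudo ss -tinoe '( sport = :6804 )'
  # confirm on the wire whether retransmissions are actually going out
  sudo tcpdump -ni <iface> 'tcp port 6804'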

The ceph-osd process seems to have gotten stuck because of this.  There were no active operations (according to "ceph daemon OSD ops"), but perhaps that is expected once the data is already in the send queue.  In addition to the three connections in the ESTABLISHED state, there were over 100 connections in CLOSE_WAIT state, which indicates that the osd was holding those descriptors open even though the TCP connections had terminated; the reaping thread was perhaps blocked waiting for the pending I/O to finish.  The osd also would not accept any new requests associated with the same RBD image.  I'm not sure whether there is any problem in the ceph code given the misbehaving TCP connections.  Better error handling to prevent getting stuck might be appropriate, but I can't say until I understand what caused the TCP problem.
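
For reference, a rough way to see the descriptor buildup looks something like this (the osd PID comes from the netstat output and is a placeholder here):

  # count sockets stuck in CLOSE_WAIT that belong to the osd
  sudo netstat -anp | grep ceph-osd | grep -c CLOSE_WAIT
  # total descriptors currently held by the osd process
  sudo ls /proc/<osd-pid>/fd | wc -l
  # compare against the per-process limit
  grep 'open files' /proc/<osd-pid>/limits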

Finally, the only thing slightly non-standard about our test environment is that we have IPsec enabled, but that should be independent of the TCP layer.  There are no firewalls, and ping was working fine.  The periodic IKE traffic for IPsec renegotiation was also working (observed with tcpdump).
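
If anyone wants to verify the same thing in their environment, a tcpdump filter along these lines should catch both the IKE negotiation and the ESP traffic (<iface> is again a placeholder):

  # IKE uses UDP 500/4500; IP protocol 50 is ESP
  sudo tcpdump -ni <iface> 'udp port 500 or udp port 4500 or ip proto 50'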

I will be rerunning the same tests, and if I can reproduce this and make more progress on the cause I'll report back.

Thanks,
Phil


> On Apr 24, 2017, at 5:16 PM, Jason Dillaman <jdillama at redhat.com> wrote:
> 
> I would double-check your file descriptor limits on both sides -- OSDs
> and the client. 131 sockets shouldn't make a difference. Port is open
> on any possible firewalls you have running?
> 
> On Mon, Apr 24, 2017 at 8:14 PM, Phil Lacroute
> <lacroute at skyportsystems.com> wrote:
> > Yes it is the correct IP and port:
> > 
> > ceph3:~$ netstat -anp | fgrep 192.168.206.13:6804
> > tcp        0      0 192.168.206.13:6804     0.0.0.0:*               LISTEN
> > 22934/ceph-osd
> > 
> > I turned up the logging on the osd and I don't think it received the
> > request.  However I also noticed a large number of TCP connections to that
> > specific osd from the client (192.168.206.17) in CLOSE_WAIT state (131 to be
> > exact).  I think there may be a bug causing the osd not to close file
> > descriptors.  Prior to the hang I had been running tests continuously for
> > several days so the osd process may have been accumulating open sockets.
> > 
> > I'm still gathering information, but based on that is there anything
> > specific that would be helpful to find the problem?
> > 
> > Thanks,
> > Phil
> > 
> > On Apr 24, 2017, at 5:01 PM, Jason Dillaman <jdillama at redhat.com> wrote:
> > 
> > Just to cover all the bases, is 192.168.206.13:6804 really associated
> > with a running daemon for OSD 11?
> > 
> > On Mon, Apr 24, 2017 at 4:23 PM, Phil Lacroute
> > <lacroute at skyportsystems.com> wrote:
> > 
> > Jason,
> > 
> > Thanks for the suggestion.  That seems to show it is not the OSD that got
> > stuck:
> > 
> > ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
> > ...
> > 2017-04-24 13:13:49.761076 7f739aefc700  1 -- 192.168.206.17:0/1250293899
> > --> 192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38
> > rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
> > 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con
> > 0x7f737c0064e0
> > ...
> > 2017-04-24 13:14:04.756328 7f73a2880700  1 -- 192.168.206.17:0/1250293899
> > --> 192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con
> > 0x7f737c0064e0
> > 
> > ceph0:~$ sudo ceph pg map 1.af6f1e38
> > osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2]
> > 
> > ceph3:~$ sudo ceph daemon osd.11 ops
> > {
> > "ops": [],
> > "num_ops": 0
> > }
> > 
> > I repeated this a few times and it's always the same command and same
> > placement group that hangs, but OSD11 has no ops (and neither do OSD16 and
> > OSD2, although I think that's expected).
> > 
> > Is there other tracing I should do on the OSD or something more to look at
> > on the client?
> > 
> > Thanks,
> > Phil
> > 
> > On Apr 24, 2017, at 12:39 PM, Jason Dillaman <jdillama at redhat.com> wrote:
> > 
> > On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute
> > <lacroute at skyportsystems.com> wrote:
> > 
> > 2017-04-24 11:30:57.058233 7f5512ffd700  1 -- 192.168.206.17:0/3282647735
> > --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38
> > rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
> > 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con
> > 0x7f54f40064e0
> > 
> > 
> > 
> > You can attempt to run "ceph daemon osd.XYZ ops" against the
> > potentially stuck OSD to figure out what it's stuck doing.
> > 
> > --
> > Jason
> > 
> > 
> > 
> > 
> > 
> > --
> > Jason
> > 
> > 
> 
> 
> 
> -- 
> Jason
