'Re: [Gluster-devel] fail-over taking too long when a node reboots'


List:       gluster-devel
Subject:    Re: [Gluster-devel] fail-over taking too long when a node reboots
From:       Niels de Vos <ndevos () redhat ! com>
Date:       2016-07-27 11:49:15
Message-ID: 20160727114915.GD16998 () ndevos-x240 ! usersys ! redhat ! com

On Wed, Jul 27, 2016 at 12:40:58PM +0530, Pranith Kumar Karampuri wrote:
> Hi,
>      Does anyone have a complete understanding of the keepalive timeout vs. TCP
> User Timeout (UTO) options? For both afr and EC, when the server reboots it
> takes 42 seconds for the fops to fail with ENOTCONN
> (saved_frames_unwind()). I am wondering if there is any way to reduce this
> time by tuning these two options. As per our earlier research on this
> (I think it was kp who did that), keepalive was not getting triggered when
> there were fops in progress, and he saw quite a few game-dev forums talk
> about this problem too. It seems there is a newer option called TCP
> User Timeout which seems to address this. I am wondering if any of you
> have experience with this and can suggest more meaningful defaults for
> these timeouts. I think at the moment the default is 42
> seconds.

http://review.gluster.org/8065 might be related? More details are in
https://www.gluster.org/pipermail/gluster-devel/2014-May/040755.html and
https://bugzilla.redhat.com/show_bug.cgi?id=1129787
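
For illustration only, here is a minimal sketch of how the two mechanisms
from the question are set on a plain Linux TCP socket. It is not taken from
the Gluster code or from the patch above. The function name and all timeout
values are hypothetical, and the socket constants are assumed to be
available (Linux, Python 3.6 or later). Keepalive probes are only sent
while the connection is idle, whereas TCP_USER_TIMEOUT (RFC 5482) bounds
how long transmitted data may stay unacknowledged before the kernel drops
the connection, which is the case where fops are in flight.

```python
import socket


def configure_failure_detection(sock, idle_s=20, interval_s=2, probes=9,
                                user_timeout_ms=42000):
    """Enable keepalive and TCP_USER_TIMEOUT on a TCP socket (Linux only).

    All default values are illustrative assumptions, not recommendations.
    """
    # Keepalive covers the idle case: after idle_s seconds without
    # traffic, send a probe every interval_s seconds and give up after
    # `probes` unanswered probes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)

    # TCP_USER_TIMEOUT covers the busy case: the maximum time, in
    # milliseconds, that sent data may remain unacknowledged before the
    # kernel closes the connection.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT,
                    user_timeout_ms)


def read_back(sock):
    """Return the values the kernel currently reports for the socket."""
    return {
        "keepalive": sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE),
        "idle_s": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE),
        "interval_s": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL),
        "probes": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT),
        "user_timeout_ms": sock.getsockopt(socket.IPPROTO_TCP,
                                           socket.TCP_USER_TIMEOUT),
    }
```

With keepalive alone, the worst-case detection time on an idle connection
is roughly idle_s + interval_s * probes; with data in flight, the user
timeout is what limits the wait.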

HTH,
Niels


