
List:       linux-nfs
Subject:    Re: [bug report] task hang while testing xfstests generic/323
From:       Trond Myklebust <trondmy () hammerspace ! com>
Date:       2019-02-28 23:56:52
Message-ID: dae18b965a55ed36071b5296d6b1466a57878d16.camel () hammerspace ! com

On Thu, 2019-02-28 at 17:26 -0500, Olga Kornievskaia wrote:
> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue
> <jiufei.xue@linux.alibaba.com> wrote:
> > Hi,
> > 
> > When I tested xfstests generic/323 with NFSv4.1 and v4.2, the task
> > occasionally became a zombie while one of its threads hung with the
> > following stack:
> > 
> > [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
> > [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
> > [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
> > [<0>] nfs_file_release+0x35/0x50 [nfs]
> > [<0>] __fput+0xa2/0x1c0
> > [<0>] task_work_run+0x82/0xa0
> > [<0>] do_exit+0x2ac/0xc20
> > [<0>] do_group_exit+0x39/0xa0
> > [<0>] get_signal+0x1ce/0x5d0
> > [<0>] do_signal+0x36/0x620
> > [<0>] exit_to_usermode_loop+0x5e/0xc2
> > [<0>] do_syscall_64+0x16c/0x190
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [<0>] 0xffffffffffffffff
> > 
> > Since commit 12f275cdd163 ("NFSv4: Retry CLOSE and DELEGRETURN on
> > NFS4ERR_OLD_STATEID"), the client retries the CLOSE when the stateid
> > generation number on the client is lower than the one on the server.
> > 
> > The original intention of this commit was to retry the operation
> > when racing with an OPEN. However, in this case the stateid
> > generation mismatch persists forever.
> > 
> > Any suggestions?
> 
> Can you include a network trace of the failure? Is it possible that
> the server has crashed on reply to the close and that's why the task
> is hung? What server are you testing against?
> 
> I have seen traces where the CLOSE would get NFS4ERR_OLD_STATEID and
> would keep retrying with the same open state until the client got a
> reply to the OPEN that changed the state; once the client received
> that reply, it would retry the CLOSE with the updated stateid.

I agree with Olga's assessment. The server is not allowed to randomly
change the values of the seqid, and the client should be taking pains
to replay any OPEN calls for which a reply is missed. The expectation
is therefore that NFS4ERR_OLD_STATEID should always be a temporary
state.

If it is not, then the bug report needs to explain why the server
bumped the seqid without informing the client.
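
To make the expectation above concrete, here is a toy model in Python.
It is only a sketch and not the kernel code: ToyServer,
close_with_retry, the pending_open_replies list and the max_tries
bound are all hypothetical names invented for illustration. The error
values and the seqid comparison follow the NFSv4.1 rules as I
understand them. The real client has no such retry bound, which is
why a seqid that never catches up shows up as a hang.

```python
from collections import deque

NFS4_OK = 0
NFS4ERR_OLD_STATEID = 10024  # stateid seqid is behind the server's
NFS4ERR_BAD_STATEID = 10025  # stateid seqid is ahead of the server's


class ToyServer:
    """Hypothetical server tracking one open stateid's seqid."""

    def __init__(self):
        self.seqid = 1

    def open(self):
        # Each OPEN on the same open state bumps the seqid; the new
        # value is what the reply would carry back to the client.
        self.seqid += 1
        return self.seqid

    def close(self, seqid):
        if seqid < self.seqid:
            return NFS4ERR_OLD_STATEID
        if seqid > self.seqid:
            return NFS4ERR_BAD_STATEID
        return NFS4_OK


def close_with_retry(server, client_seqid, pending_open_replies, max_tries=5):
    """Retry CLOSE on NFS4ERR_OLD_STATEID (toy model).

    pending_open_replies holds seqids from OPEN replies that the client
    has not processed yet. max_tries is an artificial bound so the
    example terminates.
    """
    pending = deque(pending_open_replies)
    for attempt in range(1, max_tries + 1):
        status = server.close(client_seqid)
        if status != NFS4ERR_OLD_STATEID:
            return status, attempt
        if pending:
            # An OPEN reply arrived: adopt the newer seqid and retry.
            client_seqid = max(client_seqid, pending.popleft())
        # Otherwise retry with the same stateid, as in the traces
        # described earlier in the thread.
    return NFS4ERR_OLD_STATEID, max_tries


# Case 1: CLOSE races with an OPEN whose reply eventually arrives.
server = ToyServer()
reply = server.open()
print(close_with_retry(server, 1, [reply]))

# Case 2: the server bumped the seqid but the client never learns of it.
server = ToyServer()
server.open()
print(close_with_retry(server, 1, []))
```

In the first case the mismatch is temporary and the retried CLOSE
succeeds once the OPEN reply has been processed. In the second case
every attempt returns NFS4ERR_OLD_STATEID, which is the situation the
bug report would need to explain.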

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com


