
List:       linux-nfs
Subject:    Re: Lockd error message is unclear.
From:       Rogier Wolff <R.E.Wolff () BitWizard ! nl>
Date:       2021-04-27 21:10:44
Message-ID: 20210427211044.vxvgieqe4ud5lh7o () BitWizard ! nl

On Tue, Apr 27, 2021 at 03:34:52PM -0400, J. Bruce Fields wrote:
> On Tue, Apr 27, 2021 at 09:03:11PM +0200, Rogier Wolff wrote:
> > 
> > Hi, 
> > 
> > Two things..... 
> > 
> > I got: 
> > 
> >    lockd: cannot monitor <client> 
> > 
> > in the logfile and the client was terribly slow/not working at all.
> > 
> > everything pointed to a lockd problem... 
> > 
> > In the end... it turns out that my rpc.statd stopped working.  I had
> > to go and download the sources to figure this out... I would firstly
> > suggest to improve the error message to give others running into this
> > more hints as to where to look.
> > 
> > The error message on line 169 of lockd.c could read: 
> > 
> > 	lockd: Error in the rpc to rpc.statd to monitor %s\n
> > 
> > Would it be an idea to print the res.status error code? 
> 
> I'm not sure about the wording, but including the error code sounds like
> a good idea.  (Would that have made a difference in your case?)

Not sure. Of course I was just "looking for a solution". So once I
figured out that rpc.statd was missing I went looking for how that
came about. 

But as it was, the prime culprit was "lockd is misbehaving". With a
better error message you can shift the blame away from your part of
the system. :-)
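
To make the suggestion a bit more concrete, here is a rough userspace
sketch of the kind of message I mean. It is not the actual lockd code:
format_monitor_error() and the plain int status are hypothetical
stand-ins for whatever is available at that point in the kernel, and
the wording is only my earlier proposal with the status appended.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Userspace sketch only; the kernel would use its own logging helpers.
 * format_monitor_error() is a hypothetical name, and "status" stands in
 * for the res.status value mentioned above (its real type is assumed).
 *
 * Returns what snprintf() returns: the length the full message needs,
 * excluding the terminating NUL.
 */
static int format_monitor_error(char *buf, size_t len,
				const char *host, int status)
{
	return snprintf(buf, len,
			"lockd: Error in the rpc to rpc.statd to monitor %s (status %d)\n",
			host, status);
}
```

Calling it with a buffer, a client name and the status yields one log
line that names rpc.statd, which is the hint I was missing.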

> > second?) timeout in nsm_mon_unmon and the big backlog of requests that
> > result in the same call and timeout that frustrate the client... )
> 
> The -ECONNREFUSED case?
> 
> I'm not sure why it retries there.  Maybe just to allow stopping and
> starting rpc.statd (e.g. for upgrades) without failing operations?

Not sure IF it was retrying. Maybe not. But starting "google-chrome"
with 40 open tabs didn't progress to any tabs loading within the half
hour that I spent looking for why this was happening (and I was unable
to google for a solution).... So in the meantime it was constantly
spewing the error message, rate limited to 10 per minute....

	Roger.

-- 
** R.E.Wolff@BitWizard.nl ** https://www.BitWizard.nl/ ** +31-15-2049110 **
**    Delftechpark 11 2628 XJ  Delft, The Netherlands.  KVK: 27239233    **
f equals m times a. When your f is steady, and your m is going down
your a is going up.  -- Chris Hadfield about flying up the space shuttle.