
List:       linux-nfs
Subject:    Re: NFSv4 null request and compatibility with netapp
From:       Franck Eyraud <franck.eyraud@jrc.ec.europa.eu>
Date:       2011-06-30 7:07:07
Message-ID: 4E0C209B.5060405@jrc.ec.europa.eu

Hi Tom,

Thanks for the answer. According to our colleagues managing the filer,
it is running Data ONTAP version 7.3.3P5.

A ticket has been opened with Telindus, our technical support provider
for NetApp, under number ISS00094182, but I think it is not a NetApp
case number. I will ask them whether they can provide me with the
NetApp case number.

Franck

On 29/06/2011 19:11, Thomas Haynes wrote:
> Franck,
> 
> What version of OnTap are you running?
> 
> Actually, if you have a case number from NetApp, I can look at that internally.
> 
> Thanks,
> Tom
> 
> 
> On Jun 29, 2011, at 11:50 AM, Trond Myklebust wrote:
> 
> > On Wed, 2011-06-29 at 16:37 +0200, Franck Eyraud wrote:
> > > Dear nfs linux list,
> > > 
> > > I have an issue with NFSv4 clients running Debian with Linux kernel
> > > version 2.6.29 and above. The NFSv4 server is a NetApp NAS.
> > > 
> > > The problem arose after upgrading the kernel of our machines, when the
> > > filer started to issue a lot of these messages:
> > > 
> > > Client 1XX.1XX.2XX.73 has an authentication error 2
> > > Client 1XX.1XX.2XX.73 is sending bad rpc requests with error: RPC version mismatch or authentication error(73)
> > > 
> > > The NetApp team analyzed the TCP trace and sent us the answer below, which basically says that the fault is on the client side.
> > > Can someone on the list confirm that their assertions are correct?
> > > 
> > > I already opened a bug on Debian (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=632074), where they said that maybe the client behaviour is not that odd with respect to the RFC standard.
> > Without a trace, it is hard to affirm anything, but please note the
> > following:
> > 
> > 1. The client will usually share the same TCP connection for _all_
> > mounts to any given server, so even if you didn't see the client
> > establish the rpcsec_gss session after the particular mount that
> > you used to test on, it may have been established by a previous
> > mount.
> > 2. Once the client is finished using an rpcsec_gss session, and
> > that session is kicked out of the cache (usually within 1 minute
> > of last use, but it may occasionally take longer), then the
> > client will send out a NULL call with the RPCSEC_GSS_DESTROY
> > message.
> > 3. The client doesn't know, before sending an RPC call, whether or
> > not the server has already expired the rpcsec_gss session. If
> > the server has expired it without the client's knowledge, then
> > it is unreasonable to declare the RPCSEC_GSS_DESTROY call to be
> > an error. The server should simply try to obey the call, and
> > then acknowledge it as a success, whether or not it actually
> > found an active session.
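> > 
> > Concretely, here is a minimal sketch (Python pseudocode, not OnTap's or
> > the kernel's actual code) of the server-side handling that point 3
> > describes: a DESTROY aimed at a context the server has already expired
> > is acknowledged as a success instead of being logged as an error.
> > 
> >     RPCSEC_GSS_DESTROY = 3   # gss_proc value, RFC 2203
> > 
> >     def handle_gss_destroy(context_cache, ctx_handle):
> >         """Drop ctx_handle if we still have it; succeed either way."""
> >         context_cache.pop(ctx_handle, None)   # an absent handle is not an error
> >         return "MSG_ACCEPTED"                 # ack the NULL call as a success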
> > 
> > IOW: this all looks to me like a storm in a teacup brought about by a
> > server implementation that is logging errors in a case where it
> > shouldn't.
> > 
> > Cheers
> > Trond
> > 
> > > ---------
> > > Tue May 24 10:05:59 MEST [vcid@s-jrciprna004p: nfsd.rpc.request.bad:warning]: Client 1XX.1XX.2XX.73 is sending bad rpc requests with error: RPC version mismatch or authentication error(73)
> > > 
> > > We looked in the code; the (73) has no special significance here and is simply the error code number for "RPC version mismatch or authentication error".
> > > What we see is that the following occurs at the time of these errors:
> > > - The client has an established TCP session on which it does NFSv4.
> > > - The NFSv4 calls use Kerberos.
> > > - On that TCP session, the client occasionally does a NULL call.
> > > - The filer rejects it with an authentication error (auth state 2, client must begin a new session).
> > > - The client does a new NULL call on a separate TCP session without a GSS context.
> > > - The filer responds and a new context is established.
> > > - The client continues on the original TCP session with the new context.
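> > > 
> > > This pattern can be checked directly in the trace with a small decoder. A sketch (Python; it assumes msg holds the raw bytes of one RPC call, after the TCP record mark):
> > > 
> > >     import struct
> > > 
> > >     RPCSEC_GSS = 6   # RPC auth flavor, RFC 2203
> > >     GSS_PROCS = {0: "DATA", 1: "INIT", 2: "CONTINUE_INIT", 3: "DESTROY"}
> > > 
> > >     def gss_proc_of_call(msg):
> > >         # RPC call header: xid, msg_type, rpcvers, prog, vers, proc (24 bytes),
> > >         # then the credential: flavor, body length, body.
> > >         flavor, cred_len = struct.unpack_from(">2I", msg, 24)
> > >         if flavor != RPCSEC_GSS:
> > >             return "not RPCSEC_GSS"
> > >         # the credential body starts with the gss version, then gss_proc
> > >         gss_version, gss_proc = struct.unpack_from(">2I", msg, 32)
> > >         return GSS_PROCS.get(gss_proc, "unknown")
> > > 
> > > The NULL calls we reject should decode as DESTROY here, while the re-establishment on the separate TCP session decodes as INIT.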
> > > 
> > > This explains why no side effect is seen: the client simply establishes a new context and continues as if nothing had happened. We have checked through the trace for vlan 240; the pattern is the same throughout, and the error always happens for NULL calls only (occasionally two replies may be sent in the same TCP payload, but the error is then always on the NULL reply). We know that some Debian kernels never exhibit this problem, while others do. This, along with the problem being tied to NULL calls only, suggests to us that it is due to client-side behaviour.
> > > Anyway, we tried to check for the first occurrence of the error, which warrants some chronology. We'll give references per clock second for ease.
> > > - The first client call is at 10:04:16 in an established NFS mount.
> > > - In the initial part of the trace, the client only uses TCP port 1006.
> > > - The client uses the same GSS context, with the exception of a SETCLIENTID and a SETCLIENTID_CONFIRM call.
> > > - At 10:05:00 the client tears down four GSS sessions (used for Kerberos) using RPCSEC_GSS_DESTROY in an NFSv4 NULL call. This is done from TCP port 1006 but for four different contexts. None of these have been used in the trace up to that point.
> > > - The client continues with more calls on port 1006 using the same GSS context.
> > > - Still at 10:05:00 (frame 1982792), the client uses an NFSv4 call to do a RPCSEC_GSS_INIT to establish a new GSS context.
> > > - The client continues using the new GSS context and does not reuse the old context.
> > > - The sequence described above on the NULL calls starts.
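> > > 
> > > One way to read the 10:05:00 teardowns is that the client reaps GSS contexts that have been idle for a while, sending a NULL call with RPCSEC_GSS_DESTROY for each reaped context. A toy model (Python; not the actual client code, and the 60-second timeout is an assumption):
> > > 
> > >     import time
> > > 
> > >     IDLE_TIMEOUT = 60.0   # seconds; assumed idle lifetime of a cached context
> > > 
> > >     class GssContextCache:
> > >         def __init__(self, send_null_destroy):
> > >             self.last_use = {}                    # context handle -> timestamp
> > >             self.send_null_destroy = send_null_destroy
> > > 
> > >         def use(self, handle):
> > >             self.last_use[handle] = time.monotonic()
> > > 
> > >         def reap(self):
> > >             now = time.monotonic()
> > >             for handle, last in list(self.last_use.items()):
> > >                 if now - last > IDLE_TIMEOUT:
> > >                     del self.last_use[handle]
> > >                     self.send_null_destroy(handle)   # NULL call, gss_proc = DESTROY
> > > 
> > > This would explain why four contexts that never appear earlier in the trace are all destroyed within the same second.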
> > > 
> > > Looking closer at these steps, we notice something important in the NULL calls. Above, the client destroyed four GSS contexts that were not used during the trace; however, it did not destroy the GSS context it had been using for a while. We now note that the client actually does a RPCSEC_GSS_DESTROY in each of the NFSv4 NULL calls where we respond with an authentication error. As the error indicates that the client has to begin a new session, this seems like a reasonable response to the call.
> > > So to summarize:
> > > - The filer logs these errors when the client destroys a GSS context.
> > > - The error message is a logical response.
> > > 
> > > The decision to tear down the GSS context is with the client. So this would seem to be a client-side issue after all, which just happens to get logged on the filer.
> > > -------------
> > > 
> > > Thank you for your help,
> > > 
> > > Franck Eyraud
> > > 
> > -- 
> > Trond Myklebust
> > Linux NFS client maintainer
> > 
> > NetApp
> > Trond.Myklebust@netapp.com
> > www.netapp.com
> > 
> 

-- 
Franck Eyraud
European Commission - Joint Research Centre
Institute for Environment and Sustainability
MARS Unit - Community Image Data Portal

TP 266, Via E. Fermi 2749, I-21027 Ispra (VA), Italy
Tel: +39 0332 78 9486   Fax: +39 0332 78 5162

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


