List:       linux-nfs
Subject:    Re: hang on existing systems when exporting NFS share to new systems
From:       Jason Keltz <jas () cse ! yorku ! ca>
Date:       2010-07-31 2:07:27
Message-ID: 4C53855F.5060201 () cse ! yorku ! ca

On 28/07/2010 1:42 PM, J. Bruce Fields wrote:
> On Wed, Jul 28, 2010 at 09:44:48AM -0400, Jason Keltz wrote:
>> My list of NFS exports has been gradually growing over the years.
>> Right now, for example, my home directories are exported to around
>> 800 hosts. (although only a relatively small subset of those will
>> mount at the same time...).  I used to just add hosts to
>> /etc/exports on the file server, and run "exportfs -r", and
>> everything would be fine.  New systems would be able to mount
>> everything perfectly, and existing systems would not be affected at
>> all.  As the list has grown, I've been noticing a problem. Now, when
>> I run exportfs -r, there is an approximate 7-10 second hang on the
>> systems that have already mounted the share, and then everything
>> returns to normal.  This doesn't happen *while* exportfs -r is
>> running, but just after it exits.  I figured that maybe exportfs was
>> "unexporting"/re-exporting to hosts that already had the share in
>> use which might have caused the problem, so I tried to manually
>> add/remove hosts thinking that this would only affect those hosts,
>> but it did not. Exporting to one new host still causes the hang on
>> all existing hosts.
>>
>> Since I have multiple exports to all of the hosts, adding one new
>> host can hang things for a while.  I can see that reducing the list
>> of exports, or hosts would reduce the delay.  What I am wondering is
>> if there is a better way that I can add hosts without affecting
>> connectivity to existing hosts?
>>
>> The NFS server itself is pretty powerful -- dual quad core box, lots
>> of memory, many NFS threads, exclusive NFS server, etc...  I am
>> running an older RHEL4 release though, so it would have an older
>> kernel/NFS system.  Maybe this issue has been solved in newer
>> releases.
>
> There have been fixes in this area, though I don't see any that I'm sure
> would address your problem.  If you could test with the latest nfs-utils
> (ideally, with the latest nfs-utils and kernel) and let us know the
> result, that would be helpful.
>
> The -t option to rpc.mountd (may need a newer nfs-utils?) may also help.
>
> Also worth filing an RHEL bug.
>    

Hi Bruce,

I backported the -t option to RHEL4 by looking at the latest nfs-utils,
but it didn't fix the problem.
I'm having trouble compiling the latest nfs-utils for RHEL4 because of a
couple of changed libraries...

What I have learned:

1) Whether I run exportfs -r, manually add a single host with exportfs,
or even remove a host with exportfs -u, the delay on all the clients is
the same.  The delay doesn't change depending on the share.
2) The delay doesn't happen while exportfs is running.  It happens
immediately afterwards, and when it does, an strace of rpc.mountd
shows that rpc.mountd is busy resolving every single hostname in etab.
On one of our NFS servers, this means a total of 13,000 DNS requests;
on another system, it's over 30,000 DNS requests (and around a 30
second delay to all shares).  Activity on all the shares returns at
exactly the moment rpc.mountd stops burdening the DNS.
3) I've tried changing /etc/exports to use just IPs, but exportfs
happily switches etab back to using hostnames, and then mountd does all
the lookups again...
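
As a rough illustration of where counts like these come from, the sketch below tallies the host-name entries in etab-style text and compares that with the number of distinct names. It assumes each etab line has the form "path client(options)" separated by whitespace; the sample data and the names count_lookups and is_ip_literal are made up for the example, and it says nothing about how rpc.mountd actually walks the file.

```python
import ipaddress

# Hypothetical sample in the assumed "path<whitespace>client(options)" layout.
SAMPLE_ETAB = (
    "/home\thost1.example.com(rw,sync)\n"
    "/home\thost2.example.com(rw,sync)\n"
    "/data\thost1.example.com(ro,sync)\n"
    "/data\t192.0.2.10(ro,sync)\n"
)


def is_ip_literal(client):
    """True if the client field is a bare IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(client)
        return True
    except ValueError:
        return False


def count_lookups(etab_text):
    """Return (host-name entries, distinct host names) for etab-style text."""
    names = []
    for line in etab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        client = parts[1].split("(", 1)[0].strip()
        # Skip wildcards, netgroups and subnets: they are not single host names.
        if not client or any(ch in client for ch in "*?[/@"):
            continue
        if not is_ip_literal(client):
            names.append(client)
    return len(names), len(set(names))


total, distinct = count_lookups(SAMPLE_ETAB)
print(f"{total} host-name entries, {distinct} distinct names")
```

If every entry triggers its own lookup, the first number is the one that matters; the gap between the two shows how many lookups are repeats of a name that was already resolved.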

I suppose the reason exportfs doesn't convert etab to just use IPs in
the first place is that a name can resolve to multiple IPs...
but if I start with a list of IPs in /etc/exports, it would be nice if
they just stayed like that in etab, and if mountd could use them as
is... what's the point of all the DNS requests? (first to generate etab,
then from mountd a second time!)

The only thing I can think to try at this point would be to populate
/etc/hosts locally on the file server and see whether the timing is
better than with the DNS requests.
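
A minimal sketch of that idea, assuming the exported host names are already available as a list: resolve each name once and emit /etc/hosts-style lines. The function name hosts_lines, the injectable resolve parameter, and the fake lookup table are all hypothetical; by default it uses socket.gethostbyname, which returns a single IPv4 address only.

```python
import socket


def hosts_lines(hostnames, resolve=socket.gethostbyname):
    """Resolve each distinct name once and return 'address<TAB>name' lines."""
    lines = []
    for name in sorted(set(hostnames)):
        try:
            addr = resolve(name)
        except OSError:
            # socket.gaierror is a subclass of OSError; skip unresolvable names.
            continue
        lines.append(f"{addr}\t{name}")
    return lines


# Hypothetical stand-in for DNS so the example runs without a network.
FAKE_DNS = {
    "host1.example.com": "192.0.2.1",
    "host2.example.com": "192.0.2.2",
}


def fake_resolve(name):
    try:
        return FAKE_DNS[name]
    except KeyError:
        raise OSError(f"cannot resolve {name}")


for line in hosts_lines(
    ["host2.example.com", "host1.example.com", "missing.example.com"],
    resolve=fake_resolve,
):
    print(line)
```

The output would be appended to /etc/hosts by hand after review. For the local file to be consulted before DNS, the hosts line in /etc/nsswitch.conf would need to list "files" ahead of "dns".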

If someone has any suggestions, I'd love to hear them.

Thanks!

Jason.