List: amd-dev
Subject: Badness when setting ping interval
From: Nick Williams <Nick.Williams () morganstanley ! com>
Date: 2003-11-03 14:23:24
A while ago we discussed setting ping to -1 for TCP mounts. I've
recently tried setting this within some of our mounts for specific
fileservers (specifically, for fileservers which have their own failover
capability which could take longer than a minute...). It looked like it
worked, but then we later found that 'some' machines could no longer
access the filesystems, getting an Input/Output error! This happened on
both sol8 (amd v6.0.3) and on linux (2.4.9-e12enterprise from redhat,
with amd v6.0.8). However, other machines (same version) had no problems
to the same fileservers.
Here's a more detailed timeline of things happening on a linux machine.
The key in question is 'govts':
govts -sublink:=govts;rhost:=ln0fnf02;rfs:=/d/ln0fnf02/d25
host!=ln0fnf02;type:=nfs;opts:=ping=-1 || type:=link;fs:=${rfs}
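For anyone reading the entry above cold, here is a rough sketch of how it breaks down. It assumes a simplified reading of the map syntax: a leading "-" token carries defaults, ";" separates items, ":=" assigns an option, "==" and "!=" are selectors, and "||" separates groups of alternative locations. The function names are made up for illustration and this is not amd's own parser.

```python
import re

# Simplified item pattern: name, operator, value (assumption, not amd's grammar).
ITEM = re.compile(r"^([A-Za-z_]+)(:=|==|!=)(.*)$")

def parse_location(text):
    """Split one location into (selectors, options)."""
    selectors, options = [], {}
    for item in filter(None, text.split(";")):
        m = ITEM.match(item)
        if m is None:
            raise ValueError(f"unrecognised item: {item!r}")
        name, op, value = m.groups()
        if op == ":=":
            options[name] = value                 # option assignment
        else:
            selectors.append((name, op, value))   # selector test
    return selectors, options

def parse_entry(value):
    """Return (defaults, groups); groups are the '||'-separated alternatives."""
    defaults, groups = {}, [[]]
    for token in value.split():
        if token == "||":
            groups.append([])                     # start the next alternative group
        elif token.startswith("-"):
            defaults.update(parse_location(token[1:])[1])
        else:
            groups[-1].append(parse_location(token))
    return defaults, groups

entry = ("-sublink:=govts;rhost:=ln0fnf02;rfs:=/d/ln0fnf02/d25 "
         "host!=ln0fnf02;type:=nfs;opts:=ping=-1 || type:=link;fs:=${rfs}")
defaults, groups = parse_entry(entry)
```

Read this way, the first alternative is an NFS mount with ping=-1 that only applies when the local host is not ln0fnf02, and the second is a plain link to ${rfs}.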
Nov 2 15:30:33 haifd51 amd[1739]: get_nfs_version: returning (3,tcp) on
host ln0fnf02
Nov 2 15:30:33 haifd51 amd[1739]: get_nfs_version: returning (3,udp) on
host ln0fnf02
Nov 2 15:30:33 haifd51 amd[1739]: Using NFS version 3, protocol tcp on
host ln0fnf02
Nov 2 15:30:33 haifd51 amd[1739]: file server ln0fnf02, type nfs, state
wired up
Nov 2 15:30:33 haifd51 amd[1739]: Trying mount of
ln0fnf02:/d/ln0fnf02/d25 on /tmp_amd/govts fstype nfs
Nov 2 15:30:33 haifd51 amd[1739]: recompute_portmap: NFS version 3
Nov 2 15:30:33 haifd51 amd[1739]: Using MOUNT version: 3
Nov 2 15:30:33 haifd51 amd[1739]: get_nfs_version: returning (3,tcp) on
host ln0fnf02
Nov 2 15:30:33 haifd51 amd[1739]: get_nfs_version: returning (3,udp) on
host ln0fnf02
Nov 2 15:30:33 haifd51 amd[1739]: Using NFS version 3, protocol tcp on
host ln0fnf02
Nov 2 15:30:33 haifd51 amd[1739]: Trying mount of
ln0fnf02:/d/ln0fnf02/d25 on /tmp_amd/govts fstype nfs
Nov 2 15:30:33 haifd51 amd[1739]: call_mountd: NFS version 3, mount
version 3
Nov 2 15:30:33 haifd51 amd[1739]: get_nfs_version: returning (3,tcp) on
host ln0fnf02
Nov 2 15:30:33 haifd51 amd[1739]: get_nfs_version: returning (3,udp) on
host ln0fnf02
Nov 2 15:30:33 haifd51 amd[1739]: Using NFS version 3, protocol tcp on
host ln0fnf02
Nov 2 15:30:33 haifd51 amd[1739]: Trying mount of
ln0fnf02:/d/ln0fnf02/d25 on /tmp_amd/govts fstype nfs
Nov 2 15:30:33 haifd51 amd[1739]: prime_nfs_fhandle_cache: NFS version 3
Nov 2 15:30:33 haifd51 amd[27648]: Using remopts="ping=-1"
Nov 2 15:30:33 haifd51 amd[27648]: mount_nfs_fh: NFS version 3
Nov 2 15:30:33 haifd51 amd[27648]: mount_nfs_fh: using NFS transport tcp
Nov 2 15:30:33 haifd51 amd[1739]: ln0fnf02:/d/ln0fnf02/d25 mounted
fstype nfs on /a/ln0fnf02/d/ln0fnf02/d25
And since the process using this directory ties it up forever, we see
continual messages relating to amd trying to unmount it:
Nov 3 00:55:16 haifd51 amd[1739]: "/tmp_amd/govts" on
/a/ln0fnf02/d/ln0fnf02/d25 still active
Nov 3 00:57:16 haifd51 amd[1739]: "/tmp_amd/govts" on
/a/ln0fnf02/d/ln0fnf02/d25 still active
But these messages stopped early this morning and amd has never looked
at it again. The last message was:
Nov 3 04:03:16 haifd51 amd[1739]: "/tmp_amd/govts" on
/a/ln0fnf02/d/ln0fnf02/d25 still active
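To find the point where the unmount attempts stop on other machines, something like the sketch below could be run over the syslog. It assumes each message sits on one unwrapped line in the format quoted above, and that the year has to be supplied because syslog omits it; the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches amd's "still active" message as quoted in this post (one line per message).
STILL_ACTIVE = re.compile(
    r'^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+amd\[\d+\]:\s+'
    r'"(?P<mnt>[^"]+)" on \S+ still active'
)

def still_active_times(lines, mount_point, year=2003):
    """Collect timestamps of 'still active' messages for one mount point."""
    times = []
    for line in lines:
        m = STILL_ACTIVE.match(line)
        if m and m.group("mnt") == mount_point:
            ts = " ".join(m.group("ts").split())  # normalise padded day numbers
            times.append(datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S"))
    return times

def last_attempt_and_max_gap(times):
    """Return the last attempt time and the longest gap between attempts."""
    if not times:
        return None, None
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return times[-1], (max(gaps) if gaps else None)

sample = [
    'Nov 3 00:55:16 haifd51 amd[1739]: "/tmp_amd/govts" on /a/ln0fnf02/d/ln0fnf02/d25 still active',
    'Nov 3 00:57:16 haifd51 amd[1739]: "/tmp_amd/govts" on /a/ln0fnf02/d/ln0fnf02/d25 still active',
    'Nov 3 04:03:16 haifd51 amd[1739]: "/tmp_amd/govts" on /a/ln0fnf02/d/ln0fnf02/d25 still active',
]
times = still_active_times(sample, "/tmp_amd/govts")
last, max_gap = last_attempt_and_max_gap(times)
```

Comparing the last attempt time against the current time on each affected host would show whether the timer-driven unmounts have stopped there too.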
At this point, any access to /tmp_amd/govts hangs for a few minutes and
then gives the 'Input/Output' error. Nothing we can do fixes the problem
at this stage, so we are looking at reverting the ping value back up to
its default (or at least, not to -1).
I forcibly try to remove it (I don't expect it to work, coz production
processes are holding it open, but at least I can see what amd thinks
about it):
Nov 3 11:24:43 haifd51 amd[1739]: "/tmp_amd/govts" forcibly timed out
Nov 3 11:24:43 haifd51 amd[1739]: "/tmp_amd/govts" on
/a/ln0fnf02/d/ln0fnf02/d25 still active
But I'm still not seeing any other attempted unmounts on a timer basis.
I changed the key so that it now says ping=60...
Nov 3 12:40:42 haifd51 amd[1739]: reload #5 of map /etc/amd.map succeeded
and now users can 'cd' and 'ls' to their hearts' content - things just
work, although I notice that I'm still not seeing any unmount attempts.
I'm trying to replicate this problem at the moment on non-production
machines, so maybe I'll get further, but I'm curious to see if anyone
else has seen anything like this behaviour before.
Nick
_______________________________________________
amd-dev mailing list: amd-dev@cs.columbia.edu
Am-utils: http://www.am-utils.org