List:       amd-dev
Subject:    multiple amd's and crashes
From:       Dave Mitchell <D.Mitchell () dcs ! shef ! ac ! uk>
Date:       1996-12-17 15:47:07

>I've been thinking about this for a while.  One of the nastiest problems w/
>amd is debugging it.  If it crashes, your machine pretty much locks out
>(unless you mount the amd points themselves as "soft").  To support this,
>I'd need to change amd and amq.
>
>Barring any unexpected problems with running multiple amd's on the same host
>on different ports, I don't think it should be difficult to do.  I have to
>think how it interacts with the only one acceptable nfs port.

I've run multiple amd's on one host before, while debugging various
problems. As far as I can tell:

1) The NFS port number was not a problem - each daemon gets a different
   anonymous port < 1024, and these are recorded ok in mtab.
2) I had to specify a different mount directory, e.g. /a vs /a.test
3) Obviously I had to specify different mount points for all the indirect
   maps, and I didn't try using a direct map.
4) The interface between amd and amq *did* break - basically the newer
   amd registered itself with the portmapper and thus overrode the old
   amd's registration - so amq only spoke to the newer amd.

Perhaps we need another debug flag for amd and amq, which would
allow both programs to use a different program number for the amd/amq
RPC link.
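
To make the override in point 4 concrete, here is a toy Python model of
the portmapper table. It is only an illustration, not amd or portmapper
code: ToyPortmapper and DEBUG_PROGRAM are made-up names, the value
given for AMQ_PROGRAM is an assumption, and DEBUG_PROGRAM is simply an
arbitrary number from the RPC "transient" range. The model assumes each
daemon clears any existing mapping for its program number before
registering, which is the usual pattern for RPC servers.

```python
# Toy model of a portmapper table: (program, version, protocol) -> port.
AMQ_PROGRAM = 300019        # assumed value of the amd/amq program number
DEBUG_PROGRAM = 0x40000001  # hypothetical number from the transient range


class ToyPortmapper:
    def __init__(self):
        self.table = {}

    def register(self, prog, vers, proto, port):
        # Model of "unset, then set": any earlier mapping for the same
        # (prog, vers, proto) triple is discarded before the new one is stored.
        self.table.pop((prog, vers, proto), None)
        self.table[(prog, vers, proto)] = port

    def getport(self, prog, vers, proto):
        # A real portmapper answers 0 when nothing is registered.
        return self.table.get((prog, vers, proto), 0)


pm = ToyPortmapper()
pm.register(AMQ_PROGRAM, 1, "udp", 1021)    # old amd
pm.register(AMQ_PROGRAM, 1, "udp", 1022)    # new amd, same program number
same_number = pm.getport(AMQ_PROGRAM, 1, "udp")

pm.register(AMQ_PROGRAM, 1, "udp", 1021)    # old amd again
pm.register(DEBUG_PROGRAM, 1, "udp", 1022)  # new amd, debug program number
separate = (pm.getport(AMQ_PROGRAM, 1, "udp"),
            pm.getport(DEBUG_PROGRAM, 1, "udp"))
```

With a shared program number only the newer daemon's port can be looked
up; with distinct numbers an amq given the matching number could reach
either daemon.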

As for the system hanging when amd crashes, one possible solution
would be a stub NFS server program called, say, am-null, which would
simply respond to all NFS requests on the specified port with an
appropriate negative response such as ENOENT. So when amd
crashes and the dreaded
	NFS server (pid 1023) not responding
messages appear, simply start up the program on the appropriate port
	# am-null 1023 &
Then all processes which had hung on the automounter will return from
(probably unkillable) kernel waits, and either exit of their own
accord, or can be safely killed.
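
A rough Python sketch of what such a stub might look like follows.
am-null does not exist; build_reply and serve are hypothetical names,
and the sketch assumes the kernel talks NFS version 2 over UDP to the
dead automounter's port. It accepts every ONC RPC call and answers
with NFSERR_NOENT, except for the few procedures whose reply carries
no status word. Whether the kernel client then releases every hung
process is untested.

```python
import socket
import struct

RPC_CALL, RPC_REPLY = 0, 1
MSG_ACCEPTED, ACCEPT_SUCCESS = 0, 0
AUTH_NONE = 0
NFSERR_NOENT = 2          # NFSv2 status: no such file or directory
VOID_PROCS = {0, 3, 7}    # NFSv2 NULL, ROOT, WRITECACHE return no status


def build_reply(request):
    """Return an accepted RPC reply for `request`, or None if it is not a call."""
    if len(request) < 24:
        return None
    # Fixed part of an RPC call: xid, msg_type, rpcvers, prog, vers, proc.
    xid, msg_type, rpcvers, _prog, _vers, proc = struct.unpack(">6I", request[:24])
    if msg_type != RPC_CALL or rpcvers != 2:
        return None
    # Reply header: xid, REPLY, MSG_ACCEPTED, empty AUTH_NONE verifier
    # (flavor, length), then accept status SUCCESS.
    header = struct.pack(">6I", xid, RPC_REPLY, MSG_ACCEPTED,
                         AUTH_NONE, 0, ACCEPT_SUCCESS)
    if proc in VOID_PROCS:
        return header
    # Every other NFSv2 result starts with a status word; a non-OK
    # status has no further body.
    return header + struct.pack(">I", NFSERR_NOENT)


def serve(port, host="0.0.0.0"):
    """Answer every RPC call arriving on the given UDP port, forever."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))   # ports below 1024 need root
    while True:
        data, addr = sock.recvfrom(65535)
        reply = build_reply(data)
        if reply is not None:
            sock.sendto(reply, addr)
```

Calling serve(1023) as root would play the part of "am-null 1023"
above. The credentials in the call are ignored, since the stub never
needs to know who is asking.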

NB I would see this as a tool for amd developers to recover their machines
rather than for normal sysadmin use. It does rather assume that amd has
exited (and freed up the port) rather than being hung in an unkillable
state.


Dave.

* David Mitchell, Systems Administrator,    email: D.Mitchell@dcs.shef.ac.uk
* Dept. Computer Science, Sheffield Uni.    phone: +44 114-222-1851
* 211 Portobello St, Sheffield S1 4DP, UK.  fax:   +44 114-278-0972
*
* Standards (n). Battle insignia or tribal totems
*
* >>>> Support Randal Schwartz! email fund@stonehenge.com for info <<<<<
