List: log
Subject: Re: What makes svstat think a service is down?
From: prj () po ! cwru ! edu (Paul Jarc)
Date: 2002-05-31 23:36:06
"Hubbard, David" <dhubbard@dino.hostasaurus.com> wrote:
> I linked from /service and everything started back up but
> readproctitle complained about the lock files.
There's normally no way of knowing how old readproctitle's error
messages are. You can have a service like this to clear the messages:
    #!/bin/sh
    echo ...............................................
    date
Give it a "down" file and use svc -o to run it once.
Alternatively, you can use svclean so that service errors will go to a
supervised multilog: <URL:http://multivac.cwru.edu./svclean/>.
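As a rough sketch of the one-shot marker service described above (the run script plus a "down" file), the directory could be built like this. The function name make_marker_service and the /var/service/marker path in the usage comment are made up for illustration; only the run script contents, the "down" file, and svc -o come from the message itself.

```shell
#!/bin/sh
# Sketch only: build a service directory whose run script prints a row of
# dots and the current date, and which supervise will not start on its own.
# make_marker_service is a hypothetical helper name.
make_marker_service() {
    dir=$1
    mkdir -p "$dir" || return 1
    # The run script from the message, written out verbatim.
    cat > "$dir/run" <<'EOF'
#!/bin/sh
echo ...............................................
date
EOF
    chmod 755 "$dir/run" || return 1
    # An empty "down" file keeps supervise from starting the service
    # automatically; it then runs only when asked to with svc -o.
    touch "$dir/down"
}

# Usage (hypothetical path; adjust to the local layout):
#   make_marker_service /var/service/marker
#   ln -s /var/service/marker /service/marker
#   svc -o /service/marker
```

Each svc -o run then leaves a fresh date in the output that svscan passes along, so anything that shows up after that date is known to be newer than it.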
> I then went to /var/qmail/supervise and just did a
>
> rm -r */supervise */log/supervise
Bad idea. That made it impossible for the new supervises to tell
whether there was an old supervise already running. You only silenced
the error message; you didn't fix the problem reported in the error
message (if there was one).
> The problem is that somewhere between 354 seconds and maybe
> 480 seconds (8 minutes) at most, the next time I ran
> svstat /service/qmail-send, svscan goes from reporting the
> processes as being up to being down, even though that is not
> the case.
That means that the currently running supervise did not spawn the
currently running qmail-send. Maybe supervise was killed. In that
case, svscan would restart supervise, and the new supervise would try
to start qmail-send, but qmail-send wouldn't be able to lock
/var/qmail/queue/sendmutex, so it would exit immediately.
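One way to check this is to look at which process is the parent of the running qmail-send. The following is a minimal sketch, assuming a ps that accepts -e and -o pid=,ppid=,comm= (the POSIX options); show_parent is a hypothetical helper name, not a daemontools tool.

```shell
#!/bin/sh
# Sketch only: read "PID PPID COMMAND" lines on stdin and, for every process
# whose command name equals the first argument, print its PID together with
# the PID and command name of its parent.
show_parent() {
    awk -v name="$1" '
        # Remember each process: command name, parent PID, input order.
        { cmd[$1] = $3; ppid[$1] = $2; order[NR] = $1 }
        END {
            for (i = 1; i <= NR; i++) {
                p = order[i]
                if (cmd[p] == name) {
                    # "?" means the parent was not in the listing.
                    parent = (ppid[p] in cmd) ? cmd[ppid[p]] : "?"
                    print p " parent=" ppid[p] " (" parent ")"
                }
            }
        }
    '
}

# Usage:
#   ps -eo pid=,ppid=,comm= | show_parent qmail-send
```

If the parent shown is supervise and its PID matches the supervise that svscan is currently running, things are consistent. If the parent is some other process, such as init or an older supervise, that fits the explanation above.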
> Additionally, it reports the downtime as much longer than possible,
> for example:
>
> root@/# svstat /service/qmail-send
> /service/qmail-send: up (pid 23483) 354 seconds
> root@/# svstat /service/qmail-send
> /service/qmail-send: down 34047 seconds, normally up
It appears that there are two supervises running. You need to improve
your process hunting skills or reboot. I suspect this was caused by
removing the supervise directories.
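For the process hunting, a sketch along these lines may help. It assumes that each supervise shows up in ps as "supervise NAME", with the service name as its first argument, and that ps accepts -e and -o pid=,args=. find_duplicate_supervise is a hypothetical helper name. Every log service appears as "supervise log", so that name is skipped here; duplicates among log supervises would have to be told apart some other way, such as by working directory.

```shell
#!/bin/sh
# Sketch only: read "PID COMMAND ARGS..." lines on stdin and report every
# service name that has more than one supervise process attached to it.
find_duplicate_supervise() {
    awk '
        # Field 1 is the PID, field 2 the command, field 3 its first argument.
        # "supervise log" is skipped: one per logged service is normal.
        $2 ~ /(^|\/)supervise$/ && $3 != "" && $3 != "log" {
            count[$3]++
            pids[$3] = pids[$3] " " $1
        }
        END {
            for (name in count)
                if (count[name] > 1)
                    print name ":" pids[name]
        }
    '
}

# Usage:
#   ps -eo pid=,args= | find_duplicate_supervise
```

Any name printed has at least two supervise processes; the listed PIDs are the candidates to inspect before deciding which one to stop.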
paul