
List:       linux-ha-dev
Subject:    Re: [Linux-ha-dev] Re: [Linux-HA] cluster status
From:       Joachim Banzhaf <jbanzhaf () ngi ! de>
Date:       2004-04-29 14:37:27
Message-ID: 200404291637.27738.jbanzhaf () ngi ! de

Hi Alan,

On Wednesday, 28 April 2004 19:00, Alan Robertson wrote:
> Joachim Banzhaf wrote:
...
> > I don't know if my info is up to date, since it's been a while since I looked
> > into cluster status and there is a lot of work going on in that area. As
> > you might (not?) remember, I sent a patch to have a clusterstatus utility
> > based on apitest (returncode and/or message). At this time there was an
> > additional cluster status: transition.
>
> You're right.  I forgot about "transition".
>
> This just means that it's not in any of these stable states - it's
> currently changing.

This reminds me: you wanted to include cluster_status in the project back on 
8 August 2003. You asked for a legal statement by e-mail, which I sent the same day.
I just checked and did not see cluster_status in the 1.2.1 tgz. Did you change 
your mind, or did it just get lost?
In case it got lost, here it is again, with slightly updated comments and 
usage message.

In addition to the mentioned states none, local, foreign, all and transition 
it reports 'unknown' if it cannot determine the state (hb not running, no 
auth, ...)
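
As a rough sketch of how a calling program might interpret the exit status: the state and error codes defined in the attached cluster_status.c (0 to 5, 32 to 34, 128) leave the 8 and 16 bits free, so the two warnings can be masked off again. The helper names base_code, state_name and describe_status are made up for this illustration and are not part of the attachment or of heartbeat.

```c
#include <stdio.h>

/* Values copied from the #defines in cluster_status.c. */
#define WARN_DELETE   8
#define WARN_SIGNOFF 16
#define WARN_MASK    (WARN_DELETE | WARN_SIGNOFF)

/* Hypothetical helper: strip the warning bits, leaving the state or error code. */
static int
base_code(int status)
{
	return status & ~WARN_MASK;
}

/* Hypothetical helper: map the base code to the word printed on stdout. */
static const char *
state_name(int status)
{
	static const char *names[] = {
		"none", "local", "foreign", "all", "transition", "unknown"
	};
	int base = base_code(status);

	if (base >= 0 && base <= 5)
		return names[base];
	return "error";	/* ERR_STATE, ERR_SIGNON, ERR_REGISTER, ERR_SYNTAX */
}

/* Hypothetical helper: print a one-line summary of an exit status. */
static void
describe_status(int status)
{
	printf("state=%s%s%s\n", state_name(status),
	    (status & WARN_SIGNOFF) ? " (signoff failed)" : "",
	    (status & WARN_DELETE) ? " (delete failed)" : "");
}
```

A caller would pass the child's exit status, for example the value of WEXITSTATUS() after waitpid(), to describe_status().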

Since communication authentication changed, it requires an 'apiauth default' 
directive in ha.cf. It took me a while to figure that out. How about a hint 
in the api reference? Since there is no obvious link between api 
authentication and ha.cf, I did not look in doc/ha.cf until I stumbled over 
the apiauth parameter in the archives. 

Joachim Banzhaf

["cluster_status.c" (text/x-csrc)]

/*
 * cluster_status: program for checking the heartbeat cluster status
 * derived from api_test.c of Alan Robertson by Joachim Banzhaf <jbanzhaf@ngi.de>
 * Copyright (C) 2003/2004 Joachim Banzhaf <jbanzhaf@ngi.de>
 * 
 * DESCRIPTION
 * returns current resource status on stdout
 * (last word is one of all, local, foreign, none, transition or unknown)
 * returns error messages on stderr
 * return codes: see definitions below
 * needs "apiauth default" parameter in ha.cf
 *
 * QUESTIONS
 *   which include files are really necessary? (just copied them all from api_test.c)
 *   unlike api_test I always call delete if ll_cluster_new succeeds: ok?
 *   unlike api_test I always call signoff if signon succeeds: ok?
 *   why would this program sometimes hang when heartbeat is not running?
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This software is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * General Public License for more details.
 *
 * You should have received a copy of the GNU General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */


#include <portability.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/utsname.h>
#include <sys/time.h>
#include <sys/stat.h>
#include <stdarg.h>
#include <syslog.h>
#include <hb_api_core.h>
#include <hb_api.h>
#include <signal.h>


#define OK_NONE        0
#define OK_LOCAL       1
#define OK_FOREIGN     2
#define OK_ALL         3
#define OK_TRANSITION  4
#define OK_UNKNOWN     5
#define WARN_DELETE    8
#define WARN_SIGNOFF  16
#define ERR_STATE     32
#define ERR_SIGNON    33
#define ERR_REGISTER  34
#define ERR_SYNTAX    128


char       message[40];
const char format[] = "Current resource status: %s\n";
int        rc       = 0;
int        alarm_rc = 0;


void timeout(int);


void
timeout(int sig)
{
	(void)sig;	/* unused */

	/* SIGALRM handler: report which API call timed out, then exit */
	fprintf(stderr, "%s\n", message);
	fprintf(stderr, "REASON: %s\n", "API call timed out" );
	printf(format, "unknown");

	exit(rc + alarm_rc);
}


int
main(int argc, char ** argv)
{
	ll_cluster_t*	 hb;
	const char *	 cval;
	struct sigaction action;

	if (argc > 1) {
		fprintf(stderr, "%s 1.0.1 by Joachim Banzhaf (jbanzhaf@ngi.de)\n\
\n\
usage: %s\n\
\n\
Returns heartbeat resource status on stdout, where status is one of\n\
  none, local, foreign, all, transition or unknown.\n\
Returns error messages on stderr.\n\
Return codes are\n\
  %3d: no resources are running on this node\n\
  %3d: only local resources are running on this node\n\
  %3d: only foreign resources are running on this node\n\
  %3d: all resources are running on this node\n\
  %3d: resource state is in transition\n\
  %3d: resource state returned is unknown\n\
  %3d: could not delete heartbeat handle\n\
  %3d: could not sign off heartbeat library\n\
  %3d: could not get resource state\n\
  %3d: could not sign on to heartbeat library\n\
  %3d: could not get heartbeat handle\n\
  %3d: syntax error\n",
		argv[0], argv[0], OK_NONE, OK_LOCAL, OK_FOREIGN, OK_ALL,
		OK_TRANSITION, OK_UNKNOWN, WARN_DELETE, WARN_SIGNOFF,
		ERR_STATE, ERR_SIGNON, ERR_REGISTER, ERR_SYNTAX);
		exit(ERR_SYNTAX);
	}

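	/* I sometimes experienced hangs when heartbeat is not running, so ... */
	memset(&action, 0, sizeof(action));
	sigemptyset(&action.sa_mask);	/* block no additional signals in the handler */
	action.sa_handler = timeout;
	action.sa_flags   = SA_NODEFER;	/* portable name for the Linux-only SA_NOMASK */
	sigaction(SIGALRM, &action, NULL);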
	strcpy(message, "Cannot register with heartbeat library");
	alarm_rc=ERR_REGISTER;
	alarm(5);
	hb = ll_cluster_new("heartbeat");
	if (hb == NULL) {
		/* hb is NULL here, so hb->llc_ops->errmsg() must not be called */
		fprintf(stderr, "%s\n", message);
		fprintf(stderr, "REASON: %s\n", "ll_cluster_new() returned NULL");
		printf(format, "unknown");
		exit(ERR_REGISTER);
	}

	strcpy(message, "Cannot sign on with heartbeat");
	alarm_rc=ERR_SIGNON;
	if (hb->llc_ops->signon(hb, NULL) != HA_OK) {
		fprintf(stderr, "%s\n", message);
		fprintf(stderr, "REASON: %s\n", hb->llc_ops->errmsg(hb));
		printf(format, "unknown");
		rc = ERR_SIGNON;
	}else{
		alarm_rc = ERR_STATE;
		strcpy(message, "Cannot get resource status");
		if ((cval = hb->llc_ops->get_resources(hb)) == NULL) {
			fprintf(stderr, "%s\n", message);
			fprintf(stderr, "REASON: %s\n", hb->llc_ops->errmsg(hb));
			printf(format, "unknown");
			rc = ERR_STATE;
		}else{
			if( strcmp(cval, "none") == 0 ) {
				rc = OK_NONE;
			} else if( strcmp(cval, "local") == 0 ) {
				rc = OK_LOCAL;
			} else if( strcmp(cval, "foreign") == 0 ) {
				rc = OK_FOREIGN;
			} else if( strcmp(cval, "all") == 0 ) {
				rc = OK_ALL;
			} else if( strcmp(cval, "transition") == 0 ) {
				rc = OK_TRANSITION;
			} else {
				rc = OK_UNKNOWN;
			}
			printf(format, cval);
		}

		alarm_rc = WARN_SIGNOFF;
		strcpy(message, "Cannot sign off from heartbeat");
		if (hb->llc_ops->signoff(hb) != HA_OK) {
			fprintf(stderr, "%s\n", message);
			fprintf(stderr, "REASON: %s\n", hb->llc_ops->errmsg(hb));
			rc += WARN_SIGNOFF;
		}
	}

	alarm_rc=WARN_DELETE;
	strcpy(message, "Cannot delete API object");
	if (hb->llc_ops->delete(hb) != HA_OK) {
		fprintf(stderr, "%s\n", message);
		fprintf(stderr, "REASON: %s\n", hb->llc_ops->errmsg(hb));
		rc += WARN_DELETE;
	}

	return rc;
}


_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

