'[Linux-HA] Re: crm_mon equivalent for serial and udp communications'


List:       linux-ha
Subject:    [Linux-HA] Re: crm_mon equivalent for serial and udp communications
From:       David Livingstone <davidl () nlsedm1 ! cn ! ca>
Date:       2008-05-30 17:41:53
Message-ID: 48403C61.8020307 () nlsedm1 ! cn ! ca

I performed the same tests as Matt on a similar 2-node
cluster (pacemaker/heartbeat setup) and got the same results.
My links are /dev/ttyS0 and eth2/eth0, set up for ucast as
follows:

# hatest2 crossover
ucast eth2 192.168.36.129
# hatest1 crossover
ucast eth2 192.168.36.130
# hatest2 backup interface
ucast eth0 192.168.81.33
# hatest1 backup interface

[root@hatest1 sysconfig]# cl_status listnodes
hatest2
hatest1
[root@hatest1 sysconfig]# cl_status listhblinks hatest1
        /dev/ttyS0
        eth0
        eth0
        eth2
        eth2

--> Why are eth0 and eth2 each listed twice? For ucast, shouldn't
    the interface on the same machine be ignored?
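
Until the duplicates are explained, a script that consumes this output can
drop them itself. A minimal Python sketch, assuming the output format shown
above (one link name per line, indented). The name unique_links is made up
for illustration and is not part of heartbeat:

```python
def unique_links(listhblinks_output):
    """Return link names from `cl_status listhblinks` output without repeats.

    The order of first appearance is preserved.
    """
    seen = set()
    links = []
    for line in listhblinks_output.splitlines():
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            links.append(name)
    return links


# Output copied from the hatest1 session above.
sample = """\
        /dev/ttyS0
        eth0
        eth0
        eth2
        eth2
"""
print(unique_links(sample))
```

This only hides the repetition; it does not explain why heartbeat reports
each ucast interface twice.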

[root@hatest1]# cl_status hblinkstatus hatest1 /dev/ttyS0
 dead
[root@hatest1]# cl_status hblinkstatus hatest2 /dev/ttyS0
 up
[root@hatest1]# cl_status hblinkstatus hatest2 eth0 -m
The node hatest2's heartbeat link eth0 is up
[root@hatest1]# cl_status hblinkstatus hatest1 eth0 -m
The node hatest1's heartbeat link eth0 is dead

--> The -m option only works when it is placed at the end of the
    command line.
--> From my perspective the link status should be part
    of the global monitoring tools - crm_mon and the GUI.
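
In the meantime, the three cl_status subcommands used in this thread
(listnodes, listhblinks, hblinkstatus) can be combined into a single report.
A rough Python sketch, not a tested tool: the function names are made up for
illustration, and the run parameter exists only so the logic can be exercised
without a cluster. It assumes cl_status is on the PATH and prints one value
per line, as in the output above.

```python
import subprocess


def run_cl_status(*args):
    """Run cl_status with the given arguments and return its stdout as text."""
    result = subprocess.run(
        ["cl_status", *args],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,  # do not raise on a non-zero exit status
    )
    return result.stdout


def link_report(run=run_cl_status):
    """Collect (node, link, status) for every node and heartbeat link."""
    report = []
    for node in run("listnodes").split():
        links = []
        for link in run("listhblinks", node).split():
            if link not in links:  # listhblinks may repeat a ucast interface
                links.append(link)
        for link in links:
            status = run("hblinkstatus", node, link).strip()
            report.append((node, link, status))
    return report
```

On a cluster node, something like
`for row in link_report(): print("%-10s %-12s %s" % row)` would print one
line per node/link pair. The statuses are still whatever the local heartbeat
reports, so the "dead" entries for the local node show up here as well.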

On Fri, 2008-05-30 at 07:01 -0500, Matt Zagrabelny wrote:
> On Fri, 2008-05-30 at 11:47 +0200, Dejan Muhamedagic wrote:
> > On Fri, May 30, 2008 at 09:34:34AM +0200, Robert Heinzmann (ml) wrote:
> > > >We just developed the tool what you say.
> > > >See attached. It's a screen shot of it.
> > > >How about it?
> > >
> > > Is there a command line tool on the current code base showing this
> > > information (maybe not as integrated in crm_mon as your tool) ?
> >
> > There's cl_status hblinkstatus.
>
> Perfect. Thanks Dejan.
>
Well *almost* perfect.

It looks as though there are some discrepancies in the reporting (or in
my understanding).

I have a two node cluster (squash and turnip).

+------------+           +------------+
| /dev/ttyS1 |  <====>   | /dev/ttyS1 |
|            |           |            |
|  squash    |           |  turnip    |
|            |           |            |
|     eth2   |  <====>   |   eth2     |
+------------+           +------------+

From squash I run:

squash% cl_status hblinkstatus squash /dev/ttyS1
dead

squash% cl_status hblinkstatus turnip /dev/ttyS1
up

squash% cl_status hblinkstatus squash eth2
up

squash% cl_status hblinkstatus turnip eth2
up

From turnip I run:

turnip% cl_status hblinkstatus squash /dev/ttyS1
up

turnip% cl_status hblinkstatus turnip /dev/ttyS1
dead

turnip% cl_status hblinkstatus squash eth2
up

turnip% cl_status hblinkstatus turnip eth2
up

So, a node will report that its own serial line is 'dead' but report
that its partner's is 'up'. Is this by design?

I would think that if heartbeat can see the correct data on the serial
line that *both* serial links would report 'up'.
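
The asymmetry can be stated mechanically. A small Python sketch using only
the results above, transcribed by hand (the function name is made up for
illustration). It lists each case where a node reports its own end of a link
as dead while reporting the partner's end of the same link as up:

```python
# observations[observer][(node, link)] = status, copied from the
# cl_status hblinkstatus output above.
observations = {
    "squash": {
        ("squash", "/dev/ttyS1"): "dead",
        ("turnip", "/dev/ttyS1"): "up",
        ("squash", "eth2"): "up",
        ("turnip", "eth2"): "up",
    },
    "turnip": {
        ("squash", "/dev/ttyS1"): "up",
        ("turnip", "/dev/ttyS1"): "dead",
        ("squash", "eth2"): "up",
        ("turnip", "eth2"): "up",
    },
}


def self_dead_peer_up(observations):
    """Return sorted (observer, link) pairs: own entry dead, a peer's entry up."""
    found = []
    for observer, table in observations.items():
        for (node, link), status in table.items():
            if node != observer or status != "dead":
                continue
            peer_up = any(
                other != observer and other_link == link and other_status == "up"
                for (other, other_link), other_status in table.items()
            )
            if peer_up:
                found.append((observer, link))
    return sorted(found)


print(self_dead_peer_up(observations))
```

Both nodes show the pattern on /dev/ttyS1 and neither shows it on eth2, so
whatever the cause is, it appears specific to the serial link.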

From the logs I get:

(on squash)
heartbeat[2677]: 2008/05/30_07:43:52 info: Link turnip:/dev/ttyS1 up.

(on turnip)
heartbeat[6429]: 2008/05/30_07:43:52 info: Link squash:/dev/ttyS1 up.

Thanks for any information,

-- 
Matt Zagrabelny - mzagrabe at d.umn.edu - (218) 726 8844
University of Minnesota Duluth
Information Technology Systems & Services
PGP key 1024D/84E22DA2 2005-11-07
Fingerprint: 78F9 18B3 EF58 56F5 FC85  C5CA 53E7 887F 84E2 2DA2

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
