List:       linux-ha-dev
Subject:    [Linux-ha-dev] cluster communication on OpenBSD
From:       "Sebastian Reitenbach" <sebastia () l00-bugdead-prods ! de>
Date:       2007-08-26 17:30:20
Message-ID: 20070826173020.E5D9239436 () l00-bugdead-prods ! de

Hi,

with the heartbeat 2.1.2 port to OpenBSD I was able to set up a cluster 
between two i386 OpenBSD machines using unicast communication. It also 
worked well between an i386 and a sparc machine. But when I added a 
second ucast statement to the ha.cf file, the cluster refused to start 
up:

Aug 26 19:16:50 heartbeat heartbeat: [20236]: ERROR: glib: ucast: error binding socket. Retrying: Address already in use
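
For reference, the same error text can be reproduced outside heartbeat. The sketch below is only an illustration in Python, not heartbeat's code. It assumes that each ucast directive opens its own UDP socket bound to the same local address and port, which is an assumption and not something confirmed from the heartbeat source. The function name and the loopback address are made up for the example.

```python
import errno
import socket


def second_bind_errno(host="127.0.0.1"):
    """Bind two UDP sockets to the same address and port.

    Returns the errno raised by the second bind(), or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind((host, 0))          # port 0: let the kernel pick a free port
        port = first.getsockname()[1]
        try:
            # Same address and port, without SO_REUSEADDR or SO_REUSEPORT.
            second.bind((host, port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    err = second_bind_errno()
    # Print the symbolic errno name if the second bind failed.
    print(errno.errorcode.get(err, err))
```

If the assumption holds, the second bind fails in the same way regardless of which peer address the directive names, because only the local port matters for bind().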

When I switch to multicast communication, heartbeat refuses to 
work as well:
Aug 26 15:19:21 defiant heartbeat: [23219]: ERROR: write failure on mcast fxp0.: Host is down
Aug 26 15:19:23 defiant heartbeat: [23219]: ERROR: glib: Unable to send mcast packet [-1]: Host is down

When I switch to broadcast, I can get the three nodes to work together 
for a short time. But when I, for example, put a node into standby and then 
make it active again, and relocate some resources, I start seeing messages 
of this sort:

Aug 26 15:32:46 defiant heartbeat: [10054]: ERROR: write failure on bcast fxp0.: Message too long
Aug 26 15:32:46 defiant heartbeat: [10054]: ERROR: glib: Unable to send bcast [-1] packet(len=1695): Message too long
Aug 26 15:32:46 defiant heartbeat: [10054]: ERROR: MSG: Dumping message with 24 fields
Aug 26 15:32:46 defiant heartbeat: [10054]: ERROR: MSG[0] : [t=cib]
Aug 26 15:32:46 defiant heartbeat: [10054]: ERROR: MSG[1] : [cib_clientid=8c8cc7ff-b425-447a-af46-cf8ade3f4566]
Aug 26 15:32:46 defiant heartbeat: [10054]: ERROR: MSG[2] : [cib_callopt=1048576]
Aug 26 15:32:46 defiant heartbeat: [10054]: ERROR: MSG[3] : [cib_callid=16]
Aug 26 15:32:46 defiant heartbeat: [10054]: ERROR: MSG[4] : [cib_op=cib_apply_diff]
Aug 26 15:32:46 defiant heartbeat: [10054]: ERROR: MSG[5] : [cib_section=status]
...
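
The reported length (len=1695) gives a hint. A minimal sketch, assuming a standard Ethernet MTU of 1500 bytes on fxp0 (the MTU is not shown in the logs) and an IPv4 header without options: a payload of that size does not fit into a single unfragmented datagram. BSD-derived IP stacks generally refuse to fragment broadcast datagrams and return EMSGSIZE ("Message too long") instead, which would match the error text, though this has not been checked against the OpenBSD source here. The constants and function names are illustrative.

```python
IPV4_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8     # bytes


def max_udp_payload(mtu=1500):
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def needs_fragmentation(payload_len, mtu=1500):
    """True if a UDP payload of payload_len bytes exceeds one packet."""
    return payload_len > max_udp_payload(mtu)


if __name__ == "__main__":
    print(max_udp_payload())          # 1472
    print(needs_fragmentation(1695))  # True
```

Unicast datagrams of the same size can normally be fragmented by the IP layer, which would be consistent with ucast working while bcast fails.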

Message too long for broadcast, but unicast works? My configuration is 
below; I already use bz2 compression. 
Does anyone have an idea why I have these cluster communication problems?


autojoin any
crm yes
compression bz2
use_logd on
deadtime 15
initdead 40
keepalive 2
node defiant.ds9 heartbeat.ds9 warbird.ds9
#node defiant.ds9 heartbeat.ds9
#mcast rl0 224.0.0.1 702 1 0
bcast rl0
#ucast rl0 warbird.ds9
#ucast rl0 defiant.ds9
ping 10.0.0.1 10.11.0.1
debug true
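
For readers unfamiliar with the mcast directive above: its arguments are, in order, the device, the multicast group, the UDP port, the TTL and the loopback flag. A small sketch in Python that splits such a line into named fields follows. The helper and the tuple type are hypothetical and only illustrate the field order; they are not part of heartbeat.

```python
from collections import namedtuple

# Hypothetical container for the five arguments of an ha.cf mcast directive.
McastDirective = namedtuple("McastDirective", "dev group port ttl loop")


def parse_mcast(line):
    """Parse an ha.cf 'mcast' line, tolerating a leading '#' comment marker."""
    fields = line.strip().lstrip("#").split()
    if len(fields) != 6 or fields[0] != "mcast":
        raise ValueError("not an mcast directive: %r" % line)
    _, dev, group, port, ttl, loop = fields
    return McastDirective(dev, group, int(port), int(ttl), int(loop))


if __name__ == "__main__":
    print(parse_mcast("#mcast rl0 224.0.0.1 702 1 0"))
```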

kind regards
Sebastian

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/