List:       linux-ha
Subject:    Re: [Linux-HA] Load balancing fails
From:       Dejan Muhamedagic <dejanmm () fastmail ! fm>
Date:       2008-07-28 18:05:34
Message-ID: 20080728180533.GD11830 () rondo ! homenet

Hi,

On Thu, Jul 24, 2008 at 07:37:24PM +0200, Ralf Bardoel wrote:
> Dear Linux-HA users,
> 
> I want to load balance two apache web servers through two load balancer 
> nodes with heartbeat on them. I followed a few guides, but cannot get it 
> working without the cluster failing after 15 minutes. After some time the 
> cluster/virtual ip is down, but after one hour it is up again, and so 
> on..... Here is some information: My system runs on Centos 5 and runs as 
> guest under vmware server 2 (rc1), these versions are installed: 
> heartbeat-stonith-2.1.3-3.el5.centos, heartbeat-pils-2.1.3-3.el5.centos, 
> heartbeat-gui-2.1.3-3.el5.centos, heartbeat-ldirectord-2.1.3-3.el5.centos, 
> heartbeat-2.1.3-3.el5.centos, heartbeat-devel-2.1.3-3.el5.centos. The 
> config-files are included in the mail. The ha-debug log file gives these 
> errors:
> 
> heartbeat[28951]: 2008/07/24_11:14:30 WARN: Late heartbeat: Node loadi: interval 5320 ms
> heartbeat[28951]: 2008/07/24_11:14:45 WARN: Late heartbeat: Node loadi: interval 5300 ms
> heartbeat[28951]: 2008/07/24_11:14:50 WARN: Late heartbeat: Node loadi: interval 5200 ms
> heartbeat[28951]: 2008/07/24_11:15:07 WARN: Late heartbeat: Node loadi: interval 7000 ms
> heartbeat[28951]: 2008/07/24_11:15:12 WARN: Late heartbeat: Node loadi: interval 5010 ms
> heartbeat[28951]: 2008/07/24_11:15:17 WARN: Late heartbeat: Node loadi: interval 5250 ms
> heartbeat[28951]: 2008/07/24_11:15:22 WARN: Late heartbeat: Node loadi: interval 5260 ms

The heartbeat communication processes are starved. Unless your
nodes are severely loaded, which I doubt, it's probably some
vmware/linux kernel issue. Note that the heartbeat communication
processes are locked in memory and run at the highest priority,
so ordinary load should not delay them.
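
To get a feel for how late the heartbeats are, the warnings can be
pulled out of ha-debug and summarised. This is only a rough sketch:
it assumes each warning sits on a single line in the format quoted
above, and the function name late_intervals is made up for the
example.

```python
import re

# Matches the "Late heartbeat" warnings as they appear in the quoted log.
LATE_RE = re.compile(r"WARN: Late heartbeat: Node ([^:\s]+): interval (\d+) ms")


def late_intervals(lines):
    """Return (node, interval_ms) pairs for every late-heartbeat warning."""
    found = []
    for line in lines:
        match = LATE_RE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


# Sample lines copied from the quoted ha-debug excerpt.
SAMPLE = [
    "heartbeat[28951]: 2008/07/24_11:14:30 WARN: Late heartbeat: Node loadi: interval 5320 ms",
    "heartbeat[28951]: 2008/07/24_11:14:45 WARN: Late heartbeat: Node loadi: interval 5300 ms",
    "heartbeat[28951]: 2008/07/24_11:14:50 WARN: Late heartbeat: Node loadi: interval 5200 ms",
    "heartbeat[28951]: 2008/07/24_11:15:07 WARN: Late heartbeat: Node loadi: interval 7000 ms",
    "heartbeat[28951]: 2008/07/24_11:15:12 WARN: Late heartbeat: Node loadi: interval 5010 ms",
    "heartbeat[28951]: 2008/07/24_11:15:17 WARN: Late heartbeat: Node loadi: interval 5250 ms",
    "heartbeat[28951]: 2008/07/24_11:15:22 WARN: Late heartbeat: Node loadi: interval 5260 ms",
]

intervals = late_intervals(SAMPLE)
worst_ms = max(ms for _, ms in intervals)
print(f"{len(intervals)} late heartbeats, worst interval {worst_ms} ms")
```

To run it against the real log, pass open("/var/log/ha-debug")
instead of SAMPLE.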

> Has anyone had the same problem or does anyone know how to overcome this 
> problem? Thank you in advance!!! Regards,
> 
> Ralf Bardoel
> 

> checktimeout=15
> checkinterval=5
> autoreload=yes
> logfile="local0"
> logfile="/var/log/ldirectord.log"
> quiescent=no
> virtual = xxx.xxx.xxx.xxx:80
> real = xxx.xxx.xxx.xxx:80 gate
> real = xxx.xxx.xxx.xxx:80 gate
> checktype = negotiate
> service=http
> protocol = tcp
> request="ldirector.html"
> receive="Test Page"
> scheduler=wrr
> checktype=negotiate
> 

> [Thu Jul 24 08:48:01 2008|ldirectord.cf|28521] ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 26969
> [Thu Jul 24 08:48:01 2008|ldirectord.cf|28521] Exiting from ldirectord status
> [Thu Jul 24 08:48:02 2008|ldirectord.cf|28602] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
> [Thu Jul 24 08:48:02 2008|ldirectord.cf|28602] ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 26969
> [Thu Jul 24 08:48:02 2008|ldirectord.cf|28602] Exiting from ldirectord status
> [Thu Jul 24 08:48:02 2008|ldirectord.cf|28624] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
> [Thu Jul 24 08:48:05 2008|ldirectord.cf|28944] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf stop
> [Thu Jul 24 08:48:05 2008|ldirectord.cf|26969] Purged real server (stop): xxx.xxx.xxx.xxx:80 (2xxx.xxx.xxx.xxx:80)
> [Thu Jul 24 08:48:05 2008|ldirectord.cf|26969] Purged virtual server (stop): xxx.xxx.xxx.xxx:80
> [Thu Jul 24 08:48:05 2008|ldirectord.cf|26969] Linux Director Daemon terminated on signal: TERM
> [Thu Jul 24 08:48:24 2008|ldirectord.cf|29030] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
> [Thu Jul 24 08:48:24 2008|ldirectord.cf|29030] Exiting with exit_status 3: Exiting from ldirectord status
> [Thu Jul 24 08:48:24 2008|ldirectord.cf|29052] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
> [Thu Jul 24 08:48:24 2008|ldirectord.cf|29052] Starting Linux Director v1.186-ha-2.1.3 as daemon
> [Thu Jul 24 08:48:24 2008|ldirectord.cf|29054] Added virtual server: xxx.xxx.xxx.xxx:80
> [Thu Jul 24 08:48:25 2008|ldirectord.cf|29054] Added real server: xxx.xxx.xxx.xxx:80 (xxx.xxx.xxx.xxx:80) (Weight set to 1)
> [Thu Jul 24 11:17:15 2008|ldirectord.cf|1144] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
> [Thu Jul 24 11:17:15 2008|ldirectord.cf|1144] ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 29054
> [Thu Jul 24 11:17:15 2008|ldirectord.cf|1144] Exiting from ldirectord status
> [Thu Jul 24 11:17:15 2008|ldirectord.cf|1235] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
> [Thu Jul 24 11:17:15 2008|ldirectord.cf|1235] ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 29054
> [Thu Jul 24 11:17:15 2008|ldirectord.cf|1235] Exiting from ldirectord status
> [Thu Jul 24 11:17:16 2008|ldirectord.cf|1257] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
> [Thu Jul 24 11:17:18 2008|ldirectord.cf|1596] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf stop
> [Thu Jul 24 11:17:18 2008|ldirectord.cf|29054] Purged real server (stop): xxx.xxx.xxx.xxx:80 (xxx.xxx.xxx.xxx:80)
> [Thu Jul 24 11:17:18 2008|ldirectord.cf|29054] Purged virtual server (stop): xxx.xxx.xxx.xxx:80
> [Thu Jul 24 11:17:18 2008|ldirectord.cf|29054] Linux Director Daemon terminated on signal: TERM
> [Thu Jul 24 11:17:46 2008|ldirectord.cf|1682] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
> [Thu Jul 24 11:17:46 2008|ldirectord.cf|1682] Exiting with exit_status 3: Exiting from ldirectord status
> [Thu Jul 24 11:17:47 2008|ldirectord.cf|1704] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
> [Thu Jul 24 11:17:47 2008|ldirectord.cf|1704] Starting Linux Director v1.186-ha-2.1.3 as daemon
> [Thu Jul 24 11:17:47 2008|ldirectord.cf|1706] Added virtual server: xxx.xxx.xxx.xxx:80
> [Thu Jul 24 11:17:47 2008|ldirectord.cf|1706] Added real server: xxx.xxx.xxx.xxx:80 (xxx.xxx.xxx.xxx:80) (Weight set to 1)
> [Thu Jul 24 11:17:49 2008|ldirectord.cf|2083] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
> [Thu Jul 24 11:17:49 2008|ldirectord.cf|2083] ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 1706
> [Thu Jul 24 11:17:49 2008|ldirectord.cf|2083] Exiting from ldirectord status
> [Thu Jul 24 11:17:50 2008|ldirectord.cf|2148] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
> [Thu Jul 24 11:17:50 2008|ldirectord.cf|2148] ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 1706
> [Thu Jul 24 11:17:50 2008|ldirectord.cf|2148] Exiting from ldirectord status
> [Thu Jul 24 11:17:51 2008|ldirectord.cf|2170] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
> [Thu Jul 24 11:17:56 2008|ldirectord.cf|2512] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf stop
> [Thu Jul 24 11:17:56 2008|ldirectord.cf|1706] Purged real server (stop): xxx.xxx.xxx.xxx:80 (xxx.xxx.xxx.xxx:80)
> [Thu Jul 24 11:17:56 2008|ldirectord.cf|1706] Purged virtual server (stop): xxx.xxx.xxx.xxx:80
> [Thu Jul 24 11:17:56 2008|ldirectord.cf|1706] Linux Director Daemon terminated on signal: TERM

> respawn hacluster /usr/lib/heartbeat/ipfail
> auto_failback on
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility     local0
> keepalive       5
> deadtime        11

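The keepalive and deadtime settings are too close: with keepalive 5
and deadtime 11, the deadtime spans barely two heartbeat intervals,
so two late or lost heartbeats in a row are enough for the peer to
be declared dead. Furthermore, I think that the keepalive should be
a bit lower.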
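
The arithmetic behind that can be sketched in a few lines. The
helper name heartbeat_margin is made up for the example, and the
second pair of timer values is purely illustrative, not a
recommendation taken from this thread; 7000 ms is the worst interval
in the quoted log.

```python
def heartbeat_margin(keepalive_s, deadtime_s, worst_interval_ms):
    """Summarise how much room the timers leave for late heartbeats.

    Returns (ratio, beats_before_dead, headroom_ms):
      ratio             deadtime expressed in keepalive intervals
      beats_before_dead whole heartbeats due before deadtime expires;
                        if all of them are lost the peer is declared dead
      headroom_ms       deadtime minus the worst interval seen so far
    """
    ratio = deadtime_s / keepalive_s
    beats_before_dead = int(deadtime_s // keepalive_s)
    headroom_ms = deadtime_s * 1000 - worst_interval_ms
    return ratio, beats_before_dead, headroom_ms


# Values from the quoted ha.cf and the quoted ha-debug excerpt.
current = heartbeat_margin(keepalive_s=5, deadtime_s=11, worst_interval_ms=7000)

# Illustrative alternative only (an assumption, not from this thread).
relaxed = heartbeat_margin(keepalive_s=2, deadtime_s=30, worst_interval_ms=7000)

for label, (ratio, beats, headroom) in (("current", current), ("relaxed", relaxed)):
    print(f"{label}: deadtime = {ratio:.1f} x keepalive, "
          f"{beats} heartbeats before dead, {headroom} ms headroom")
```

With the current values, the deadtime covers only two heartbeats and
the worst observed interval already uses most of it.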

Thanks,

Dejan

> udpport 694
> udp     eth0
> node    loadI
> node    loadII
> 

> loadI      \
> ldirectord::ldirectord.cf \
> LVSSyncDaemonSwap::master \
> IPaddr2::xxx.xxx.xxx.xxx/24/eth0/xxx.xxx.xxx.xxx
> 
> 

> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
