
List:       linux-ha-dev
Subject:    RE: [Linux-ha-dev] Patch: Fast node fail detection (part 2)
From:       "Zou, Yixiong" <yixiong.zou () intel ! com>
Date:       2005-02-11 20:23:20
Message-ID: 012676D607FCF54E986746512C22CE7D02E299F0 () orsmsx407

I ran two sets of tests using the "faildetection" utility; the results are posted below. The first set of 10 tests used "nodefail"; the second did not.

------------------------------------------------------------------------------------------------


This is the result of the first test. I ran "faildetection 10", which tells it to run the test 10 times.

Here are the settings in /etc/ha.d/ha.cf:

	keepalive 50ms
	deadtime  500ms
	warntime  250ms

[root@coldplay nodefail]# ./faildetection 10
lt-faildetection[2534]: 2005/02/11_11:49:04 info: current pid = 2534, test count = 10
lt-faildetection[2534]: 2005/02/11_11:49:04 debug: Signing in with heartbeat
lt-faildetection[2534]: 2005/02/11_11:49:04 info: myid = coldplay, ntfid = caliber
lt-faildetection[2534]: 2005/02/11_11:49:04 info: wait two seconds before we start the test.
Starting High-Availability services:
[  OK  ]
lt-faildetection[2534]: 2005/02/11_11:49:46 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat; snmptrap -v 2c -c public  coldplay  '' .1.3.6.1.4.1.4682.900.1"
lt-faildetection[2534]: 2005/02/11_11:49:47 info: failure detection time for test No.    1: 80ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[2534]: 2005/02/11_11:50:47 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat; snmptrap -v 2c -c public  coldplay  '' .1.3.6.1.4.1.4682.900.1"
lt-faildetection[2534]: 2005/02/11_11:50:48 info: failure detection time for test No.    2: 70ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[2534]: 2005/02/11_11:51:48 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat; snmptrap -v 2c -c public  coldplay  '' .1.3.6.1.4.1.4682.900.1"
lt-faildetection[2534]: 2005/02/11_11:51:49 info: failure detection time for test No.    3: 90ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[2534]: 2005/02/11_11:52:49 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat; snmptrap -v 2c -c public  coldplay  '' .1.3.6.1.4.1.4682.900.1"
lt-faildetection[2534]: 2005/02/11_11:52:50 info: failure detection time for test No.    4: 70ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[2534]: 2005/02/11_11:53:50 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat; snmptrap -v 2c -c public  coldplay  '' .1.3.6.1.4.1.4682.900.1"
lt-faildetection[2534]: 2005/02/11_11:53:50 info: failure detection time for test No.    5: 70ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[2534]: 2005/02/11_11:54:51 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat; snmptrap -v 2c -c public  coldplay  '' .1.3.6.1.4.1.4682.900.1"
lt-faildetection[2534]: 2005/02/11_11:54:51 info: failure detection time for test No.    6: 100ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[2534]: 2005/02/11_11:55:52 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat; snmptrap -v 2c -c public  coldplay  '' .1.3.6.1.4.1.4682.900.1"
lt-faildetection[2534]: 2005/02/11_11:55:52 info: failure detection time for test No.    7: 70ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[2534]: 2005/02/11_11:56:53 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat; snmptrap -v 2c -c public  coldplay  '' .1.3.6.1.4.1.4682.900.1"
lt-faildetection[2534]: 2005/02/11_11:56:53 info: failure detection time for test No.    8: 90ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[2534]: 2005/02/11_11:57:54 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat; snmptrap -v 2c -c public  coldplay  '' .1.3.6.1.4.1.4682.900.1"
lt-faildetection[2534]: 2005/02/11_11:57:54 info: failure detection time for test No.    9: 90ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[2534]: 2005/02/11_11:58:55 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat; snmptrap -v 2c -c public  coldplay  '' .1.3.6.1.4.1.4682.900.1"
lt-faildetection[2534]: 2005/02/11_11:58:55 info: failure detection time for test No.   10: 100ms

lt-faildetection[2534]: 2005/02/11_11:58:55 info: average failover detection time = 83, min = 70, max = 100, number of tests fall within range: 10, within requirment: 10


------------------------------------------------------------------------------------------------

Below is the second test result. I ran "faildetection -nt 10". This tells faildetection not to send the trap command, so the numbers we get are the original heartbeat detection times. I also changed the ha.cf settings to the following:

	keepalive 25ms
	deadtime  250ms
	warntime  125ms

[root@coldplay nodefail]# ./faildetection -nt 10
lt-faildetection[12433]: 2005/02/11_12:04:04 info: current pid = 12433, test count = 10
lt-faildetection[12433]: 2005/02/11_12:04:04 debug: Signing in with heartbeat
lt-faildetection[12433]: 2005/02/11_12:04:04 info: myid = coldplay, ntfid = caliber
lt-faildetection[12433]: 2005/02/11_12:04:04 info: wait two seconds before we start the test.
Starting High-Availability services:
[  OK  ]
lt-faildetection[12433]: 2005/02/11_12:04:47 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat"
lt-faildetection[12433]: 2005/02/11_12:04:47 info: failure detection time for test No.    1: 230ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[12433]: 2005/02/11_12:05:48 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat"
lt-faildetection[12433]: 2005/02/11_12:05:48 info: failure detection time for test No.    2: 420ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[12433]: 2005/02/11_12:06:49 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat"
lt-faildetection[12433]: 2005/02/11_12:06:50 info: failure detection time for test No.    3: 450ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[12433]: 2005/02/11_12:07:50 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat"
lt-faildetection[12433]: 2005/02/11_12:07:51 info: failure detection time for test No.    4: 350ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[12433]: 2005/02/11_12:08:51 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat"
lt-faildetection[12433]: 2005/02/11_12:08:52 info: failure detection time for test No.    5: 350ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[12433]: 2005/02/11_12:09:52 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat"
lt-faildetection[12433]: 2005/02/11_12:09:53 info: failure detection time for test No.    6: 370ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[12433]: 2005/02/11_12:10:53 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat"
lt-faildetection[12433]: 2005/02/11_12:10:54 info: failure detection time for test No.    7: 450ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[12433]: 2005/02/11_12:11:54 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat"
lt-faildetection[12433]: 2005/02/11_12:11:55 info: failure detection time for test No.    8: 420ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[12433]: 2005/02/11_12:12:55 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat"
lt-faildetection[12433]: 2005/02/11_12:12:56 info: failure detection time for test No.    9: 360ms

Starting High-Availability services:
[  OK  ]
lt-faildetection[12433]: 2005/02/11_12:13:57 info: exec: ssh -q -x -n -l root "caliber" "killall -9 heartbeat"
lt-faildetection[12433]: 2005/02/11_12:13:57 info: failure detection time for test No.   10: 470ms

lt-faildetection[12433]: 2005/02/11_12:13:57 info: average failover detection time = 387, min = 230, max = 470, number of tests fall within range: 1, within requirment: 1
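
For anyone who wants to re-check the two summary lines from the per-test output, here is a minimal Python sketch. It is not part of the faildetection utility. The function names (summarize, fake_log) are hypothetical, and the "within deadtime" count assumes that "within range" means "no greater than the configured deadtime", which matches the counts above but is not confirmed by the tool's source.

```python
import re

# Matches the per-test lines printed in the logs above, e.g.
# "... info: failure detection time for test No.    1: 80ms"
LINE_RE = re.compile(r"failure detection time for test No\.\s*(\d+):\s*(\d+)ms")


def summarize(log_text, deadtime_ms):
    """Recompute average/min/max from the per-test log lines.

    ASSUMPTION: a test counts as "within range" when its detection
    time is <= deadtime_ms; the tool's real criterion may differ.
    """
    times = [int(ms) for _, ms in LINE_RE.findall(log_text)]
    return {
        "count": len(times),
        "average": sum(times) / len(times),
        "min": min(times),
        "max": max(times),
        "within_deadtime": sum(1 for t in times if t <= deadtime_ms),
    }


def fake_log(times):
    """Build log-shaped lines from a list of times (illustration only)."""
    return "\n".join(
        f"info: failure detection time for test No. {i:4d}: {t}ms"
        for i, t in enumerate(times, 1)
    )


# Detection times copied from the two runs above.
FIRST_RUN_MS = [80, 70, 90, 70, 70, 100, 70, 90, 90, 100]
SECOND_RUN_MS = [230, 420, 450, 350, 350, 370, 450, 420, 360, 470]

if __name__ == "__main__":
    print(summarize(fake_log(FIRST_RUN_MS), deadtime_ms=500))
    print(summarize(fake_log(SECOND_RUN_MS), deadtime_ms=250))
```

Run against the figures above, this gives the same average, min, and max that faildetection reports for each run (83/70/100 and 387/230/470). In practice the log text would come from the captured output rather than from fake_log.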


-----------------------------------

Yixiong Zou (yixiong.zou@intel.com)
Open Source Technology Center
Intel Corp.

All views expressed in this email are those of the individual sender.

   




