List: linux-ha-dev
Subject: Re: [Linux-ha-dev] CTS Result of Aug 23
From: Alan Robertson <alanr () unix ! sh>
Date: 2005-08-24 1:19:11
Message-ID: 430BCB0F.5000604 () unix ! sh
Huang Zhen wrote:
> The source code was from Aug 23.
>
> command:
> /usr/lib/heartbeat/cts/CTSlab.py -2 -D /tmp/cts/ -L /var/log/messages123
> -c -r --fencing 1 500
>
> result:
> Aug 24 03:08:53 ****************
> Aug 24 03:08:53 Overall Results:{'failure': 0, 'success': 500,
> 'BadNews': 82}
> Aug 24 03:08:53 ****************
> Aug 24 03:08:53 Detailed Results
> Aug 24 03:08:53 Test Flip: {'elapsed_time': 1629.1062302589417,
> 'skipped': 0, 'calls': 46, 'success': 46, 'started': 13, 'down->up': 13,
> 'auditfail': 0, 'failure': 0, 'stopped': 33, 'max_time':
> 48.975219011306763, 'min_time': 5.7235660552978516, 'up->down': 33}
> Aug 24 03:08:53 Test Restart: {'elapsed_time': 976.2874219417572,
> 'skipped': 0, 'calls': 47, 'success': 47, 'WasStopped': 31,
> 'node:hadev3': 19, 'node:hadev2': 15, 'node:hadev1': 13, 'auditfail': 0,
> 'failure': 0, 'max_time': 39.761883974075317, 'min_time':
> 6.8436539173126221}
> Aug 24 03:08:53 Test Stonithd: {'elapsed_time':
> 5068.9160165786743, 'skipped': 0, 'calls': 43, 'success': 43,
> 'auditfail': 0, 'failure': 0, 'max_time': 188.40486288070679,
> 'min_time': 70.06779408454895}
> Aug 24 03:08:53 Test StartOnebyOne: {'elapsed_time':
> 2044.5414960384369, 'skipped': 0, 'calls': 30, 'success': 30,
> 'auditfail': 0, 'failure': 0, 'max_time': 73.213633060455322,
> 'min_time': 51.823998928070068}
> Aug 24 03:08:53 Test SimulStart: {'elapsed_time':
> 1565.1155052185059, 'skipped': 0, 'calls': 40, 'success': 40,
> 'auditfail': 0, 'failure': 0, 'max_time': 53.108457088470459,
> 'min_time': 21.958048820495605}
> Aug 24 03:08:53 Test SimulStop: {'elapsed_time':
> 900.88228130340576, 'skipped': 0, 'calls': 41, 'success': 41,
> 'auditfail': 0, 'failure': 0, 'max_time': 46.81151294708252, 'min_time':
> 14.325378179550171}
> Aug 24 03:08:53 Test StopOnebyOne: {'elapsed_time':
> 849.31568813323975, 'skipped': 0, 'calls': 24, 'success': 24,
> 'auditfail': 0, 'failure': 0, 'max_time': 46.604828119277954,
> 'min_time': 15.642576932907104}
> Aug 24 03:08:53 Test RestartOnebyOne: {'elapsed_time':
> 3315.1372413635254, 'skipped': 0, 'calls': 47, 'success': 47,
> 'auditfail': 0, 'failure': 0, 'max_time': 382.18086814880371,
> 'min_time': 37.297457933425903}
> Aug 24 03:08:53 Test standby2: {'elapsed_time':
> 1916.8006129264832, 'skipped': 0, 'calls': 27, 'success': 27,
> 'auditfail': 0, 'failure': 0, 'max_time': 97.094635009765625,
> 'min_time': 65.486258983612061}
> Aug 24 03:08:53 Test Bandwidth: {'elapsed_time':
> 669.27408027648926, 'skipped': 6, 'calls': 39, 'success': 33, 'min':
> 12360.781916964643, 'max': 14242.352486818427, 'totalbandwidth':
> 431098.1668085191, 'auditfail': 0, 'failure': 0, 'max_time':
> 40.663763999938965, 'min_time': 7.2956085205078125e-05}
> Aug 24 03:08:53 Test ResourceRecover: {'elapsed_time':
> 719.37023115158081, 'skipped': 0, 'calls': 35, 'success': 35,
> 'auditfail': 0, 'failure': 0, 'max_time': 85.048264026641846,
> 'min_time': 12.041023015975952}
> Aug 24 03:08:53 Test SpecialTest1: {'elapsed_time':
> 2675.4830515384674, 'skipped': 0, 'calls': 42, 'success': 0,
> 'auditfail': 0, 'failure': 0, 'max_time': 122.34544396400452,
> 'min_time': 39.243659973144531}
> Aug 24 03:08:53 Test NearQuorumPoint: {'elapsed_time':
> 752.15080618858337, 'skipped': 4, 'calls': 39, 'success': 35,
> 'auditfail': 0, 'failure': 0, 'max_time': 69.302345991134644,
> 'min_time': 0.00030303001403808594}
> Aug 24 03:08:53 <<<<<<<<<<<<<<<< TESTS COMPLETED
>
> The BadNews:
> 1. stonithd test.
> All stonithd tests have bad news.
> "
> Aug 23 20:56:03 Running test Stonithd (hadev3) [74]
> Aug 23 20:57:35 BadNews: Aug 23 20:57:28 hadev2 tengine: [26544]:
> ERROR: mask(utils.c:send_complete): 0 - Transition status: Timed out
> after 60000ms
> Aug 23 20:57:35 BadNews: Aug 23 20:57:28 hadev2 crmd: [26006]: ERROR:
> mask(messages.c:handle_request): Filtering te_timeout op in state
> S_ELECTION
> "
> Some of them only have the timeout one.
>
> 2. 7 of the 42 SpecialTest1 runs have bad news. Typical bad news
> is:
> Aug 23 20:44:16 Running test SpecialTest1 (hadev2) [62]
> Aug 23 20:46:19 BadNews: Aug 23 20:45:15 hadev3 crmd: [14527]: ERROR:
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action monitor
> on DoFencing:child_DoFencing:1 in state S_PENDING
> Aug 23 20:46:19 BadNews: Aug 23 20:45:15 hadev3 crmd: [14527]: ERROR:
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action monitor
> on rsc_hadev1 in state S_PENDING
> Aug 23 20:46:19 BadNews: Aug 23 20:45:15 hadev3 crmd: [14527]: ERROR:
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action monitor
> on rsc_hadev3 in state S_PENDING
> Aug 23 20:46:19 BadNews: Aug 23 20:46:16 hadev2 tengine: [20020]:
> ERROR: mask(utils.c:timer_callback): Transition abort timeout reached...
> marking transition complete.
> Aug 23 20:46:19 BadNews: Aug 23 20:46:16 hadev2 tengine: [20020]:
> ERROR: mask(utils.c:send_complete): 1 - Transition status: Abort timed
> out after 60000ms
>
> 3. IPaddr. It looks like something was wrong on the test node, but after a
> while it no longer appeared.
> Aug 23 19:54:15 Running test Bandwidth (hadev3) [3]
> Aug 23 19:54:41 ...bandwidth: 12487 bits/sec
> Aug 23 19:54:42 BadNews: Aug 23 19:54:21 hadev1 send_arp: [29745]:
> ERROR: libnet_build_ethernet failed:
> Aug 23 19:54:42 BadNews: Aug 23 19:54:21 hadev1 IPaddr[29690]:
> [29751]: ERROR: Could not send gratuitous arps. rc=1
> Aug 23 19:54:49 Running test RestartOnebyOne (hadev2) [4]
>
> 4. ResourceRecover: the bad news is the same as for SpecialTest1
> Aug 23 21:46:26 Running test ResourceRecover (hadev1) [129]
> Aug 23 21:47:36 ...resource IPaddr::rsc_hadev3 on hadev3
> Aug 23 21:47:51 BadNews: Aug 23 21:46:32 hadev2 crmd: [15446]: ERROR:
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action start on
> rsc_hadev1 in state S_PENDING
> Aug 23 21:47:51 BadNews: Aug 23 21:46:32 hadev2 crmd: [15446]: ERROR:
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action start on
> rsc_hadev2 in state S_PENDING
> Aug 23 21:47:51 BadNews: Aug 23 21:46:32 hadev2 crmd: [15446]: ERROR:
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action start on
> DoFencing:child_DoFencing:0 in state S_PENDING
> Aug 23 21:47:52 BadNews: Aug 23 21:47:33 hadev3 tengine: [22385]:
> ERROR: mask(utils.c:timer_callback): Transition abort timeout reached...
> marking transition complete.
> Aug 23 21:47:52 BadNews: Aug 23 21:47:34 hadev3 tengine: [22385]:
> ERROR: mask(utils.c:send_complete): 2 - Transition status: Abort timed
> out after 60000ms
OK.
As I mentioned earlier, I will no longer tolerate ANY of these in the
results. The code freeze will remain on and 2.0.1 will be delayed until
all of these are fixed in everyone's environment.
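
For anyone checking their own environment against this, a minimal sketch of
tallying BadNews lines per test from a CTS log might look like the following.
It is not part of CTS. The function name badnews_by_test is hypothetical, and
it assumes each log entry sits on a single line in the "Running test NAME
(node) [n]" and "BadNews:" forms quoted above (the quoted lines are wrapped by
the mailer).

```python
import re
from collections import Counter

# Matches lines such as: "Aug 23 20:56:03 Running test Stonithd (hadev3) [74]"
# (format assumed from the log excerpts quoted in this message).
RUNNING = re.compile(r"Running test (\S+) \((\S+)\) \[(\d+)\]")


def badnews_by_test(lines):
    """Count BadNews lines, attributing each to the most recently started test."""
    counts = Counter()
    current = None  # name of the test currently running, if any
    for line in lines:
        match = RUNNING.search(line)
        if match:
            current = match.group(1)
            continue
        if "BadNews:" in line:
            # BadNews seen before any test started gets its own bucket.
            counts[current or "(before first test)"] += 1
    return counts
```

Feeding it the lines of the file given to -L would give a per-test count; any
test with a nonzero count still needs attention.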
--
Alan Robertson <alanr@unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/