
List:       linux-ha-dev
Subject:    Re: [Linux-ha-dev] CTS Result of Aug 23
From:       Alan Robertson <alanr@unix.sh>
Date:       2005-08-24 1:19:11
Message-ID: 430BCB0F.5000604@unix.sh

Huang Zhen wrote:
> The source code was from Aug 23.
> 
> command:
> /usr/lib/heartbeat/cts/CTSlab.py -2 -D /tmp/cts/ -L /var/log/messages123 
> -c -r --fencing 1 500
> 
> result:
> Aug 24 03:08:53    ****************
> Aug 24 03:08:53    Overall Results:{'failure': 0, 'success': 500, 
> 'BadNews': 82}
> Aug 24 03:08:53    ****************
> Aug 24 03:08:53    Detailed Results
> Aug 24 03:08:53    Test Flip:     {'elapsed_time': 1629.1062302589417, 
> 'skipped': 0, 'calls': 46, 'success': 46, 'started': 13, 'down->up': 13, 
> 'auditfail': 0, 'failure': 0, 'stopped': 33, 'max_time': 
> 48.975219011306763, 'min_time': 5.7235660552978516, 'up->down': 33}
> Aug 24 03:08:53    Test Restart:     {'elapsed_time': 976.2874219417572, 
> 'skipped': 0, 'calls': 47, 'success': 47, 'WasStopped': 31, 
> 'node:hadev3': 19, 'node:hadev2': 15, 'node:hadev1': 13, 'auditfail': 0, 
> 'failure': 0, 'max_time': 39.761883974075317, 'min_time': 
> 6.8436539173126221}
> Aug 24 03:08:53    Test Stonithd:     {'elapsed_time': 
> 5068.9160165786743, 'skipped': 0, 'calls': 43, 'success': 43, 
> 'auditfail': 0, 'failure': 0, 'max_time': 188.40486288070679, 
> 'min_time': 70.06779408454895}
> Aug 24 03:08:53    Test StartOnebyOne:     {'elapsed_time': 
> 2044.5414960384369, 'skipped': 0, 'calls': 30, 'success': 30, 
> 'auditfail': 0, 'failure': 0, 'max_time': 73.213633060455322, 
> 'min_time': 51.823998928070068}
> Aug 24 03:08:53    Test SimulStart:     {'elapsed_time': 
> 1565.1155052185059, 'skipped': 0, 'calls': 40, 'success': 40, 
> 'auditfail': 0, 'failure': 0, 'max_time': 53.108457088470459, 
> 'min_time': 21.958048820495605}
> Aug 24 03:08:53    Test SimulStop:     {'elapsed_time': 
> 900.88228130340576, 'skipped': 0, 'calls': 41, 'success': 41, 
> 'auditfail': 0, 'failure': 0, 'max_time': 46.81151294708252, 'min_time': 
> 14.325378179550171}
> Aug 24 03:08:53    Test StopOnebyOne:     {'elapsed_time': 
> 849.31568813323975, 'skipped': 0, 'calls': 24, 'success': 24, 
> 'auditfail': 0, 'failure': 0, 'max_time': 46.604828119277954, 
> 'min_time': 15.642576932907104}
> Aug 24 03:08:53    Test RestartOnebyOne:     {'elapsed_time': 
> 3315.1372413635254, 'skipped': 0, 'calls': 47, 'success': 47, 
> 'auditfail': 0, 'failure': 0, 'max_time': 382.18086814880371, 
> 'min_time': 37.297457933425903}
> Aug 24 03:08:53    Test standby2:     {'elapsed_time': 
> 1916.8006129264832, 'skipped': 0, 'calls': 27, 'success': 27, 
> 'auditfail': 0, 'failure': 0, 'max_time': 97.094635009765625, 
> 'min_time': 65.486258983612061}
> Aug 24 03:08:53    Test Bandwidth:     {'elapsed_time': 
> 669.27408027648926, 'skipped': 6, 'calls': 39, 'success': 33, 'min': 
> 12360.781916964643, 'max': 14242.352486818427, 'totalbandwidth': 
> 431098.1668085191, 'auditfail': 0, 'failure': 0, 'max_time': 
> 40.663763999938965, 'min_time': 7.2956085205078125e-05}
> Aug 24 03:08:53    Test ResourceRecover:     {'elapsed_time': 
> 719.37023115158081, 'skipped': 0, 'calls': 35, 'success': 35, 
> 'auditfail': 0, 'failure': 0, 'max_time': 85.048264026641846, 
> 'min_time': 12.041023015975952}
> Aug 24 03:08:53    Test SpecialTest1:     {'elapsed_time': 
> 2675.4830515384674, 'skipped': 0, 'calls': 42, 'success': 0, 
> 'auditfail': 0, 'failure': 0, 'max_time': 122.34544396400452, 
> 'min_time': 39.243659973144531}
> Aug 24 03:08:53    Test NearQuorumPoint:     {'elapsed_time': 
> 752.15080618858337, 'skipped': 4, 'calls': 39, 'success': 35, 
> 'auditfail': 0, 'failure': 0, 'max_time': 69.302345991134644, 
> 'min_time': 0.00030303001403808594}
> Aug 24 03:08:53    <<<<<<<<<<<<<<<< TESTS COMPLETED
> 
> The BadNews:
> 1. stonithd test.
> All stonithd tests have bad news.
> "
> Aug 23 20:56:03    Running test Stonithd (hadev3)     [74]
> Aug 23 20:57:35    BadNews: Aug 23 20:57:28 hadev2 tengine: [26544]: 
> ERROR: mask(utils.c:send_complete): 0 - Transition status: Timed out 
> after 60000ms
> Aug 23 20:57:35    BadNews: Aug 23 20:57:28 hadev2 crmd: [26006]: ERROR: 
> mask(messages.c:handle_request): Filtering te_timeout op in state 
> S_ELECTION
> "
> Some of them have only the timeout error.
> 
> 2. 7 of the 42 SpecialTest1 runs have bad news. Typical bad news 
> is:
> Aug 23 20:44:16    Running test SpecialTest1 (hadev2)     [62]
> Aug 23 20:46:19    BadNews: Aug 23 20:45:15 hadev3 crmd: [14527]: ERROR: 
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action monitor 
> on DoFencing:child_DoFencing:1 in state S_PENDING
> Aug 23 20:46:19    BadNews: Aug 23 20:45:15 hadev3 crmd: [14527]: ERROR: 
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action monitor 
> on rsc_hadev1 in state S_PENDING
> Aug 23 20:46:19    BadNews: Aug 23 20:45:15 hadev3 crmd: [14527]: ERROR: 
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action monitor 
> on rsc_hadev3 in state S_PENDING
> Aug 23 20:46:19    BadNews: Aug 23 20:46:16 hadev2 tengine: [20020]: 
> ERROR: mask(utils.c:timer_callback): Transition abort timeout reached... 
> marking transition complete.
> Aug 23 20:46:19    BadNews: Aug 23 20:46:16 hadev2 tengine: [20020]: 
> ERROR: mask(utils.c:send_complete): 1 - Transition status: Abort timed 
> out after 60000ms
> 
> 3. IPaddr. It looks like something was wrong on the test node, but 
> after a while the error no longer appeared.
> Aug 23 19:54:15    Running test Bandwidth (hadev3)     [3]
> Aug 23 19:54:41    ...bandwidth: 12487 bits/sec
> Aug 23 19:54:42    BadNews: Aug 23 19:54:21 hadev1 send_arp: [29745]: 
> ERROR: libnet_build_ethernet failed:
> Aug 23 19:54:42    BadNews: Aug 23 19:54:21 hadev1 IPaddr[29690]: 
> [29751]: ERROR: Could not send gratuitous arps. rc=1
> Aug 23 19:54:49    Running test RestartOnebyOne (hadev2)     [4]
> 
> 4. ResourceRecover. The bad news is the same as for SpecialTest1.
> Aug 23 21:46:26    Running test ResourceRecover (hadev1)     [129]
> Aug 23 21:47:36    ...resource IPaddr::rsc_hadev3 on hadev3
> Aug 23 21:47:51    BadNews: Aug 23 21:46:32 hadev2 crmd: [15446]: ERROR: 
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action start on 
> rsc_hadev1 in state S_PENDING
> Aug 23 21:47:51    BadNews: Aug 23 21:46:32 hadev2 crmd: [15446]: ERROR: 
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action start on 
> rsc_hadev2 in state S_PENDING
> Aug 23 21:47:51    BadNews: Aug 23 21:46:32 hadev2 crmd: [15446]: ERROR: 
> mask(lrm.c:do_lrm_rsc_op): Discarding attempt to perform action start on 
> DoFencing:child_DoFencing:0 in state S_PENDING
> Aug 23 21:47:52    BadNews: Aug 23 21:47:33 hadev3 tengine: [22385]: 
> ERROR: mask(utils.c:timer_callback): Transition abort timeout reached... 
> marking transition complete.
> Aug 23 21:47:52    BadNews: Aug 23 21:47:34 hadev3 tengine: [22385]: 
> ERROR: mask(utils.c:send_complete): 2 - Transition status: Abort timed 
> out after 60000ms

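A quick way to sanity-check the per-test counters quoted above is to 
confirm that every call is accounted for. The sketch below assumes that 
'calls' should equal 'success' + 'failure' + 'skipped'; that invariant is 
an assumption, not something CTS is known to guarantee. The numbers are 
copied from four of the tests in the report, and the function name is 
made up for illustration.

```python
# Counters copied from the detailed results quoted above (subset of keys).
results = {
    "Flip":            {"calls": 46, "success": 46, "failure": 0, "skipped": 0},
    "Bandwidth":       {"calls": 39, "success": 33, "failure": 0, "skipped": 6},
    "SpecialTest1":    {"calls": 42, "success": 0,  "failure": 0, "skipped": 0},
    "NearQuorumPoint": {"calls": 39, "success": 35, "failure": 0, "skipped": 4},
}


def unaccounted(stats):
    """Calls not covered by success, failure, or skipped.

    Assumes calls == success + failure + skipped (hypothetical invariant).
    """
    return stats["calls"] - (stats["success"] + stats["failure"] + stats["skipped"])


# Keep only the tests whose counters do not add up.
mismatches = {}
for name, stats in results.items():
    gap = unaccounted(stats)
    if gap != 0:
        mismatches[name] = gap

for name, gap in mismatches.items():
    print(f"{name}: {gap} calls unaccounted for")
```

With the figures above, only SpecialTest1 is flagged: it reports 42 calls 
but 0 successes, 0 failures, and 0 skips.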

OK.

As I mentioned earlier, I will no longer tolerate ANY of these in the 
results.  The code freeze will remain on and 2.0.1 will be delayed until 
all of these are fixed in everyone's environment.
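
For anyone who wants to check their own runs against that bar, here is a 
minimal sketch of a zero-BadNews check. It assumes the CTS output has 
been saved with each log entry on a single line (not wrapped the way the 
mail above is), and that "Running test NAME (node) [n]" and "BadNews:" 
appear as they do in the excerpts quoted above. The function name and 
the sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   Aug 23 19:54:15    Running test Bandwidth (hadev3)     [3]
RUNNING = re.compile(r"Running test (\S+) \((\S+)\)\s+\[(\d+)\]")


def badnews_by_test(lines):
    """Count BadNews lines, attributed to the most recently started test."""
    counts = Counter()
    current = None
    for line in lines:
        match = RUNNING.search(line)
        if match:
            current = match.group(1)
        elif "BadNews:" in line:
            counts[current or "(before first test)"] += 1
    return counts


# Sample lines taken from item 3 of the report, unwrapped.
sample = [
    "Aug 23 19:54:15    Running test Bandwidth (hadev3)     [3]",
    "Aug 23 19:54:41    ...bandwidth: 12487 bits/sec",
    "Aug 23 19:54:42    BadNews: Aug 23 19:54:21 hadev1 send_arp: [29745]: "
    "ERROR: libnet_build_ethernet failed:",
    "Aug 23 19:54:42    BadNews: Aug 23 19:54:21 hadev1 IPaddr[29690]: "
    "[29751]: ERROR: Could not send gratuitous arps. rc=1",
    "Aug 23 19:54:49    Running test RestartOnebyOne (hadev2)     [4]",
]

counts = badnews_by_test(sample)
clean = sum(counts.values()) == 0  # the run passes only with zero BadNews

for test, n in counts.items():
    print(f"{test}: {n} BadNews line(s)")
print("clean run" if clean else "NOT clean")
```

Grouping by the most recent "Running test" line is only a rough 
attribution, since a BadNews entry can be logged some time after the 
event that caused it.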

-- 
     Alan Robertson <alanr@unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/