List: linux-ha-dev
Subject: Re: [Linux-ha-dev] Measuring time interval required for failover
From: Lars Marowsky-Bree <lmb () suse ! de>
Date: 2007-01-07 23:15:52
Message-ID: 20070107231552.GB13938 () marowsky-bree ! de
On 2007-01-05T10:10:23, Peter Wong <peter.wong@mobidia.com> wrote:
Hi Peter,
> I'm looking for ways of measuring the average time it
> takes for the system to failover from the active node
> to the standby node.
>
> I have been asked to have the system fail over from
> node A to node B and then after node B runs for a
> while the system would fail over from node B to node A.
> This flip-flop scenario would be carried out for say
> 1000 times.
Well, you need to cause a real failure to measure something relevant:
for example, run halt -nf on the node you want to kill, and measure the
time until the service is available again on the new node.
This will measure the entire time for your specific environment:
heartbeat detection time, STONITH latency, service recovery time et
cetera.
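As a rough sketch of how a client-side measurement could look (this is
not part of Heartbeat): the script below runs on a third machine, polls
a TCP service, and times the gap between the first failed probe and the
next successful one. The host, port, poll interval and function names
are all assumptions for illustration; a TCP connect is only a stand-in
for whatever "service is available" means in your setup, and the result
is only accurate to about one poll interval.

```python
import socket
import time


def service_is_up(host, port, timeout=0.5):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def measure_outage(host, port, poll_interval=0.1, give_up_after=300.0):
    """Time the outage as seen by a client.

    Waits for the first failed probe, then for the next successful one,
    and returns the seconds between them. Returns None if the service
    never goes down, or never comes back, within give_up_after seconds.
    """
    deadline = time.monotonic() + give_up_after

    # Phase 1: service is still up; wait until a probe fails.
    while service_is_up(host, port):
        if time.monotonic() > deadline:
            return None
        time.sleep(poll_interval)
    went_down = time.monotonic()

    # Phase 2: service is down; wait until a probe succeeds again.
    while not service_is_up(host, port):
        if time.monotonic() > deadline:
            return None
        time.sleep(poll_interval)
    return time.monotonic() - went_down
```

Start it against the service address (hypothetical example:
measure_outage("192.168.0.10", 80)) just before killing the active node.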
Of course, if you want to measure the time for a graceful switch-over,
initiated by the admin, this can also be done. Simply call, say,
crm_resource -M and measure the time until the service is available
again.
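A minimal sketch of timing such a switch-over, assuming you supply both
the migrate command and a probe of your own. The function name, the
probe, and the resource/node names in the comment are hypothetical; the
probe has to check that the service answers on the *target* node, since
the service is still up on the old node when the command is issued.
Check crm_resource's help output for the exact options on your version.

```python
import subprocess
import time


def time_switchover(command, probe, poll_interval=0.1, give_up_after=300.0):
    """Run `command`, then poll `probe()` until it returns True.

    Returns the seconds from just before the command was started until
    the first successful probe, or None if give_up_after is exceeded.
    """
    start = time.monotonic()
    # Raises CalledProcessError if the command exits non-zero.
    subprocess.run(command, check=True)
    while time.monotonic() - start < give_up_after:
        if probe():
            return time.monotonic() - start
        time.sleep(poll_interval)
    return None


# Hypothetical usage; "my_resource" and "nodeB" are placeholders, and
# service_answers_on_nodeB is a probe you would write yourself:
#   time_switchover(
#       ["crm_resource", "-M", "-r", "my_resource", "-H", "nodeB"],
#       probe=service_answers_on_nodeB,
#   )
```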
1000 runs is quite a lot; you'll likely get useful data with n=10
already. I doubt that the timings will have a large deviation. (I may be
wrong, though.)
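Whether the deviation really is small is easy to check once you have a
handful of timings. A small sketch, assuming the samples are failover
times in seconds collected by whatever method you use (the function
name is made up for illustration):

```python
import statistics


def summarize(samples):
    """Return basic statistics for a list of failover times (seconds)."""
    return {
        "n": len(samples),
        "mean": statistics.mean(samples),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }
```

If the standard deviation is small relative to the mean after ten runs,
more iterations are unlikely to tell you much about the average.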
> I have the following questions regarding this scenario:
>
> 1. Has anyone done this sort of measurements before?
Not that I'm aware of. It'd be good to have something like this,
though.
> 2. Can Heartbeat handle this flip-flopping of >1000
> times between the nodes?
We should hope so!
> 3. Are there any scripts/code within the Heartbeat
> package that would assist in this situation?
Well, we have the tools to perform a switch-over (using crm_resource to
migrate the service). CTS (the Cluster Test Suite) also does things like
this.
> 4. What is the correct way of measuring this time
> interval, between one node becomes non-operational
> and the other node becomes active?
See above.
> 5. In the log files produced by Heartbeat (ha-debug,
> ha-log), the time stamps have resolution in seconds.
> Is it possible to get a finer resolution, say
> milliseconds?
syslog-ng may be able to give you higher resolution. However, you'll be
performing operations which easily take several seconds to complete; I
doubt this will actually help you gain more relevant/better data.
Sincerely,
Lars
--
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge."
    -- Charles Darwin
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/