List:       linux-ha-dev
Subject:    [Linux-ha-dev] What heartbeat uses clocks for
From:       Alan Robertson <alanr () unix ! sh>
Date:       2001-11-07 20:27:45

I was out of the discussion for a while.  From what I can see it's gone a
bit overboard compared to the needs of membership at least ;-)

Heartbeat uses clocks for basically three things:

	How long it has been since we received a heartbeat from machine "m"

	How long it has been since we sent out a heartbeat

	Alarms to interrupt calls which might hang


Each of these needs only a clock that is roughly close to actual time over
the short term.  A 20% error is typically not a problem.  No reference at
all is made to the real time of day between machines.

BUT -- it is important that this error stay bounded to this amount even when
the system time/date is changed.  In other words, we need to be able to
measure how many ms it has been since some event in the past - even if the
system time is changed in the meantime.
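
A minimal sketch of that kind of measurement, in Python rather than
heartbeat's own code.  It assumes a clock source that never jumps; Python's
time.monotonic() is documented as unaffected by system clock updates.  The
class HeartbeatTimer, its methods, and the deadtime_ms parameter are
hypothetical names made up for this illustration.

```python
import time


class HeartbeatTimer:
    """Tracks milliseconds since the last heartbeat seen from each node.

    Hypothetical helper for illustration; not part of heartbeat itself.
    """

    def __init__(self, clock=time.monotonic):
        # clock must return seconds from a source that never jumps;
        # time.monotonic() is not affected by system clock changes.
        self._clock = clock
        self._last_seen = {}

    def note_heartbeat(self, node):
        """Record that a heartbeat from `node` arrived just now."""
        self._last_seen[node] = self._clock()

    def ms_since(self, node):
        """Return ms since the last heartbeat from `node`, or None if never seen."""
        last = self._last_seen.get(node)
        if last is None:
            return None
        return (self._clock() - last) * 1000.0

    def is_dead(self, node, deadtime_ms):
        """True if `node` has been silent for longer than deadtime_ms."""
        elapsed = self.ms_since(node)
        return elapsed is not None and elapsed > deadtime_ms
```

Because only differences between two readings of the same clock are used,
a constant offset or a modest rate error in that clock does not matter; only
a jump between the two readings would.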

This is a pretty uncritical use of system clocks - certainly we can tolerate
horrible, gross, vile errors in clock run rates - far beyond those of even
very bad PC motherboards.

The discussion was started by the problem reported by Matt Soffen: changing
the time locally on *BSD systems will cause a **potentially unbounded error**
in short-term time measurements, given my understanding of the system clocks
on *BSD.

This is fatal to heartbeat, and probably other software as well.

There are (at least) three ways to deal with this:

	1) Find a *BSD clock that works like the return result from times(2) on
		Linux or Solaris (i.e., it's jump-free)

	2) Write a layer of software over the times() calls to fudge the results
		to hide clock jumps

	3) Use NTP to keep the clocks in sync jump-free

I believe that these are listed in order of preference for solving this
particular problem.

This does not mean that you don't want to run NTP on the cluster, just that
heartbeat should work correctly even if the client cluster isn't running NTP
(hence options 1 and 2).
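
Option 2 could be sketched roughly as follows.  This is only an illustration
in Python under stated assumptions, not a proposed implementation: the class
JumpFilteredClock and the max_step and assumed_step parameters are
hypothetical, and it assumes the caller reads the clock more often than once
per max_step seconds.  Jumps smaller than max_step pass through undetected,
so each clock change adds at most max_step of error.

```python
import time


class JumpFilteredClock:
    """Wraps a raw clock that may jump and returns a jump-free reading.

    Hypothetical sketch of option 2; names and thresholds are assumptions.
    """

    def __init__(self, raw_clock=time.time, max_step=5.0, assumed_step=0.0):
        # raw_clock: seconds from a source that may be stepped.
        # max_step: largest gap between reads accepted as real elapsed time;
        #   callers must read the clock more often than this.
        # assumed_step: elapsed time credited when a jump is detected.
        self._raw_clock = raw_clock
        self._max_step = max_step
        self._assumed_step = assumed_step
        self._last_raw = raw_clock()
        self._filtered = 0.0

    def now(self):
        """Return seconds since construction, with detected jumps removed."""
        raw = self._raw_clock()
        delta = raw - self._last_raw
        self._last_raw = raw
        if delta < 0 or delta > self._max_step:
            # Backward step or implausibly large forward step: treat it as a
            # clock change and credit the assumed step instead.
            delta = self._assumed_step
        self._filtered += delta
        return self._filtered
```

Interval measurements would then be taken as differences between two now()
readings instead of two raw clock readings.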


	-- Alan Robertson
	   alanr@unix.sh
