'RE: [Linux-ha-dev] Faster Node'


List:       linux-ha-dev
Subject:    RE: [Linux-ha-dev] Faster Node
From:       "Zou, Yixiong" <yixiong.zou () intel ! com>
Date:       2004-08-25 21:42:08
Message-ID: 012676D607FCF54E986746512C22CE7D016E2432 () orsmsx407

> -----Original Message-----
> From: linux-ha-dev-bounces@lists.linux-ha.org 
> [mailto:linux-ha-dev-bounces@lists.linux-ha.org] On Behalf Of 
> Alan Robertson
> Sent: Wednesday, August 25, 2004 12:32 PM
> To: High-Availability Linux Development List
> Subject: Re: [Linux-ha-dev] Faster Node Failure detection
> for Telco hardware platform
> 
> 
> Zou, Yixiong wrote:
> > I don't think the reliability of UDP packets is a big issue here.  The
> > worst case is that we do not receive any notification about node death
> > and fall back to the normal timeout, as we do now.
> >
> > SNMPv3 might be complex to set up.  But that's only a one-time cost.  And
> > if it works, it's worth it.
> >
> > I don't know much about CIM.  I imagine that you would need a CIM
> > provider, which is equivalent to the SNMP server.  Do you know if
> > such CIM providers currently exist for the blade center or ATCA platforms?
> > If not, how much is involved to set one up?  Basically, it is a question
> > of "are we there yet"?
> >
> > The good thing about the SNMP trap handler is that everything is there
> > already.
> 
> The issue is simply this:
>
> Whatever the firmware already does, we can take advantage of.  And, we
> should choose the "best" option for any given piece of firmware.
>
> Whatever the firmware does not already do (and is not planning on doing),
> is very very difficult to do, because it's hard and slow to get people to
> change firmware.
>
> I suspect that SNMP is probably the best place for us to start - because it
> will handle many more cases than CIM (today).
> 
> 

Agreed.  CIM would be better if it were available.  Unfortunately
it is not there yet.  Using HPI probably wouldn't gain us much
either.

Back to the design of the handler: you mentioned that it could be
done by a communication plugin.  Could you tell me more about how
this works?  It seems to me that a communication plugin would have
to be loaded as part of heartbeat.

The way I imagine it, the handler would work just like ipfail.  When
a trap event is received, the SNMP trap daemon invokes the handler
program.  This program signs on to the heartbeat daemon and notifies
heartbeat that a node is dead.  What ipfail does, essentially, is send
out an "askresource" message.  So if we can have something like that
in heartbeat, everything should work out.  What do you think of that?
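
To make the idea concrete, here is a rough sketch of such a handler, not a
real implementation.  It assumes the trap daemon is Net-SNMP's snmptrapd,
which passes each trap to a handler program on stdin (host name, then
transport address, then one "OID VALUE" line per varbind), and that it is
configured to print numeric OIDs.  The two enterprise OIDs and the
notify_heartbeat() function are made up for illustration; the actual
signon and message sending would go through the heartbeat client API and
is only stubbed out here.

```python
import sys
from typing import Callable, Dict, Iterable, NamedTuple

# Standard OID of snmpTrapOID.0, the varbind that identifies the trap type.
SNMP_TRAP_OID = "1.3.6.1.6.3.1.1.4.1.0"
# HYPOTHETICAL OIDs: a real platform (blade center, ATCA) defines its own.
NODE_DOWN_TRAP_OID = "1.3.6.1.4.1.99999.1.1"
NODE_NAME_OID = "1.3.6.1.4.1.99999.1.2"


class Trap(NamedTuple):
    hostname: str
    address: str
    varbinds: Dict[str, str]


def parse_trap(lines: Iterable[str]) -> Trap:
    """Parse the text that snmptrapd feeds to a handler on stdin."""
    cleaned = [ln.strip() for ln in lines]
    if len(cleaned) < 2:
        raise ValueError("expected at least a host name and an address line")
    varbinds: Dict[str, str] = {}
    for ln in cleaned[2:]:
        if not ln:
            continue
        oid, _, value = ln.partition(" ")
        # Numeric OIDs may be printed with a leading dot; drop it for lookups.
        varbinds[oid.lstrip(".")] = value.strip().strip('"')
    return Trap(cleaned[0], cleaned[1], varbinds)


def handle_trap(trap: Trap, notify: Callable[[str], None]) -> bool:
    """Call notify(node) if the trap reports a dead node; return True if so."""
    trap_type = trap.varbinds.get(SNMP_TRAP_OID, "").lstrip(".")
    if trap_type != NODE_DOWN_TRAP_OID:
        return False
    # Fall back to the sending host if the trap does not name the node.
    node = trap.varbinds.get(NODE_NAME_OID, trap.hostname)
    notify(node)
    return True


def notify_heartbeat(node: str) -> None:
    # PLACEHOLDER: a real handler would sign on to the heartbeat daemon and
    # send a cluster message here, much as ipfail sends "askresource".
    print("node %s reported dead" % node, file=sys.stderr)


def main() -> None:
    handle_trap(parse_trap(sys.stdin), notify_heartbeat)
```

A script built this way would call main() at the end and be registered
with a traphandle line in snmptrapd.conf.  If no trap arrives, nothing
changes and the normal heartbeat timeout still applies.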

------------------------------------------------------------------------

Yixiong Zou (yixiong.zou@intel.com)

(626) 443-0100

All views expressed in this email are those of the individual sender. 

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
