
List:       veritas-ha
Subject:    Re: [Veritas-ha] IPMultiNICB, mpathd and network outages
From:       Jason Fortezzo <fortezza () mechanicalism ! net>
Date:       2008-10-21 0:28:18
Message-ID: 20081021002818.GA16153 () alderaan ! mechanicalism ! net

On Mon, Oct 20, 2008 at 10:37:08AM -0400, Paul Robertson wrote:
> We recently experienced a Cisco network issue which prevented all
> nodes in that subnet from accessing the default gateway for about a
> minute.
> 
> The Solaris nodes which run probe-based IPMP reported that all
> interfaces had failed because they were unable to ping the default
> gateway; however, they came back within seconds once the network issue
> was resolved. Fine.
> 
> Unfortunately, our VCS nodes initiated an offline of the service group
> after the IPMultiNICB resources detected the IPMP fault. Since the
> service group offline/online takes several minutes, the outage on
> these nodes was more painful. Furthermore, since the peer cluster
> nodes in the same subnet were also experiencing the same mpathd fault,
> there would have been little advantage to failing over the service
> group to another node.
> 
> We would like to find a way to configure VCS so that the service group
> does not offline (and any dependent resources within the service group
> are not offlined) in the event of an mpathd (i.e. IPMultiNICB)
> failure. In looking through the documentation, it seems that the
> closest we can come is to increase the IPMultiNICB ToleranceLimit from
> "1" to a huge value:

I've been bitten by this before; in our case the cause was spanning tree
recalculations.  I got around it by disabling probe-based fault
detection and using link-based detection instead.  Probe-based detection
monitors both L2 and L3 connectivity, but we found it too fragile and
were willing to accept the risk of monitoring only L2.  Solaris 10 IPMP
natively supports link-based detection; unfortunately, on Solaris 8 and
9 you have to disable IPMP altogether and rely on the MultiNICB agent.


Solaris 8 and 9:

# main.cf:
MultiNICB multinicb (
    UseMpathd = 0
    LinkTestRatio = 0
    IgnoreLinkStatus = 0
    Device @server1 = { ce0 = 0, ce4 = 1 }
    Device @server2 = { ce0 = 0, ce4 = 1 }
)

# /etc/hostname.ce0:
server1-ce0 netmask + broadcast + deprecated -failover up \
addif server1 netmask + broadcast + failover up

# /etc/hostname.ce4:
server1-ce4 netmask + broadcast + deprecated -failover standby up


Solaris 10:

# main.cf:
MultiNICB multinicb (
    UseMpathd = 1
    MpathdCommand = "/usr/lib/inet/in.mpathd -a"
    ConfigCheck = 0
    GroupName = ipmp0
    Device @server1 = { nxge0 = 0, nxge4 = 1 }
    Device @server2 = { nxge0 = 0, nxge4 = 1 }
)

# /etc/hostname.nxge0:
server1 netmask + broadcast + group ipmp0 up

# /etc/hostname.nxge4:
group ipmp0 standby up
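
To sanity-check the Solaris 10 setup before relying on it, something
along these lines may help.  This is only a rough sketch, not part of
the configuration above: the helper name ipmp_failed is made up for
illustration, and it assumes that a failed interface shows the FAILED
flag in its ifconfig flags line.  The if_mpadm calls in the comments
take the interface offline and bring it back, so try them on a test
node first.

```shell
# Hypothetical helper: reads `ifconfig <interface>` output on stdin and
# succeeds if the flags list contains FAILED as a whole flag
# (NOFAILOVER must not match).
ipmp_failed() {
    grep -Eq '[<,]FAILED[,>]'
}

# Example use on a Solaris 10 node (interface names taken from the
# config above; adjust to your hardware):
#
#   if_mpadm -d nxge0              # offline nxge0; addresses should move to nxge4
#   ifconfig -a                    # confirm the "server1" address is now on nxge4
#   if_mpadm -r nxge0              # reattach nxge0
#   ifconfig nxge0 | ipmp_failed && echo "nxge0 still marked FAILED"
```

Pulling the cable (or shutting the switch port) is the more realistic
test of link-based detection, since if_mpadm only exercises the
failover path and not the link-state notification itself.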


-- 
Jason Fortezzo
fortezza@mechanicalism.net
_______________________________________________
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha