
List:       linux-ha
Subject:    RE: [Linux-HA] ipfail failure on Solaris 8
From:       "Aaron.Sterr" <Aaron.Sterr () tradingscreen ! com>
Date:       2004-01-30 22:12:16
Message-ID: Pine.LNX.4.44.0401310709250.23093-100000 () ykdli001 ! dev ! tradingscreen ! com

I reported this issue for 1.0.4 on Solaris a couple of weeks ago.  For some
reason the heartbeat process that is supposed to listen to ipfail does not
end up listening to the fifo.  So, when ipfail writes, there is no reader,
and the errors are generated from ipfail.
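
For anyone who wants to see the symptom in isolation, here is a minimal
Python sketch of a write to a FIFO whose reader has gone away.  It is not
heartbeat or ipfail code; the FIFO name and the function name are made up for
illustration, and it assumes a POSIX system where os.mkfifo is available.

```python
import errno
import os
import tempfile


def write_to_fifo_without_reader():
    """Return the errno raised when writing to a FIFO that has no reader."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical FIFO name, not the path heartbeat actually uses.
        fifo_path = os.path.join(tmpdir, "request.fifo")
        os.mkfifo(fifo_path)

        # Open the read end first (non-blocking) so opening the write end
        # does not block waiting for a reader.
        reader = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
        writer = os.open(fifo_path, os.O_WRONLY)

        # Simulate the listening process going away.
        os.close(reader)

        try:
            # Python ignores SIGPIPE by default, so the failed write shows
            # up as BrokenPipeError (EPIPE) instead of killing the process.
            os.write(writer, b"hello\n")
        except BrokenPipeError as exc:
            return exc.errno
        finally:
            os.close(writer)
    return None


if __name__ == "__main__":
    result = write_to_fifo_without_reader()
    print("EPIPE" if result == errno.EPIPE else result)
```

EPIPE is the errno behind the "Broken pipe" text in the syslog lines quoted
below, which fits the theory that nothing is reading the other end.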

I haven't had time to track down exactly where the problem is.  Using the
beta version was one suggested fix.  Also, it seems to work correctly with
heartbeat-1.0.3.


-- 
Aaron Sterr - Infrastructure Engineer
TradingScreen Inc.
Tel: +81(3)3568-2022
Fax: +81(3)3583-8520

On Fri, 30 Jan 2004, Sam Ganesan wrote:

> 1.0.4
> 
> On Fri, 2004-01-30 at 15:26, Soffen, Matthew wrote:
> > Is this 1.1.x or 1.0.x ?
> > 
> > If it is 1.1.x then it doesn't work at all yet.  It fails miserably.  
> > 
> > The 1.0.x series "works" (passes the tests in the Python test suite).
> > 
> > Matt
> > 
> > -----Original Message-----
> > From: Sam Ganesan [mailto:sganesan@iperia.com]
> > Sent: Friday, January 30, 2004 3:17 PM
> > To: General Linux-HA mailing list
> > Subject: Re: [Linux-HA] ipfail failure on Solaris 8
> > 
> > 
> > After doing some more searching and stuff... it looks like the pipe
> > closes right after it opens....
> > 
> > I will load it on truss and see what happens....  I wonder if the fopen
> > on the pipe returns an error that does not percolate through to ipfail??
> > 
> > Sam
> > 
> > 
> > On Fri, 2004-01-30 at 14:13, Sam Ganesan wrote:
> > > All;
> > > 
> > > 	Here is a failure/symptom/bug (I dunno what) running ipfail on Solaris.
> > > I know SLES runs clean.... so does SuSE 9!!! (thanks lmb)
> > > 
> > > 
> > > Compiled on Solaris 8 kernel 108528-20
> > > 
> > > with gcc 3.3
> > > 
> > > libnet 1.0.2a from sunfreeware
> > > glib   1.2.10    ditto
> > > net-snmp 5.0.7   ditto
> > > 
> > > 
> > > 
> > > Heartbeat starts and acquires resources ...
> > > 
> > > the ipfail fifos are owned by ccm (user) and api (group)
> > > 
> > > the uid and gid match what I passed in as values to configure...
> > > 
> > > Here is the error in syslog from ipfail
> > > 
> > > Jan 30 14:04:37 s-ha-1 ipfail[8669]: [ID 801593 local0.error] ERROR:
> > > msg2stream: fflush  failure: Broken pipe
> > > Jan 30 14:04:37 s-ha-1 ipfail[8669]: [ID 801593 local0.error] ERROR:
> > > Cannot start node walk
> > > Jan 30 14:04:37 s-ha-1 ipfail[8669]: [ID 801593 local0.error] ERROR:
> > > REASON: can't send  message to RequestFIFO: Broken pipe
> > > Jan 30 14:04:38 s-ha-1 ipfail[8670]: [ID 801593 local0.error] ERROR:
> > > msg2stream: fflush  failure: Broken pipe
> > > Jan 30 14:04:38 s-ha-1 ipfail[8670]: [ID 801593 local0.error] ERROR:
> > > Cannot start node walk
> > > 
> > > 
> > > It looks like ipfail registers fine and then when it tries to write the
> > > request to the fifo it fails.... nobody at the other end of the named
> > > pipe???  This happens on both nodes...
> > > 
> > > 
> > > and here is the respawn directive from ha.cf.... it has the right user
> > > 
> > > ccm:x:85:90::/home/ccm:/bin/sh
> > > 
> > > 
> > > Sam
> > > 
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA@lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > 
> 
> 
