
List:       evms-devel
Subject:    Re: [Evms-devel] Daemon Patch for Large Clusters
From:       "Robert Wipfel" <rawipfel () novell ! com>
Date:       2006-09-09 14:26:19
Message-ID: 45027AC0020000CF0001A1FC () sinclair ! provo ! novell ! com

> Steve Dobbelstein <steved@us.ibm.com> 09/06/06 9:16 AM:

> > "Changju Gao" <CGAO@novell.com> wrote on 09/05/2006 11:26:59 AM:

> > After further tests, I found some other cases where open/close threads
> > interfere with each other. So I added code to fend off other open/close
> > requests while bringing up or shutting down a worker.

[...]

> Thanks for reporting these issues and for suggesting fixes.  As you can
> tell, the EVMS support for the clustered environment has the basic
> functionality but could use more work in the area of robustness.  What I
> would like to do is step back and look at this as a protocol design issue
> rather than applying patches in local places where particular scenarios
> fail.  If the design is correct the code should be simpler and smaller than
> putting lots of conditional checks in various places.  Fixing the design
> will, of course, take more time, but it should result in better code in the
> end.  Once I have something in place I'll run it by you to make sure it
> satisfies your particular scenarios.

Hi Steve,

We found this because some higher-layer code introduced a side effect that
caused all nodes to open the engine at more or less the same time. The race
between engines, each failing to open (acquire) all workers and then having
to back out with a distributed close, exposed some windows. We agree that
closing these windows is a short-term code fix, and would rather consider
some protocol design alternatives. For example, suppose open_engine could be
implemented as "acquire distributed cluster lock" through the ECE. The lock
would be released by close_engine, engine process failure, or node failure.
Lock primitives would be useful for the Cluster Segment Manager (CSM) too...
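
To make the lock idea concrete, here is a rough sketch of the semantics
only. It is a single-process toy in Python, not EVMS code and not the ECE
API: the ClusterLock class, its on_failure hook, and the signatures given
to open_engine/close_engine here are all hypothetical. It assumes some
membership layer would call on_failure when an engine process or node dies.

```python
import threading


class ClusterLock:
    """Toy in-memory stand-in for a cluster-wide lock (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()  # makes acquire/release atomic locally
        self.holder = None              # node id holding the lock, or None

    def acquire(self, node):
        """Try to take the lock; return True on success, False if held."""
        with self._guard:
            if self.holder is None:
                self.holder = node
                return True
            return False

    def release(self, node):
        """Release the lock, but only if this node is the holder."""
        with self._guard:
            if self.holder == node:
                self.holder = None

    def on_failure(self, node):
        """Assumed callback for engine process failure or node failure."""
        self.release(node)


def open_engine(lock, node):
    # Opening the engine is just "acquire the cluster lock": one winner,
    # so there is no partial acquisition of workers to back out of.
    return lock.acquire(node)


def close_engine(lock, node):
    # Closing the engine releases the lock for the next opener.
    lock.release(node)
```

With this shape, nodes that race to open the engine see a single winner and
the losers simply get a failed open, rather than each having to back out
with a distributed close.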

Thanks,
Robert



