
List:       grid-engine-dev
Subject:    Re: [GE dev] Re: [GE users] new fail over model
From:       Andreas Haas <Andreas.Haas@Sun.COM>
Date:       2002-04-29 8:43:31
Message-ID: Pine.SOL.4.32.0204291026430.677-100000@sr-ergb01-01

Ron,

see my answers inline below.

Cheers,
Andreas

On Sat, 27 Apr 2002, Ron Chen wrote:

> --- Fritz Ferstl <Friedrich.Ferstl@sun.com> wrote:
> > That certainly relieves the qmaster from any dealing
> > with shadow master issues and allows you most likely
> > to abstain from any changes to qmaster code. Seems
> > like an excellent idea as far as I'm concerned!
>
> Yes, I want the master election and alive-message
> broadcast moved away from the master; the master
> should be handling requests and queuing them up.
>
> The main advantage of the design is that we can
> decrease the time between the failure of the master
> and the start of master election.
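A minimal sketch of what such a shadowd wait loop could look like,
assuming plain POSIX sockets; the timeout value, the message format
and start_election() are made up for illustration, this is not SGE
code:

   #include <sys/select.h>
   #include <sys/socket.h>

   #define ALIVE_TIMEOUT 10           /* seconds without an alive message */

   extern void start_election(void);  /* hypothetical election entry point */

   /* Block until an alive message arrives or the timeout expires;
    * on timeout (or error) the master is presumed dead and an
    * election is started. */
   static void wait_for_master(int sock)
   {
      char buf[64];

      for (;;) {
         fd_set rfds;
         struct timeval tv = { ALIVE_TIMEOUT, 0 };

         FD_ZERO(&rfds);
         FD_SET(sock, &rfds);

         if (select(sock + 1, &rfds, NULL, NULL, &tv) <= 0) {
            start_election();                /* master presumed dead */
            return;
         }
         recv(sock, buf, sizeof(buf), 0);    /* consume alive message */
      }
   }

The failure-to-election latency is then bounded by ALIVE_TIMEOUT
rather than by the heartbeat file's polling interval.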
>
> > Using TCP broadcasting for the alive protocol and
> > the master election doesn't seem to have an issue
> > with respect to network load due to the usually
> > small number of shadow masters. I have no feeling
> > whether it's really better compared to heartbeating
> > the file though.
>
> 1. With the heartbeat file, we can't easily have the
> original master host regain control.
>
> 2. Also, when the admin wants to shut down the master
> host, we can initiate an election right at that moment.
>
> 3. While a process is select()ing it is basically idle,
> whereas a process checking the status of a file has to
> wake up and poll over and over (see the sketch below).
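For contrast, the file-based check amounts to something like the
following; path handling and the age check are made up here, and
the real shadowd logic differs in detail:

   #include <sys/stat.h>
   #include <time.h>

   /* Poll-style check: has the heartbeat file been touched within
    * the last max_age seconds?  The caller has to sleep() and call
    * this again and again. */
   static int master_alive(const char *heartbeat_file, int max_age)
   {
      struct stat st;

      if (stat(heartbeat_file, &st) != 0)
         return 0;                        /* no file: assume dead */
      return (time(NULL) - st.st_mtime) <= max_age;
   }

A select()ing process instead sleeps in the kernel until a message
arrives or the timeout fires, so it wakes up exactly once per event.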
>
> I think we can separate the elect() and broadcast()
> modules cleanly, so that we can modify them or let the
> user choose later (a rough interface sketch follows):
>
> -> we can have the TCP version of the protocol, or
> -> we can keep the heartbeat file method if needed.
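One way to keep the two mechanisms swappable is an ops table; all
names below are invented for illustration, none of this exists in
SGE today:

   #include <stddef.h>

   /* Pluggable fail-over backend: one set of ops per mechanism. */
   typedef struct failover_ops {
      int (*broadcast_alive)(void *ctx);         /* announce liveness    */
      int (*elect)(void *ctx, char *master_out,  /* run an election and  */
                   size_t master_len);           /* report the winner    */
   } failover_ops;

   extern const failover_ops tcp_failover_ops;        /* TCP based      */
   extern const failover_ops heartbeat_failover_ops;  /* heartbeat file */

The daemon would pick one table at startup, so switching methods
becomes a configuration decision rather than a code change.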
>
> > I can envision a shared-filesystem-free client
> > protocol to find a master in case the master host
> > changed to work like this:
>
> I actually got that idea while I was typing; I don't
> think we need to implement it any time soon :-)
>
> > TCP broadcasting might be a replacement for the
> > possibly iterative and time-consuming step 3. I
> > don't have experience with TCP broadcasts though,
> > so I can't really comment. Some issues could be
> > network load at the time of a master failure and
> > interoperability with commd.
> >
>
> Can you tell me more about commd -- I've read the html
> file in daemons/commd, but it does not have the
> descriptions of the APIs.
>
> The reason I need commd is not about performance, but
> more about the hostname resolution issues. Is it
> possible for me to use the way SGE resolves

Yes, it is possible, and this is exactly how it should be done.
For a sample, have a look at utilbin/gethostbyname.c. When started
with the -aname option, the gethostbyname utility does hostname
resolving exactly like all SGE components do, including SGE host
aliasing (see sge_h_aliases(5)), yet without ever contacting
commd, because the commd resolving code is used directly via
sge_host_resolve_name_local(). The benefit of doing it that way
is that no commd is needed. If commd availability is not an
issue, you might instead use commlib's getuniquehostname().
The resulting hostname is the same.
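As a rough illustration of where such a call would sit in your
shadowd; plain libc gethostbyname() stands in here for the SGE
resolver, and everything else is made up:

   #include <netdb.h>
   #include <string.h>

   /* Canonicalize a hostname before comparing hosts or opening a
    * TCP connection.  Real code would call the SGE resolver
    * (sge_host_resolve_name_local() resp. commlib's
    * getuniquehostname()) instead of libc gethostbyname(). */
   static int resolve_host(const char *name, char *out, size_t len)
   {
      struct hostent *he = gethostbyname(name);

      if (he == NULL || he->h_name == NULL)
         return -1;
      strncpy(out, he->h_name, len - 1);   /* canonical name */
      out[len - 1] = '\0';
      return 0;
   }

Two shadowds then agree on a host's identity as long as both
resolve through the same code path.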

> hostnames, and let the shadowd do TCP connections
> with other shadowds?
>
> > One final comment concerning the case of network
> > partitioning into two subnets: I don't see that the
> > proposed shadow master redesign solves this
> > situation. In case of a split into 2 subnets you
> > would have the config and spool files either
> >
> > - reachable only in one of the subnets, in which
> >   case only this subnet can remain active, or
> >
> > - reachable in both (then you'd have two networks,
> >   one for file sharing - which would still work -
> >   and the other for the remaining traffic - which
> >   would be down), in which case you'd end up with a
> >   split-brain situation: two masters hammering
> >   potentially conflicting data into the same
> >   repository.
> >
> > I don't really have a good idea for what to do in
> > case of such scenarios at this point.
>
> Let's not worry about those; it should be configurable
> by the user (e.g. the side which has the default spool
> directory can continue to accept jobs, while the other
> side, which only has the backup spool directory, can
> dispatch jobs but not accept them).
>
> -Ron
>
> >
> > Cheers,
> >
> > Fritz
> >
>
