
List:       linux-ha-dev
Subject:    [Linux-ha-dev] Re: [Fwd: Re: draft of updated GRITS note]
From:       David Brower <dbrower () us ! oracle ! com>
Date:       2000-03-08 3:19:03

> -------- Original Message --------
> From: "Gary D. Young" <gdyoung@us.oracle.com>
> Subject: Re: draft of updated GRITS note
> To: David Brower <dbrower@us.oracle.com>, jleys@us.oracle.com

> >     OPEN: The current proposal does not address layers of fencing or
> >     escalation and isolation.  It might be useful to identify levels
> >     at which fencing may be stopped without doing higher levels.  For
> >     instance, if all disk i/o may be stopped by frobbing the
> >     fibrechannel switch, then turning off the power may not be
> >     necessary.
> 
> Remind me please: what's frobbing?

The jargon file is of mixed help.  Its entry for "frob" gives only
"frobnicate" as the verb form.  Helpfully, the entry for "molly-guard"
uses it in context:

  molly-guard /mol'ee-gard/ /n./ [University of Illinois] A shield to 
  prevent tripping of some Big Red Switch by clumsy or ignorant hands. 
  Originally used of the plexiglass covers improvised for the BRS on 
  an IBM 4341 after a programmer's toddler daughter (named Molly) 
  frobbed it twice in one day. Later generalized to covers over 
  stop/reset switches on disk drives and networking equipment. 

The usual use of "frobbing a knob" is making some adjustment
on some piece of equipment.  It's sort of like tweaking, only
it is generally an "official" adjustment, where tweaking can
be done out-of-band with metal cutting tools and impact devices :-)

> >     Potential protocols include:
> >
> >         ONC RPC
> >         CORBA
> >         HTTP
> >         HTTPS
> >         COM/DCOM
> >         SMB extensions
> 
> Was there some reason why TCP/IP was not included? It's most likely
> going to be the communications module used for the first version,
> so it seems kinda silly to leave it out. All of the above probably
> make use of TCP/IP in some fashion.... perhaps "sockets" or
> something would be the appropriate terminology here?

I suppose if pressed, I'd say HTTP is tcp with ascii, er, ISO 8859 syntax,
and that I hate rolling my own message formats.   Would anyone like to
argue for raw tcp?

> >     To establish ordering of quorum generations, GRITS must consider
> >     the possibility of wraparound.  It is suggested that something like
> >
> >     static inline int x_before_y (unsigned x, unsigned y)
> >     {
> >         return ((signed) (x - y)) < 0;
> >     }
> >
> >     will suffice.
> 
> Certainly there will be some default generation number that is used
> when you initially request the forming of a cluster. So if you happen
> to wrap around to it, any new nodes will either assume they are
> already part of the cluster, or the storage unit may assume they are
> since they have the correct generation number.
> 
> Potential solutions could be:
> 1) to skip the "magic number" assigned as default when incrementing.
> 2) try to arrange the protocol so that you must have the correct
> generation number AND be considered part of the group by GRITS.
> 
> Hm. Is this point so obvious that you just glossed over it, or
> am I hitting something meritorious here?

Maybe, or maybe something different.

I was talking with John Leys, and we came up with some other problem
cases, and maybe a solution to the "stable store in the resource
for quorum generation" problem.

Here is a scenario.  The cluster is at generation 1 and it partitions;
subset Alpha forms and gains quorum at gen 2.  It starts sending fencing
commands out, but has a problem and stalls.  Another reconfig is done,
forming generation 3, and it sends fences successfully to all resources.
Then the sleepy quorum master from gen 2 comes back to life, at the same
time that resource R dies.  R reboots, receives a message from sleepy 2,
and fences out the members of gen 3.  Or R sends its "set me" request to
2, who cheerfully responds with the same membership; same result.

This would be solved if we remembered generations in the resource, because
R would know not to talk to generation 2.  But we'd like not to require that.

We can solve one part if we insist that every time a quorum group gets a
"set me" query from a booting resource, it explicitly checks to make sure
it still holds quorum.  We can solve the other part if a booting resource
without a memory challenges the first command it receives, forcing the
same explicit quorum check from the commanding node.  This way there
is no need for persistent store in the resource, but having persistent
store eliminates the need for a challenge.
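
To make the challenge side concrete, here is a rough C sketch of what
a resource might do with an incoming command.  Everything named here
(struct resource, quorum_check_fn, resource_accept_command, the demo
callback) is made up for illustration and is not part of the GRITS
draft; it also ignores wraparound to stay short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: asks the sender to re-verify, right now, that
 * its generation still holds quorum.  Returns true only if it does. */
typedef bool (*quorum_check_fn)(uint64_t sender_gen);

struct resource {
    bool     has_memory;   /* true once a generation has been accepted */
    uint64_t current_gen;  /* meaningful only when has_memory is true  */
};

/* Handle a command from a node claiming generation sender_gen.
 * Returns true if the command is accepted. */
static bool resource_accept_command(struct resource *r, uint64_t sender_gen,
                                    quorum_check_fn challenge)
{
    assert(r != NULL && challenge != NULL);

    if (!r->has_memory) {
        /* Freshly booted, nothing remembered: challenge the sender. */
        if (!challenge(sender_gen))
            return false;
        r->has_memory = true;
        r->current_gen = sender_gen;
        return true;
    }

    /* With memory, refuse anything older than what was already accepted.
     * (Wraparound is deliberately ignored in this sketch.) */
    if (sender_gen < r->current_gen)
        return false;
    r->current_gen = sender_gen;
    return true;
}

/* Stand-in for the scenario above: only generation 3 holds quorum. */
static bool only_gen3_holds_quorum(uint64_t gen)
{
    return gen == 3;
}
```

With this, a rebooted R that hears from sleepy gen 2 first issues the
challenge, gen 2 fails its quorum check, and the command is dropped;
the same command from gen 3 is accepted and remembered.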

How does that sound?

Still unsettled is the need for generation memory in the group itself;
if the whole group dies, must it start past the previous generation, 
or can it start at 0 or 1 again?  With a challenge, it might work
to let the quorum generation float, and be negotiated to the max of
the remembered generations from all the members and the available 
resources.  An alternative is Gary's "magic epoch" value, skipped
on wrap around. 
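
Both ideas can be sketched in a few lines of C.  This assumes a
32-bit generation and a magic epoch of 0, neither of which has been
decided; the names gen_before, gen_negotiate and gen_next are
invented for the example.  The ordering test is the x_before_y idea
from the draft, and it is only meaningful while the values being
compared lie within half the counter range of each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GEN_MAGIC_EPOCH 0u  /* assumed value meaning "no cluster formed yet" */

/* True if generation x precedes y, tolerating wraparound. */
static int gen_before(uint32_t x, uint32_t y)
{
    return (int32_t)(uint32_t)(x - y) < 0;
}

/* "Float" the generation: pick the newest value remembered by any
 * member or resource that answered. */
static uint32_t gen_negotiate(const uint32_t *remembered, size_t n)
{
    assert(remembered != NULL && n > 0);

    uint32_t newest = remembered[0];
    for (size_t i = 1; i < n; i++) {
        if (gen_before(newest, remembered[i]))
            newest = remembered[i];
    }
    return newest;
}

/* Advance the generation, skipping the magic epoch on wraparound. */
static uint32_t gen_next(uint32_t g)
{
    uint32_t next = g + 1u;
    return (next == GEN_MAGIC_EPOCH) ? next + 1u : next;
}
```

For instance, gen_next(0xFFFFFFFF) yields 1 rather than 0, and
negotiating over { 0xFFFFFFFE, 2, 0xFFFFFFFF } settles on 2, since 2
is the value reached after wrapping.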

Any opinions?

> >     FIXME - For this to work, we need to fix the width of the
> >     generation, to 16, 32, or 64 bits.  My inclination is to make it
> >     64 bits.
> 
> Putting it at 64 bits would make it rather unlikely that the ceiling
> is ever hit. 1ms network latency * 2^64 is "sizable". Some machines
> may have to do their own 64 bit arithmetic if they don't have 64
> bit libraries (or 64 bit processors), but that's not a serious issue.

Yes, but arguing against myself, I'm not sure I want to -require- 64 
bits, esp. if I'm trying to work with an existing group manager that 
has a smaller native width.  GRITS/NATALIE is not supposed to be
Linux specific, so I kind of don't like wiring things in like
generation size.
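
One way to avoid wiring in a size is to make the width a parameter
of the comparison.  The sketch below is only that, a sketch: the
names gen_before_width and wrap_seconds are invented here, and the
second function just redoes Gary's arithmetic assuming one new
generation per millisecond.

```c
#include <assert.h>
#include <stdint.h>

/* True if x precedes y when both are counters of the given width
 * (2..64 bits), tolerating wraparound at that width. */
static int gen_before_width(uint64_t x, uint64_t y, unsigned bits)
{
    assert(bits >= 2 && bits <= 64);

    uint64_t mask = (bits == 64) ? UINT64_MAX
                                 : ((UINT64_C(1) << bits) - 1);
    uint64_t diff = (x - y) & mask;

    /* x precedes y when (x - y), seen as a signed value of this
     * width, is negative, i.e. its top bit is set. */
    return (int)((diff >> (bits - 1)) & 1u);
}

/* Seconds until a counter of the given width wraps, assuming one
 * new generation every millisecond. */
static double wrap_seconds(unsigned bits)
{
    assert(bits <= 64);

    double ms = 1.0;
    for (unsigned i = 0; i < bits; i++)
        ms *= 2.0;
    return ms / 1000.0;
}
```

At that rate a 16-bit generation wraps in about 65.5 seconds, a
32-bit one in about 49.7 days, and a 64-bit one in roughly 585
million years, so the narrower widths lean much harder on the
wraparound handling.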

Any opinions?

> > Resource Settings
> >
> >     A resourceSetting is a combination of
> >
> >         { resource, node, allow|deny }
> >
> >     ResourceSettings are a list or array of resourceSettings to
> >     cover a set of resource/node bindings.
> 
> I'm still keen on putting "administrate" as one of the attributes.
> Then fencing "administrate" access from all the rest of the nodes
> would result in quorum being established. If everyone seeking
> quorum had the protocol of "fence everyone else off, and if those
> all succeed then you're the reconfig leader". (If two people were
> vying, obviously one of them would fence the other first, and
> then at least one of the other's fencing commands would fail.
> Whoever else they fence in the process would be redundant.)
> 
> This has issues when the reconfig leader dies, though.... but
> I suppose any quorum service has to be able to recover from
> the token being lost.

I'm leery of semantics beyond allow/deny, because I think that
there will be some very crude switches we'd like to use as our
enforcers.  I don't see how to implement "administrate" using
an access decision in a switch, for instance.  It would be neat
if we knew how "administrate" could be made to work everywhere.

Does it sound possible to anyone else?
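
For reference, the plain allow/deny form of a resourceSetting might
look like this in C.  The type and function names are invented for
illustration, and treating an unlisted resource/node pair as "deny"
is an assumption of the sketch, not something the draft specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum grits_access { GRITS_DENY = 0, GRITS_ALLOW = 1 };

/* One { resource, node, allow|deny } binding. */
struct resource_setting {
    const char       *resource;
    const char       *node;
    enum grits_access access;
};

/* Find the setting for (resource, node) in an array of bindings.
 * Assumption: a pair with no binding listed is denied. */
static enum grits_access
settings_lookup(const struct resource_setting *settings, size_t n,
                const char *resource, const char *node)
{
    assert(resource != NULL && node != NULL);
    assert(settings != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(settings[i].resource, resource) == 0 &&
            strcmp(settings[i].node, node) == 0)
            return settings[i].access;
    }
    return GRITS_DENY;
}
```

A two-valued decision like this maps directly onto a switch that
can only open or close a port, which is the point of keeping the
semantics that crude.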
 
> >     OPEN:  the current proposal does not address errors or timeouts
> >     that could be returned from Set operations.
> 
> Ack/Nack-ing each set request would have its merits. The option of
> querying after each request to verify that it went through has a flaw:
> someone could interfere between your set and your verification
> request.

I think synchronous error is the best thing to do; explicitly supporting
asynchrony with completion status checking is ugly, and I'm willing
to assume threads for parallelism.
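
As a rough illustration of the synchronous style, here is a C sketch
in which each Set blocks until it is acked, nacked or times out, and
the caller gets the status back directly.  The status codes, struct
set_request, the transport hook and demo_send are all hypothetical;
the draft does not yet define any of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum set_status { SET_OK = 0, SET_NACK, SET_TIMEOUT };

struct set_request {
    const char *resource;
    const char *node;
    int         allow;     /* nonzero = allow, zero = deny */
};

/* Hypothetical transport hook: sends one request and blocks until
 * the resource acks, nacks, or the call times out. */
typedef enum set_status (*send_fn)(const struct set_request *req);

/* Apply the requests in order and stop at the first failure.
 * On failure, *failed_at (if non-NULL) gets the failing index. */
static enum set_status
set_all_sync(const struct set_request *reqs, size_t n,
             send_fn transport, size_t *failed_at)
{
    assert(transport != NULL && (reqs != NULL || n == 0));

    for (size_t i = 0; i < n; i++) {
        enum set_status st = transport(&reqs[i]);
        if (st != SET_OK) {
            if (failed_at != NULL)
                *failed_at = i;
            return st;
        }
    }
    return SET_OK;
}

/* Stand-in transport for illustration: requests about "nodeB"
 * never get an answer. */
static enum set_status demo_send(const struct set_request *req)
{
    return strcmp(req->node, "nodeB") == 0 ? SET_TIMEOUT : SET_OK;
}
```

A caller wanting parallelism would run one such blocking call per
thread rather than polling for completion status afterwards.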

thanks!

-dB

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.tummy.com
http://lists.tummy.com/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
