List:       linux-cluster
Subject:    re[2]: HA and process migration (tru64)
From:       Greg Freemyer <freemyer () NorcrossGroup ! com>
Date:       2001-07-24 18:11:01

> > That's it?  I thought there was more to it, esp. since supposedly
> > VMS job queues can survive being moved to a different node.  

> > I'd think the functionality described below can be done with a few shell
> > scripts.

VMS clustering is still ahead of Tru64 clustering, so yes, the below is all that Tru64 does today (VMS does more, but I am not VMS-knowledgeable).

Tru64 is one of the top-rated UNIX HA cluster solutions, but it does not yet support migration (just failover/restart), and even the below functionality is less than 2 years old under Tru64.

(Prior to that it was just one socket listener per (IP:port) per cluster, not one listener per (IP:port) per node.  I.e., it was a pure HA cluster, with high-performance (HP) capability only available through custom-written apps like Oracle Parallel Server (OPS).  Most commercial HA clusters are of this type, i.e. pure HA with no parallelism built in.)

The good news is that the below makes it very easy to create an HA/HP configuration of a "stateless" application like Apache without the addition of a separate director (LVS, a load-balancing router, etc.).

Given that you also have a Cluster File System (i.e. a common root), which Tru64 has, all you have to do is:

1) Create a new service IP for Apache and configure Apache to use it.
2) Invoke Apache on all nodes.

Now the automatically created and maintained IP director will distribute the sockets among the nodes in round-robin fashion, and if a node dies the IP director will quit sending it new sockets.

Tru64 has this so automated and seamless that many Tru64 cluster administrators are not even aware it is happening.  In particular, I have seen many high-level Tru64 cluster drawings which leave out the IP director.

On the other hand, with a standalone/separate director (LVS, router, etc.), the director is one of the key conceptual pieces of functionality and is shown even on high-level diagrams.

My thoughts on why process migration would be nice:

One negative with the above is that even Apache is not truly "stateless": it maintains sockets for a brief duration, and those sockets time out when a node is shut down in an uncontrolled manner (and end users have to click the refresh button).

For basic Apache web serving, this is easily handled in the controlled-shutdown case by shutting down Apache and letting all new sockets get routed to the other nodes, so there is zero end-user-observable behavior.

Unfortunately, many commercial applications have long-lived sockets (i.e. hours/days), so the above technique doesn't work for them.

Thus, in my opinion, the ability to migrate the process/socket pair prior to a controlled shutdown would be highly beneficial.

Many of you will now be thinking about keepalives and retry logic.  I have been down that road several times and have found it very distasteful.

Retries work pretty well on active sockets, but many sockets sit idle for extended periods, and the TCP/IP keepalive mechanism leaves much to be desired.  It is not even supported in many popular OSes, if I recall correctly.

(The times I have tried to use it, I have had to take it back out because the TCP/IP stack just did not have keepalive working correctly.)

Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com

> > Greg Freemyer wrote:
> > 
> > > As far as Compaq's TruClusters, they may have the infrastructure to
> > > support moving an open socket, but they don't yet have process migration
> > > nor socket migration available in their released product.  Process
> > > migration is on the roadmap.  I'm not sure about socket migration.
> > > 
> > > What Compaq TruClusters does have is the following, and it may make
> > > future socket migration easier:
> > > 
> > > Given a cluster of several nodes (max. of 8 for now):
> > > They elect one of them to be the service IP director.  (They use HA
> > > technology to make this reliable.)
> > > All service IP traffic goes to the service IP director.
> > > The director then forwards it across the interconnect to the
> > > appropriate node.
> > > For each open socket, the director maintains a mapping to the node
> > > the traffic goes to.
> > > 
> > > The director also maintains a list of listeners on each node.
> > > Then when a SYN comes in for a specific port, it distributes it in
> > > round-robin fashion among the listening nodes.
> > > 
> > > They have a separate director for each service IP and a separate
> > > election process for each director.
> > > 
> > > Another interesting aspect is that on outbound SYNs, you have the
> > > choice to identify yourself either by your local node's IP or by a
> > > service IP.  I'm not sure if they support this to make admin of
> > > external systems easier, or if there is some HA aspect to it, or maybe
> > > it is to allow the future process/socket migration to work.
> > > 
> > > I ASSUME that much of the above is implemented in the kernel, and even
> > > in the TCP/IP stack.
> > > 
> > > Greg Freemyer

> > -- 
> > David Nicol 816.235.1187


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/