'[Fresco-devel] High-availability-fresco discussion'


List:       berlin-design
Subject:    [Fresco-devel] High-availability-fresco discussion
From:       Neil Pilgrim <linux () kepier ! clara ! net>
Date:       2002-11-18 4:59:00

What I describe below is a summary of an irc discussion from last
weekend. People (actively) present were nicholas, njs, chalky and myself
(neiljp). This is a summary of (some of the) topics discussed with some
structuring/ideas added for clarity (eg. the need to store the desktop
information somewhere); if I have misrepresented any of the suggestions
put forward by others, please correct me! There are no direct
attributions, but this is certainly not all my own original ideas; even
with an irc log I don't want to attribute directly - it's taken me long
enough to summarise this as it is and I'd like to get it out there for
further commentary :)

Informing clients of clean server shutdown
==========================================

We started talking about this after I asked what people thought about
having a 'server-is-shutting-down' callback function in the
ClientContext, so that clients could be made aware of when the server
shut down and so also shut down cleanly. A brief summary of this idea
can be found in task57 (http://issues.fresco.org/task57).
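
As a rough illustration of the callback idea (this is not Fresco's
actual API: the class and method names are invented for this sketch, and
plain Python objects stand in for corba references), the server could
keep a list of registered client contexts and notify each one on a clean
shutdown:

```python
class ClientContext:
    """Hypothetical client-side object the server holds a reference to."""

    def __init__(self, name):
        self.name = name
        self.running = True

    def ping(self):
        # Existing liveness check: the server calls this periodically.
        return True

    def server_shutting_down(self):
        # Proposed callback: release resources and exit cleanly.
        self.running = False


class Server:
    def __init__(self):
        self.clients = []

    def register(self, context):
        self.clients.append(context)

    def shutdown(self):
        # Tell every client before the server goes away. A failure to reach
        # one client must not stop the others from being told.
        for context in self.clients:
            try:
                context.server_shutting_down()
            except Exception:
                pass
        self.clients.clear()
```

Note that this only covers the 'clean' case: if the server never reaches
shutdown(), no client is told anything.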

The basic problem that this addition and most of the following
discussion/proposal attempts to address is associated with implementing
a connection protocol over the connection-less corba framework.

Handling of server crashes
==========================

One point expressed was that although the callback idea above dealt with
the server explicitly being shut down in some 'clean' way (including
through various signals or maybe closing the display, eg using ESC on
the fresco desktop), it wouldn't handle cases where the server crashed,
hung or otherwise didn't shut down cleanly. For clients to detect server
crashes we might try one of the following.

- Currently the server pings each client in turn, to detect whether the
client is still running and if not then tidy-up associated resources in
the server; a similar system could be used for clients to determine if
the server was still available. A less network-heavy method may be for
clients to keep track of the most recent server-ping and time-out after
a certain period of not being contacted.

- a separate process (or multiple) could ping the server, which could
then call back to the client(s) if/when the server died; a separate
basic process would be much less likely to crash, and would be safer to
restart automatically (eg. why you only use xdm/gdm/kdm once you're sure
that X is configured correctly - similarly the display server shouldn't
necessarily restart automatically, at least for now). Multiple copies of
these processes could be run to keep track of both the display server
and each other.
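
The 'less network-heavy' variant in the first point could look roughly
like the sketch below, assuming the client can hook into its own ping
handler. The class name, the timeout value and the injectable clock are
all assumptions made for illustration:

```python
import time


class PingWatchdog:
    """Tracks the most recent server ping and reports a timeout."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_ping = clock()

    def on_ping(self):
        # Called from the client's ping handler each time the server pings.
        self.last_ping = self.clock()

    def server_presumed_dead(self):
        # True once no ping has arrived within the timeout.
        return self.clock() - self.last_ping > self.timeout_seconds
```

The client generates no extra traffic here; it only needs the timeout to
be comfortably longer than the server's ping interval, otherwise a busy
server would be mistaken for a dead one.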

Reconnection of running clients (a running session) to a new server
===================================================================

If (when?!) the display server crashes (or is stopped manually) and a
new server will be started in a short period of time, it would be useful
for running clients to be able to be reconnected to the new server. This
would be useful both for maintaining running clients during
intended/unintended restarts, and also if wanting to move a server to
another machine, but keep the same running clients (desktop). Such
abilities are not generally available (or used) with other popular
windowing systems, though someone mentioned 'xmove' to allow something
like the latter for X11, akin to 'screen' for terminals (afaik).

Since the graphical state in fresco is stored in the server, connecting
a running 'client' session to a server requires constructing the
scene graph associated with that session. Two main options were
presented:

i) load desktop/window information and a list of connected clients from
some external source and request each client to regenerate their
associated gui elements. No synchronisation overhead is required except
as needed (see below), but clients need to be built such that their gui
can be regenerated on-demand.

ii) load the complete information on the scene graph and a list of
connected clients from some external source. For this the server might
regularly snapshot its state into a file, or perhaps a separate process.
This has the downside of requiring continual updates, which may degrade
the performance of the server. However if all the scene graph
information can be stored in this way then a recovery should be fairly
simple.

Both methods require storage of server state additional to that
associated with clients, ie. that describing the desktop and associated
windows in the classical WIMP design. The client-specific UI is then
either regenerated from some 'last known' state (in (ii)) or regenerated
from clients (in (i)). Note that the 'external source' could be a file,
or some other running process that mirrors the scene graph in the main
(display) server.

Publishing/locating the list of clients associated with a running session
-------------------------------------------------------------------------

The server always needs to know what clients it has: to join UI elements
to client Models (eg. Commands) and to tidy-up in case of clients
exiting.

On starting up a server which attempts to connect to a running session
of clients, it needs to somehow look up which clients are in that
session. The alternative would be for running clients to continually
poll for 'their' server; this polling could occur over some
small-to-medium time period and by many (many!) clients at once, which
makes this option much less appropriate.

How to publish the client list? Looking at the methods used to publish
the server reference, we can't really use corbaloc since there are now
multiple pieces of information. The remaining choices involve publishing
in a file-system or in some network directory service, with information
being updated as clients connect and disconnect from the server.

Using a filesystem should work, and is satisfactory if we simply want to
deal with servers restarting on the same machine or on machines with
shared drives. If we do not have this situation available, we must use
some network directory service, the most obvious solution being to use
the corba nameservice. The major drawback of the nameservice is that it
has no built-in security/authentication mechanism(s) - afaik it is
similar to a world-writeable network-transparent mini filesystem.
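
For the filesystem option, a minimal sketch might be as follows. It
assumes each client can be identified by a string (for instance a
stringified object reference); the file format and function names are
made up for the example:

```python
import json
import os
import tempfile


def publish_clients(path, client_refs):
    """Atomically write the list of client references to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(sorted(client_refs), handle)
        # On POSIX the rename is atomic within one filesystem, so a
        # restarting server never sees a half-written list.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_clients(path):
    """Return the published client references, or [] if none exist."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return []
```

The server would call publish_clients() whenever a client connects or
disconnects, and a new server would call load_clients() once at startup.
File permissions give this some access control, which is what the
nameservice lacks.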

Rebuilding client UI in new servers
-----------------------------------

Requiring clients to be designed to allow simple UI creation on-demand
could be difficult; such requirements need to be sufficiently integrated
into normal client development, rather than being an extra api which
most clients won't bother to implement.

What about when scene graphs are embedded in one another, with different
parts owned by different clients; will that be difficult to regenerate?
Do all of an application's UI elements need to be created in one go?

This could also be useful if we want to duplicate client windows across
multiple displays: always use the UI-generation functions and you just
call them once per display :)
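
To make the 'always use the UI-generation functions' point concrete,
here is a small sketch. Display and EditorClient are hypothetical
stand-ins, not Fresco classes; the only point is that the UI is derived
entirely from state held in the client:

```python
class Display:
    """Hypothetical stand-in for a connection to one display server."""

    def __init__(self, name):
        self.name = name
        self.windows = []

    def add_window(self, title, widgets):
        self.windows.append((title, list(widgets)))


class EditorClient:
    """Client whose UI is always produced by build_ui, never ad hoc."""

    def __init__(self, document):
        self.document = document  # model state lives in the client

    def build_ui(self, display):
        # Derive the whole UI from the model, so it can be rebuilt at any
        # time: on a fresh server after a crash, or on a second display.
        display.add_window("Editor", ["text:" + self.document, "button:save"])
```

Calling build_ui() once per display gives the duplication case, and
calling it again on a replacement display gives the recovery case, with
no separate 'recovery api' for the client author to forget.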

(Re)constructing the scene graph from an external source
--------------------------------------------------------

Since the desktop information (at least) should normally be duplicated
outside of the server, either we must have a fresco process(es) storing
that state, or else we must deal with serialisation of (parts of) the
scene graph to disk. The latter is important even if we have some extra
process storing the server-global information, since we may want to
snapshot that state to disk for backup purposes in case that process
crashes for some reason.

We didn't discuss the implementation, though unsurprisingly there was
mention of xml ;) However of course xml is more hierarchical, and
although it allows cross-references, how well can/will it describe the
scene graph?
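
One way to explore that question is sketched below, on the assumption
that a graphic may appear under more than one parent. Nodes are plain
dicts and the element and attribute names are invented; a node met a
second time is written as a reference instead of being duplicated:

```python
import xml.etree.ElementTree as ET


def serialise(root):
    """Serialise a graph of {'kind': str, 'children': [...]} dicts to XML.

    A node reached a second time is written as <ref to="..."/> rather than
    being duplicated.
    """
    ids = {}

    def emit(node, parent):
        key = id(node)
        if key in ids:
            ET.SubElement(parent, "ref", to=ids[key])
            return
        ids[key] = "n%d" % len(ids)
        element = ET.SubElement(parent, "graphic", id=ids[key], kind=node["kind"])
        for child in node["children"]:
            emit(child, element)

    top = ET.Element("scene")
    emit(root, top)
    return ET.tostring(top, encoding="unicode")


def deserialise(text):
    """Rebuild the graph, resolving <ref> elements to shared nodes."""
    nodes = {}

    def build(element):
        if element.tag == "ref":
            return nodes[element.get("to")]
        node = {"kind": element.get("kind"), "children": []}
        nodes[element.get("id")] = node
        node["children"] = [build(child) for child in element]
        return node

    return build(ET.fromstring(text)[0])
```

So sharing survives a round trip, at the cost of the xml tree no longer
matching the graph one-to-one. What this does not address is anything
that lives outside the graph structure, such as references held by
clients.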

Serialisation could be useful for other reasons, such as under
high-memory-load conditions in the server: if certain parts of the scene
graph are not visible, they could be 'swapped-out' to disk, and reloaded
if/when they become visible. 

Also in the same way that when we load (use) a kit, we create a
prototype for each thing it can produce (is that right?), we could have
something similar for applications to use: the user creates a composite
graphic, then gives it a name and can clone the entire thing later. Does
that gain us anything? Perhaps a 'MacroKit'. (builder pattern?)
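
A toy version of the 'MacroKit' idea, with composites represented as
nested dicts purely for illustration (the interface is a guess, and as
written it behaves more like the prototype pattern than a builder):

```python
import copy


class MacroKit:
    """Hypothetical kit that stores named composites and clones them."""

    def __init__(self):
        self._prototypes = {}

    def define(self, name, composite):
        # Keep a private copy so later edits to the original do not leak
        # into the stored prototype.
        self._prototypes[name] = copy.deepcopy(composite)

    def create(self, name):
        # Each call returns an independent clone of the whole composite.
        return copy.deepcopy(self._prototypes[name])
```

Whether this gains anything probably depends on where the clone happens:
done inside the server, it would replace many construction calls with
one.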

Finally to avoid multiple corba calls, such as involved in creating
large sections of the scene graph, it was suggested we could pass
some UI description to the server as a string (ahem, eg in xml), along
with a map of important name/graphic pairs. Would that be faster (one
call, quite a bit of data) than multiple separate calls to construct the
UI?
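
To make the comparison concrete, here is a sketch that counts round
trips against a fake server. JSON is used instead of xml only to keep
the example short, and every name in it is hypothetical:

```python
import json


class FakeServer:
    """Counts 'remote' calls; a stand-in for the display server."""

    def __init__(self):
        self.calls = 0
        self.graphics = {}

    def create_graphic(self, kind, parent=None):
        # One round trip per graphic.
        self.calls += 1
        handle = len(self.graphics)
        self.graphics[handle] = (kind, parent)
        return handle

    def build_from_description(self, description):
        # One round trip for the whole subtree; returns name -> handle for
        # the graphics the client asked to keep hold of.
        self.calls += 1
        named = {}

        def build(node, parent):
            handle = len(self.graphics)
            self.graphics[handle] = (node["kind"], parent)
            if "name" in node:
                named[node["name"]] = handle
            for child in node.get("children", []):
                build(child, handle)

        build(json.loads(description), None)
        return named
```

For a window with two buttons this is one call instead of three, and the
returned map plays the role of the name/graphic pairs. It says nothing
about actual speed: that depends on call latency versus the cost of
parsing the description in the server, which would need measuring.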

A 'mediator' server
===================

Given the insecurity of the corba nameservice and the system-specificity
of publishing in filesystems, I suggested the idea of having some kind
of 'mediator' server where the clients associated with a particular
server might be listed.

Such a server could handle multiple purposes, including many of those
mentioned above:
- the mediator could ping both clients and server(s), informing each
side when the other cannot be contacted.
- mediator servers could synchronise with each other when clients
(dis)connect or servers are started/stopped (eg running one on each
machine with fresco servers/clients on)
- command-line utilities could be used to determine whether clients in a
session should be stopped completely, or to close individual clients, or
to otherwise manipulate them.
- having one mediator per machine means that we can have all the ping
methods as described above, but we can use an optimal transport (eg.
unix socket with omni4); pinging occurs locally to whatever fresco
processes are running using eg unix sockets, while synchronisation
between mediators does not use that kind of polling mechanism but
callbacks or similar, so we avoid using 'wide-area' bandwidth as much
as possible
- we could mirror (backup) the desktop information in the mediators (?)
- they could act as another method for locating the initial server
reference, or perhaps be used *instead* of that?
- if implementing (ii) as above, we could duplicate the scene graph
itself in the mediator (hmm, overkill?!)
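
The first point in that list could be sketched as below. This is only
the ping-and-inform part of a single mediator; the method names
(server_lost, client_lost) are invented, and synchronisation between
mediators is left out entirely:

```python
class Mediator:
    """Hypothetical per-machine mediator tracking one session."""

    def __init__(self):
        self.clients = {}   # name -> object with ping() and server_lost()
        self.server = None  # object with ping() and client_lost(), or None

    def attach_client(self, name, client):
        self.clients[name] = client

    def attach_server(self, server):
        self.server = server

    @staticmethod
    def _alive(peer):
        try:
            return bool(peer.ping())
        except Exception:
            return False

    def poll(self):
        """Ping everything once; return names of clients that were dropped."""
        dead = [n for n, c in self.clients.items() if not self._alive(c)]
        for name in dead:
            del self.clients[name]
        if self.server is None:
            return dead
        if self._alive(self.server):
            # Server is up: tell it which clients have gone away.
            for name in dead:
                self.server.client_lost(name)
        else:
            # Server is gone: tell the surviving clients.
            self.server = None
            for client in self.clients.values():
                client.server_lost()
        return dead
```

The client list held here is also exactly what a restarted server would
need to look up, so the publishing problem and the crash-detection
problem end up in the same place.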

I'm not saying that all these ideas are things we should press ahead
with, this is more a brainstorming session :)

Let the comments begin!

-- 
Neil

_______________________________________________
Fresco-devel mailing list
Fresco-devel@fresco.org
http://lists.fresco.org/cgi-bin/listinfo/fresco-devel