
List:       grid-engine-dev
Subject:    Re: [GE dev] Job dependencies and filesystem caches
From:       hanzl () noel ! feld ! cvut ! cz
Date:       2003-06-26 16:06:55
Message-ID: 20030626180655V.hanzl () unknown-domain

>  >  - how do you ensure filesystem consistency for dependent jobs?
> 
> Grid Engine doesn't do anything explicit. You might be able to embed 
> something into the epilog and prolog scripts to that effect though.

Oh yes, what I really wanted to ask was: How do you, users of GE,
ensure consistency for dependent jobs? Do you observe problems? Do you
embed something clever in the prolog/epilog scripts?

> I've made limited use of Unison:
>
> http://www.cis.upenn.edu/~bcpierce/unison/

Looks nice, I did not know this one. However, I guess it can be slow if
your jobs can touch any file in a huge dataset? In that case one
should probably tell Unison exactly where to look, making it a bit
more complicated for the user than it has to be. I would be happy to
use something with kernel support that can detect recent changes
in the file cache and flush them on demand.

Maybe I should explain more exactly why I am interested in this topic:

1. Without a guarantee that the network filesystem propagates changes
made by job A to the files seen by job B, job dependencies are just a
game of chance with a variable share of luck, mysterious errors, upset
users and administrator headaches. This should probably be stated in
the manuals next to the explanation of the -hold_jid option, to prevent
naive users from expecting too much from this mechanism.

2. On the other hand, if we can provide the guarantee, this mechanism
can be very useful - it can become a script- and file-level analogue
of MPI message passing and make efficient parallel computation much
more accessible to naive users who do not want to learn MPI but are
clever enough to specify correct dependencies and pass 'messages' in
files.
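
A minimal sketch of what passing a 'message' in a file could look
like, in Python. The function names publish() and receive() are made
up for illustration. The sketch assumes that fsync on the writing node
pushes the data to the file server and that rename is atomic on the
filesystem in use; it does not by itself guarantee that the reading
node's cache is fresh, which is exactly the open question here.

```python
import os
import tempfile


def publish(path, data):
    """Job A: write data to path so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, so the final
    # rename stays within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Assumption: fsync forces the data out of the local cache.
            os.fsync(f.fileno())
        # Make the complete file appear under its final name in one step.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def receive(path):
    """Job B: read the complete message left by job A."""
    with open(path, "rb") as f:
        return f.read()
```

Job A would call publish() as its last step, and job B, submitted with
-hold_jid on job A, would call receive() as its first step.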

I think there is a huge (but still rather hidden) demand for a better
network filesystem - something that could use local disks as a
persistent file cache, and could also cache changes and propagate them
as needed when there is time to do so. On Linux there is currently
nothing like this that one could use easily, but there are many things
which come close (but are not quite suitable for clusters for one
reason or another). Besides things like rsync and Unison, which are not
filesystems and therefore require too much work from the user, there
are for example these:

  Coda - too cumbersome 
  AFS - too cumbersome 
  cachefs - not available for Linux
  Ron Minnich's autocacher - too much dust on it, I cannot compile it on Linux today
  Greg Badros's Disk-Caching NFS - no port for 2.4.x kernels
  InterMezzo - still too unstable

The last one is probably closest to my idea. A common problem for
network filesystems is that if one wants them to behave exactly the
'UNIX way', things have to be slow. There are various attempts to give
up some of the required filesystem properties and get something quick
and still usable for common tasks.

And here we come to the reason I write all this - job dependencies are
an excellent opportunity to tell the filesystem which changes we
really need propagated and where we need them propagated. This
can have a huge impact on cluster throughput.

Grid Engine is being developed, and future filesystems with intelligent
caching are being developed, at least as ideas in our heads. I think
it is important to note how even a little cooperation can help a lot.

As a minimum, clients should be able to flush all file caches to the
server on demand, and this request should go into epilog scripts. They
should also be able to get the latest filesystem status from the
server, and this request should go into prolog scripts.
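
With today's tools, the closest approximation I can think of looks
roughly like the Python sketch below. The names epilog() and prolog()
and their parameters are hypothetical. os.sync() asks the kernel to
write out all dirty buffers; whether that really reaches the server
depends on the filesystem. The prolog part is only a polling
workaround (wait until the expected files become visible), not a real
'get the latest status' call, and cached attributes may still delay
it.

```python
import os
import time


def epilog():
    """Run after the job: ask the kernel to write out all dirty data."""
    # Assumption: for the network filesystem in use, this pushes
    # cached changes to the server.
    os.sync()


def prolog(expected_paths, timeout=60.0, interval=1.0):
    """Run before the job: wait until every expected input file is visible.

    Returns True once all paths exist, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = [p for p in expected_paths if not os.path.exists(p)]
        if not missing:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A prolog that returns False could put the job into an error state
instead of letting it run on stale data.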

(Even better, the kernel could remember which changes were made by a
particular process group (job) and be able to flush just these on
demand, and maybe the server could even propagate just these changes
to where they are needed.)
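
Until the kernel can do this, a userspace approximation is possible if
the job itself keeps a manifest of the files it writes. The Python
sketch below is only an illustration: record_write(),
flush_job_files() and the manifest format (one path per line) are all
made up, and it assumes that fsync on a read-only descriptor flushes
the file's dirty data, as it does on Linux.

```python
import os


def record_write(manifest, path):
    """Called by the job whenever it finishes writing a file."""
    with open(manifest, "a") as m:
        m.write(os.path.abspath(path) + "\n")


def flush_job_files(manifest):
    """Epilog step: fsync only the files listed in the job's manifest.

    Returns the list of paths that were flushed.
    """
    with open(manifest) as m:
        paths = [line.rstrip("\n") for line in m if line.strip()]
    flushed = []
    for path in dict.fromkeys(paths):  # drop duplicates, keep order
        # Assumption: a read-only descriptor is enough for fsync here.
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        flushed.append(path)
    return flushed
```

This touches only the files the job reported, so the cost does not
grow with the size of the whole dataset.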

OK, enough. Thanks for reading this :-) Let me know if somebody else
also thinks that this is a hot topic today.

Regards

Vaclav Hanzl
