> The easiest way to have fault tolerance would be to
> match up your IMAP servers in an active/active setup
> where each IMAP server has another server that's
> willing to take over if a failure occurs.

As I mentioned earlier in this thread, this seems a rather costly approach for what little redundancy you get. The only things you are protecting yourself against are the disk drives and the server components inside the primary server box, namely the power supply, CPU, motherboard, SCSI controller, and network card. Your RAID array, your power source, and your network paths remain the same in terms of failure points. The power supply and network card can easily be made redundant, and there is no requirement for external RAID to get disk redundancy, so for all practical purposes the only redundancy this setup truly adds covers the CPU, motherboard, and SCSI controller. Of all the things in the system, these three are the ones I am least concerned about.

Granted, I personally favor geographic distribution using separate servers at physically separate sites, because I'm looking for fault tolerance in the form of multi-master replication and failover. So fault tolerance to me means being able to use a server from a different site because the server at my primary site went away (note that a "site" could simply mean a network segment). I also always prefer to be able to take advantage of any hardware that's actually turned on, so there wouldn't be any "spare" servers, just tolerant "live" servers; hence the multi-master definition.

The folks working on the Spread toolkit (WAN group communication services) wrote an excellent article on creating a very efficient multi-master Postgres database.
The paper and concepts can be found here: http://www.cnds.jhu.edu/pub/papers/cnds-2002-1.pdf

What I found extraordinary was that they were able to maintain a very high level of throughput while simultaneously guaranteeing correctness of the database, in that all transactions happened in the same order across all locations. So it would be possible to implement a multi-master lock synchronization tool to abstract Cyrus away from relying on file system locking.

What would be involved in creating either the Cyrus "master" process, or some other daemon, to be responsible for the file locking? Specifically, to support multiple Cyrus processes operating on the same shared file structure (via NFS, or a multi-server mount)? Has a good description of the problem already been posted to the list?

-- Michael --
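To make the lock-synchronization idea concrete, here is a minimal sketch of the state machine such a daemon might run. It assumes the key property the Spread paper provides: every replica receives the same totally ordered stream of lock/unlock messages. Given that, each replica can apply the stream to an identical deterministic state machine and all of them agree on who holds each lock, with no shared filesystem locking involved. The class and message names below are hypothetical illustrations, not anything from Cyrus or Spread.

```python
from collections import deque

class ReplicatedLockManager:
    """Deterministic lock state machine (hypothetical sketch).

    If every replica applies the same totally ordered stream of
    requests -- the "agreed" delivery a group-communication system
    like Spread provides -- then all replicas compute identical
    lock ownership without touching filesystem locks.
    """

    def __init__(self):
        self.owner = {}    # lock name -> current holder
        self.waiters = {}  # lock name -> FIFO queue of blocked requesters

    def request(self, lock, node):
        """Apply an ordered LOCK message; return the current owner."""
        if lock not in self.owner:
            self.owner[lock] = node          # grant immediately
        else:
            self.waiters.setdefault(lock, deque()).append(node)
        return self.owner[lock]

    def release(self, lock, node):
        """Apply an ordered UNLOCK message; hand off to the next waiter."""
        assert self.owner.get(lock) == node, "release by non-owner"
        queue = self.waiters.get(lock)
        if queue:
            self.owner[lock] = queue.popleft()
        else:
            del self.owner[lock]
        return self.owner.get(lock)
```

Because the logic is deterministic, two replicas fed the same ordered stream stay in lockstep: if node A locks a mailbox, node B's request queues behind it, and A's release hands the lock to B identically on every replica. All the hard distributed-systems work (ordering, membership, partition handling) is delegated to the group-communication layer, which is exactly the division of labor the Spread paper argues for.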