
List:       cyrus-devel
Subject:    The great TODO
From:       Bron Gondwana <brong () fastmail ! fm>
Date:       2015-03-11 7:25:56
Message-ID: 1426058756.2702307.238748661.6E0FA201 () webmail ! messagingengine ! com

First of all, before getting into "what we need to do for 3.0", I want to wax philosophical for a moment...

Shit goes wrong.  All sorts of amazing things.

* The computer can crash at literally any moment, during any action, in any codepath
* The OS can re-order writes to disk in just about any way
* fsync can lie (we can't do anything about this one)
* disks can fill up
* a partition can have the wrong permissions on it, both at startup and randomly while things are running
* a partition can go missing or randomly be unmounted
* the OS can randomly return a few bytes of zeros in the middle of your mmapped file:
  https://lkml.org/lkml/2008/6/17/9
* a multi-disk corruption can cause a random block of rubbish to appear within a file

Run a big enough set of servers for long enough, and you'll see all of these things, whether due to admin error or hardware failure...

Our job as developers of Cyrus IMAPd is to make sure that we cope with what we can, don't fail catastrophically, and make recovery as good as possible.

On the flip side, we don't want the admin to have to micro-manage everything. As much as possible, we don't want the abstraction of a reliable mail store to leak:

http://www.joelonsoftware.com/articles/LeakyAbstractions.html

So what we want to do for Cyrus 3.0 falls into three main buckets:

1) Make things more robust/scalable. That's all of the things above: handle them cleanly, or provide the best possible recovery path.
2) Make Cyrus easier to run/administer. Things in this bucket include the authentication system, backups, moving users between servers, replication, etc.
3) New features and standards support. Things like object storage, external search engines, JMAP, sieve variables/date/etc.


So if we are proposing something that takes away an existing repair mechanism (for example, right now you can rebuild mailboxes.db by walking the tree of directories), we'd better be proposing something just as recoverable, but also better in some way. For example: add the mailbox name (and past mailbox names...) to cyrus.header, and then store all the files at paths based on the UNIQUEID, which is a UUID, doesn't contain weird characters, and has a fixed length. That way mailbox names aren't stupidly constrained by the characters your filesystem supports or by its case sensitivity, and you get fast renames... but you don't lose the ability to recover.
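
A minimal sketch of what UNIQUEID-based paths could look like. It assumes the UNIQUEID is a canonical 36-character textual UUID and uses a made-up layout, `<partition>/uuid/<c1>/<c2>/<uniqueid>`, where `c1`/`c2` are the first two characters of the UUID (to keep directories small). Both helper names and the layout are hypothetical, not Cyrus's actual on-disk format. The point it illustrates: because the id has a fixed length and a tiny character set, validating it is trivial, and the mailbox name never appears in the path.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: accept only the canonical 36-character UUID form,
 * hex digits with dashes at offsets 8, 13, 18 and 23. */
int uniqueid_is_valid(const char *uid)
{
    if (strlen(uid) != 36)
        return 0;
    for (int i = 0; i < 36; i++) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (uid[i] != '-')
                return 0;
        } else if (!isxdigit((unsigned char)uid[i])) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: build "<partition>/uuid/<c1>/<c2>/<uniqueid>".
 * The mailbox name is not part of the path, so it is not limited by the
 * filesystem's character set or case sensitivity, and a rename does not
 * have to move any files.
 * Returns 0 on success, -1 for an invalid id or a too-small buffer. */
int uniqueid_spool_path(const char *partition, const char *uid,
                        char *buf, size_t len)
{
    if (!uniqueid_is_valid(uid))
        return -1;
    int n = snprintf(buf, len, "%s/uuid/%c/%c/%s",
                     partition, uid[0], uid[1], uid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Under this assumption the name (and past names) would live only in cyrus.header and mailboxes.db, which is what keeps a directory walk useful for recovery.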

Checksums. We sanity check almost everywhere, because you can't do a full-system scan at startup, checking the sha1 of every single file, to make sure there has been no corruption.
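
The idea is to verify each piece of data cheaply at the moment it is touched, instead of in one giant scan. A minimal sketch, assuming a checksum was recorded when the data was written; it uses standard CRC-32 rather than sha1 purely so the example needs no crypto library, and `verify_on_read` is a hypothetical name, not a Cyrus function.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320),
 * computed bit by bit to keep the sketch short; real code would use a
 * lookup table or hardware CRC instructions. */
uint32_t crc32_buf(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical check-on-access: compare the bytes we just read against
 * the checksum recorded at write time. Returns 1 if they match. */
int verify_on_read(const unsigned char *data, size_t len, uint32_t recorded)
{
    return crc32_buf(data, len) == recorded;
}
```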

We scan files at backup time. We scan them during replication. We need a tool that scans them from a cron job, for people who want regular checks... maybe reconstruct needs flags to say "check but don't change things", so you can run it from cron without being afraid that it will run when your data drive has been unmounted by accident and wipe out your entire cyrus.index because it can't find the spool files.
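
A minimal sketch of the decision such a cron-safe check could make, assuming the tool has already counted how many messages cyrus.index lists and how many of their spool files are missing. The enum, the function name and the percentage threshold are all hypothetical. The key rule is that "almost everything is missing" is treated as "the spool is probably not mounted", never as "expunge everything".

```c
#include <stddef.h>

enum scan_action {
    SCAN_OK,          /* nothing missing */
    SCAN_REPORT_ONLY, /* check-only mode: report problems, change nothing */
    SCAN_ABORT,       /* too much missing: spool probably unmounted/broken */
    SCAN_REPAIR       /* a few files missing: reasonable to repair the index */
};

/* Hypothetical policy for a cron-driven reconstruct run.
 *   expected:        messages listed in cyrus.index
 *   missing:         of those, spool files that could not be found
 *   check_only:      nonzero for "check but don't change things"
 *   max_missing_pct: above this share missing, refuse to touch anything */
enum scan_action decide_scan_action(size_t expected, size_t missing,
                                    int check_only, unsigned max_missing_pct)
{
    if (missing == 0)
        return SCAN_OK;
    /* Checked first: a vanished spool must never be "repaired" by
     * deleting every index record that points into it. */
    if (missing * 100 > (size_t)max_missing_pct * expected)
        return SCAN_ABORT;
    if (check_only)
        return SCAN_REPORT_ONLY;
    return SCAN_REPAIR;
}
```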

At FastMail we have a tool that can fetch a damaged file from its replica. We need that in Cyrus: either the magic Perl script or, better, something built into a tool in C. Ditto for many other FastMail-specific external Perl utilities.
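
A minimal sketch of the last step of such a repair: swapping a damaged spool file for a good copy without ever leaving a half-written message in place. It assumes the good copy has already been fetched from the replica into a local file (how it is fetched is out of scope here), and `replace_from_copy` and the `.repair` suffix are hypothetical. It writes to a temporary file, fsyncs it, then rename()s it over the damaged file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical repair helper: copy good_path to "<damaged_path>.repair",
 * fsync it, then atomically rename it over damaged_path.
 * (A production version would also fsync the containing directory.)
 * Returns 0 on success, -1 on error. */
int replace_from_copy(const char *good_path, const char *damaged_path)
{
    char tmp_path[4096];
    char buf[65536];
    ssize_t n;
    int in = -1, out = -1;

    int r = snprintf(tmp_path, sizeof tmp_path, "%s.repair", damaged_path);
    if (r < 0 || (size_t)r >= sizeof tmp_path)
        return -1;

    in = open(good_path, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0)
        goto fail;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                  /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                goto fail;
            off += w;
        }
    }
    if (n < 0 || fsync(out) < 0)           /* data on disk before rename */
        goto fail;

    close(in);
    in = -1;
    if (close(out) < 0) {
        out = -1;
        goto fail;
    }
    out = -1;
    if (rename(tmp_path, damaged_path) < 0)
        goto fail;
    return 0;

fail:
    if (in >= 0)
        close(in);
    if (out >= 0)
        close(out);
    unlink(tmp_path);                      /* never leave the temp file */
    return -1;
}
```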

-----

So now that we know what we're doing and why... here's my rough list of things that need doing:

* Mailbox transactions: avoid failures leaving mailboxes in a corrupt state (might require a 3-fsync commit, so we at least know if it's unfinished)
* UniqueId paths (described above)
* robust backup and restore tooling
* Replication based repair:
  a) replication and existing replica awareness in code
  b) replication based XFER (falls in with this)
  c) reconstruct support for checking replicas for files
  d) reconstruct sanity checking: if the spools are broken, don't keep working
* Files by sha1 rather than UID in mailboxes? That means you can't rebuild in exactly the same order without cyrus.index, but if you've lost cyrus.index you may as well just sort them by date and give the mailbox a new UIDVALIDITY anyway.
* mailboxes.db new key format - better sorting
* For performance at scale: reverse ACL map.
* For real reliability - synchronous replicas (falls out of awareness above)

* For general speed and also safety: a central cleanup daemon, using the same logic we use for sync_client and (at FastMail) squatter indexing. Changes to a mailbox cause a log entry. A daemon processes those logs and does cleanup tasks in the background. During startup this log can be resolved, so half-finished renames can be found and finished or reverted, so long as we log intent before making changes... actually, I really like this:

lock(mailbox);
sync_log(mailbox->name);   /* log intent first, before any change */
/* do stuff */
unlock(mailbox);

rather than the current:

lock(mailbox);
/* do stuff */
sync_log(mailbox->name);   /* a crash before this line leaves no log entry */
unlock(mailbox);

And then all the task processors do a trylock, and if it fails, they just insert the record into their source log file again. That way they retry it shortly afterwards (to avoid a busy-wait, add a pause if you didn't process ANY changes this time around). This means sync doesn't wait on tasks, yet intent gets logged early, before changes are made, so we can never miss something because there was a crash after the change but before the commit finished and the event was logged.

* External system integration points
* OS packages
* Docker images / VMs (for production use)
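
A minimal sketch of the log-intent-first ordering and the trylock-and-requeue worker from the cleanup-daemon item above. It assumes an in-memory FIFO standing in for the sync log file and a plain flag standing in for the real mailbox lock; every name here (`logq`, `modify_mailbox`, `cleanup_pass`, ...) is hypothetical, not Cyrus's API.

```c
#include <stddef.h>

#define MAXQ 64

/* Stand-ins (hypothetical): a mailbox with a non-blocking lock flag,
 * and a FIFO of mailboxes playing the role of the sync log file. */
struct mailbox {
    char name[64];
    int locked;
    int cleanups_done;
};

struct logq {
    struct mailbox *items[MAXQ];
    size_t head, count;
};

static int logq_push(struct logq *q, struct mailbox *mb)
{
    if (q->count == MAXQ)
        return -1;
    q->items[(q->head + q->count) % MAXQ] = mb;
    q->count++;
    return 0;
}

static struct mailbox *logq_pop(struct logq *q)
{
    struct mailbox *mb = q->items[q->head];
    q->head = (q->head + 1) % MAXQ;
    q->count--;
    return mb;
}

static int mailbox_trylock(struct mailbox *mb)
{
    if (mb->locked)
        return -1;
    mb->locked = 1;
    return 0;
}

/* Writer side: record intent while holding the lock, before changing
 * anything. If logging fails, change nothing. */
int modify_mailbox(struct logq *q, struct mailbox *mb)
{
    mb->locked = 1;                 /* lock(mailbox) */
    if (logq_push(q, mb) < 0) {     /* sync_log(mailbox->name) */
        mb->locked = 0;
        return -1;
    }
    /* ... make the actual changes here ... */
    mb->locked = 0;                 /* unlock(mailbox) */
    return 0;
}

/* Daemon side: one pass over the entries queued when the pass starts.
 * A busy mailbox is re-queued instead of waited on. Returns the number
 * processed; the caller should pause briefly when it returns 0. */
size_t cleanup_pass(struct logq *q)
{
    size_t todo = q->count, done = 0;
    while (todo--) {
        struct mailbox *mb = logq_pop(q);
        if (mailbox_trylock(mb) < 0) {
            (void)logq_push(q, mb); /* locked elsewhere: retry next pass */
            continue;
        }
        mb->cleanups_done++;        /* background cleanup work goes here */
        mb->locked = 0;
        done++;
    }
    return done;
}
```

Re-queueing a busy mailbox means the daemon never blocks behind whoever holds the lock, and the zero return gives the caller a cheap signal for when to sleep instead of spinning.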

I'll try to get this into Phab tickets tonight - just about to leave work now.

Bron.


-- 
  Bron Gondwana
  brong@fastmail.fm


