List: cyrus-devel
Subject: The great TODO
From: Bron Gondwana <brong () fastmail ! fm>
Date: 2015-03-11 7:25:56
Message-ID: 1426058756.2702307.238748661.6E0FA201 () webmail ! messagingengine ! com
First of all, before getting into the "what we need to do for 3.0" I want to wax philosophical for a moment...
Shit goes wrong. All sorts of amazing things.
* The computer can crash at literally any moment during any action and any codepath
* The OS can re-order writes to disk in just about any way
* fsync can lie (* we can't do anything about this one)
* disks can fill up
* a partition can have wrong permissions on it, both at startup and randomly while things are running
* a partition can go missing / randomly be unmounted.
* the OS can randomly return a few bytes of zeros in the middle of your mmaped file:
https://lkml.org/lkml/2008/6/17/9
* a multi-disk corruption can cause a random block of rubbish to appear within a file
Run a big enough set of servers for long enough, and you'll see all these things, whether due to admin error, or hardware failure...
Our job as developers of Cyrus IMAPd is to make sure that we cope with what we can, don't fail catastrophically, and make recovery as good as possible.
On the flip side, we don't want the admin to have to micro-manage everything. As much as possible, we don't want the abstraction of a reliable mail store to leak:
http://www.joelonsoftware.com/articles/LeakyAbstractions.html
So what we want to do for Cyrus 3.0 falls into three main buckets:
1) make things more robust/scalable. That's all these things above: handle them cleanly or provide the best possible recovery path.
2) make Cyrus easier to run/administrate. Things in this bucket include the authentication system, backups, moving users between servers, replication, etc.
3) new features and standards support. Things like object storage, external search engines, JMAP, sieve variables/date/etc.
So if we are proposing something which takes away an existing repair mechanism (for example, right now you can rebuild mailboxes.db by walking the tree of directories), we'd better be proposing something just as recoverable, but better in some way as well. For example: add the mailbox name (and past mailbox names...) to cyrus.header, and then store all the files with paths based on the UNIQUEID, which is a UUID, doesn't contain weird characters, and has a fixed length. Then you don't have stupid things like mailbox names being constrained by the characters supported by your filesystem, or by its case significance, and you get fast renames... but you don't lose the ability to recover.
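To make the UNIQUEID idea concrete, here is a minimal sketch in C of mapping a UNIQUEID to an on-disk path. The function name `uniqueid_path`, the `uuid/` prefix, and the two-level fan-out are assumptions made up for this illustration, not an existing Cyrus layout or API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a spool path from a mailbox UNIQUEID.
 * Assumes the UNIQUEID is a UUID made only of hex digits and dashes, so the
 * path never depends on the characters used in the mailbox name.
 * Returns 0 on success, -1 if the id is malformed or the buffer is too small. */
static int uniqueid_path(const char *spool_root, const char *uniqueid,
                         char *out, size_t outlen)
{
    size_t i, len = strlen(uniqueid);
    int n;

    /* the first two characters become directory names, so they must be hex */
    if (len < 2) return -1;
    if (!isxdigit((unsigned char)uniqueid[0])) return -1;
    if (!isxdigit((unsigned char)uniqueid[1])) return -1;

    /* reject anything that could be a path component or separator */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)uniqueid[i];
        if (!isxdigit(c) && c != '-') return -1;
    }

    /* fan out on the first two characters to keep directories small */
    n = snprintf(out, outlen, "%s/uuid/%c/%c/%s",
                 spool_root, uniqueid[0], uniqueid[1], uniqueid);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With a layout like this, a rename would only touch the name recorded in cyrus.header and mailboxes.db; the files themselves stay where they are.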
Checksums. We sanity check almost everywhere, because you can't do a full system scan at startup, checking the sha1 of every single file, to make sure there has been no corruption.
We scan files at backup time. We scan them during replication. We need a tool which scans them from a cron job for people who want to check that... maybe reconstruct needs flags to say "check but don't change things", so you can run it from cron but not be afraid that it will run when your data drive has unmounted by accident and wipe out your entire cyrus.index because it can't find the spool files.
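A rough sketch of what a "check but don't change things" pass might look like, in C. Everything here (`check_spool`, `is_message_file`, the result names) is hypothetical and is not part of reconstruct; it also assumes message files are named `<uid>.` and that the caller already knows how many records cyrus.index claims. The point is only the ordering: decide whether the spool looks unavailable before treating absent files as lost messages.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>

enum check_result { CHECK_OK, CHECK_MISSING_FILES, CHECK_SPOOL_UNAVAILABLE };

/* True for names of the form "<digits>." (assumed message file naming). */
static int is_message_file(const char *name)
{
    size_t i = 0;
    while (isdigit((unsigned char)name[i])) i++;
    return i > 0 && name[i] == '.' && name[i + 1] == '\0';
}

/* Count message files in dir; returns -1 if the directory can't be read. */
static long count_message_files(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    long n = 0;

    if (!d) return -1;
    while ((e = readdir(d)) != NULL) {
        if (is_message_file(e->d_name)) n++;
    }
    closedir(d);
    return n;
}

/* Read-only check: never modifies the index or the spool.
 * index_records is the number of messages the index says should exist. */
static enum check_result check_spool(const char *dir, long index_records)
{
    long found = count_message_files(dir);

    /* Directory unreadable, or completely empty while the index expects
     * messages: more likely an unmounted partition than real message loss,
     * so report that instead of suggesting any cleanup. */
    if (found < 0) return CHECK_SPOOL_UNAVAILABLE;
    if (found == 0 && index_records > 0) return CHECK_SPOOL_UNAVAILABLE;

    if (found < index_records) return CHECK_MISSING_FILES;
    return CHECK_OK;
}
```

A cron job could run something like this and alert on anything other than `CHECK_OK`, leaving any actual repair to a separate, deliberate step.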
At FastMail we have a tool that can fetch a damaged file from its replica. We need that in Cyrus - either the magic Perl script, or better - something built in to a tool in C. Ditto for many other FastMail specific external Perl utilities.
-----
So now we know what we're doing and why... here's my rough list of things that need doing:
* Mailbox transactions: avoid failures leaving mailboxes in corrupt state (might require 3-fsync commit, so we at least know if it's unfinished)
* UniqueId paths (described above)
* robust backup and restore tooling
* Replication based repair:
a) replication and existing replica awareness in code
b) replication based XFER (falls in with this)
c) reconstruct support for checking replicas for files
d) reconstruct sanity checking - if the spools are broken, don't keep working
* files by sha1 rather than UID in mailboxes? Means you can't rebuild in exactly the same order without cyrus.index, but if you've lost cyrus.index you may as well just sort them by date and then give the mailbox a new UIDVALIDITY anyway.
* mailboxes.db new key format - better sorting
* For performance at scale: reverse ACL map.
* For real reliability - synchronous replicas (falls out of awareness above)
* For general speed and also safety - central cleanup daemon: use the same logic we use for sync_client and (at FastMail) squatter indexing. Changes to a mailbox cause a log entry. A daemon processes those logs, does cleanup tasks in the background. During startup this file can be resolved - so half-finished renames can be found and finished or reverted - so long as we log intent before making changes... actually, I really like this:
    lock(mailbox);
    sync_log(mailbox->name);
    /* do stuff */
    unlock(mailbox);
rather than the current:
    lock(mailbox);
    /* do stuff */
    sync_log(mailbox->name);
    unlock(mailbox);
And then all the task things do a trylock, and if it fails, they just insert the record into their source log file again. That way, they retry it again in a moment (to avoid busywait, add a pause if you didn't process ANY changes this time around). This means sync doesn't wait on tasks, yet intent gets logged early, before changes are made, so we can never miss something because there was a crash before the commit finished and the event was logged.
* External system integration points
* OS packages
* Docker images / VMs (for production use)
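For the cleanup daemon item above, here is a minimal in-memory sketch in C of the trylock-and-requeue loop. It is only a model of the control flow: `struct task_log`, `mailbox_trylock`, `process_log_once` and the integer `locked` flag are stand-ins invented for this example, not Cyrus functions, and a real daemon would read a log file and take real mailbox locks.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOG 16

struct mailbox {
    const char *name;
    int locked;   /* stand-in for a real mailbox lock */
    int cleaned;  /* how many times the background task has run */
};

struct task_log {
    struct mailbox *entries[MAX_LOG];
    size_t count;
};

/* Append an entry; returns -1 if the (fixed-size) log is full. */
static int log_append(struct task_log *log, struct mailbox *mb)
{
    if (log->count >= MAX_LOG) return -1;
    log->entries[log->count++] = mb;
    return 0;
}

/* Non-blocking lock attempt: 1 on success, 0 if someone else holds it. */
static int mailbox_trylock(struct mailbox *mb)
{
    if (mb->locked) return 0;
    mb->locked = 1;
    return 1;
}

static void mailbox_unlock(struct mailbox *mb)
{
    mb->locked = 0;
}

/* One pass over the log. Entries whose mailbox is busy are put back in the
 * log for the next pass. Returns the number of entries actually processed,
 * so the caller can pause when it gets 0 instead of busy-waiting. */
static size_t process_log_once(struct task_log *log)
{
    struct task_log retry = { {0}, 0 };
    size_t i, done = 0;

    for (i = 0; i < log->count; i++) {
        struct mailbox *mb = log->entries[i];

        if (!mailbox_trylock(mb)) {
            log_append(&retry, mb);  /* busy: requeue, never block */
            continue;
        }
        mb->cleaned++;               /* the cleanup task would go here */
        mailbox_unlock(mb);
        done++;
    }

    *log = retry;
    return done;
}
```

The daemon loop around this would call `process_log_once` repeatedly and sleep briefly whenever it returns 0 while entries remain. Because the writer logs before making changes, an entry may show up for a mailbox that is still locked; requeueing handles that case without blocking.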
I'll try to get this into Phab tickets tonight - just about to leave work now.
Bron.
--
Bron Gondwana
brong@fastmail.fm