'Re: [Imap-protocol] Efficiently handling sequence numbers'

List:       imap
Subject:    Re: [Imap-protocol] Efficiently handling sequence numbers
From:       Timo Sirainen <tss () iki ! fi>
Date:       2012-11-09 22:16:19
Message-ID: 9B0A7B09-AF1D-41E5-A87E-7BD3A6A05DB2 () iki ! fi

On 10.11.2012, at 0.05, Brandon Long wrote:

> > So, do you load the full sequence into memory on SELECT?
> 
> Yes, first read/mmap the index snapshot into memory and then apply any expunges
> and other changes to it from the log. It could be optimized so that all the
> expunges are done in one O(n) scan of the index instead of multiple separate
> memmove()s, but it hasn't been a real problem so far. I don't have the old/new
> index separation yet, but I think it would be a good idea to add it once the
> mailbox grows past maybe 100k messages (or maybe even sooner).
> 
> Ah, ok.  If only loading the data were as simple as an mmap for us ;)  At this
> point, I think the data only moves through 5 separate servers on 4 machines,
> usually in the same data center though.  That, and of course we don't use UIDs
> as our primary message-id: we need 32 bits for the UID and 64 bits for the
> message-id, so 96 bits (12 bytes) minimum per message. A 10M-message folder
> therefore loads 120MB, plus probably another 30MB of overhead in protocol
> buffers (which aren't really meant for millions of entries).
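The two ways of applying expunges mentioned in the quoted reply (one memmove() per expunge versus a single O(n) scan) can be contrasted in a small sketch. This is a hypothetical Python illustration, not Dovecot's code: the index is modeled as a sorted list of UIDs rather than fixed-size records, and the function names are made up.

```python
from bisect import bisect_left
from typing import List, Set


def expunge_one_by_one(uids: List[int], expunged: Set[int]) -> List[int]:
    """Apply expunges individually: each `del` shifts the tail down, like a memmove().

    Cost is O(n) per expunged message, so O(n * k) for k expunges.
    """
    for uid in sorted(expunged):
        i = bisect_left(uids, uid)  # uids is sorted ascending, as UIDs are
        if i < len(uids) and uids[i] == uid:
            del uids[i]
    return uids


def expunge_single_pass(uids: List[int], expunged: Set[int]) -> List[int]:
    """Apply all expunges in one O(n) scan by compacting survivors in place."""
    write = 0
    for read in range(len(uids)):
        uid = uids[read]
        if uid not in expunged:
            uids[write] = uid  # copy the surviving record down over any gap
            write += 1
    del uids[write:]  # drop the now-unused tail once, at the end
    return uids


def uid_to_seq(uids: List[int], uid: int) -> int:
    """After expunging, a message's sequence number is its 1-based position."""
    i = bisect_left(uids, uid)
    if i == len(uids) or uids[i] != uid:
        raise KeyError(uid)
    return i + 1
```

Both functions produce the same list; the difference is how much data gets moved. With k expunges in an n-message index, the one-by-one version shifts O(n*k) entries, while the single pass moves each surviving entry at most once.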

Sure, Dovecot also has similar extra metadata, more or less depending on the
mailbox format. Keep the user normally assigned to the same backend server and
you need to fetch the 120MB only once (or fairly rarely). Send the changes back
as smaller diffs to avoid re-uploading the whole 120MB file every time it
changes. I've been working on an object-storage-optimized backend for Dovecot
these last few months. I think it could work well for Gmail too ;)
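The idea of sending smaller diffs instead of re-uploading the whole index can be illustrated with a minimal sketch. It assumes, hypothetically, that the index is a mapping from UID to a set of flags; `make_diff` and `apply_diff` are made-up names, and a real backend would serialize the diff and carry far more per-message metadata.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical index model: UID -> set of flags (real records carry more metadata).
Index = Dict[int, FrozenSet[str]]


def make_diff(old: Index, new: Index) -> Tuple[Index, Set[int]]:
    """Compute what changed between two snapshots of the same mailbox index."""
    # New or modified entries: absent in `old` (get() -> None) or with different flags.
    changed = {uid: flags for uid, flags in new.items() if old.get(uid) != flags}
    removed = set(old) - set(new)  # expunged since the old snapshot
    return changed, removed


def apply_diff(base: Index, changed: Index, removed: Set[int]) -> Index:
    """Rebuild the new snapshot on the receiving side from base + diff."""
    result = {uid: flags for uid, flags in base.items() if uid not in removed}
    result.update(changed)
    return result
```

Only changed entries need to be transferred, so setting one flag in a 10M-message mailbox produces a one-entry diff rather than a full re-upload.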

Anyway, as I've mentioned several times already :), I think the most important
optimization for huge mailboxes is separating old and new data. Normally you
most likely don't even need to download/read the old data. (Many IMAP clients do
fetch 1:* FLAGS after SELECT, but even if the IMAP server had no problems with
that, the client itself would probably be unusably slow with such a mailbox, so
that's not a real concern.)
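The old/new separation recommended above can be sketched as an index whose old part is loaded lazily. Everything in this Python sketch is hypothetical (the `SplitIndex` class, its fields, and the loader callback), and it deliberately ignores expunges in the old part, which a real implementation would have to handle.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SplitIndex:
    """Hypothetical mailbox index split into a large, rarely read 'old' part
    and a small in-memory 'new' part (simplification: no expunges in the old part)."""

    old_count: int                          # messages in the old part, kept in a cheap header
    load_old_uids: Callable[[], List[int]]  # expensive: fetches the old part from storage
    new_uids: List[int] = field(default_factory=list)  # recently added messages
    _old: Optional[List[int]] = field(default=None, init=False, repr=False)

    def exists(self) -> int:
        """Message count for the SELECT response, without touching old data."""
        return self.old_count + len(self.new_uids)

    def uid_at(self, seq: int) -> int:
        """Map a 1-based sequence number to a UID, loading old data only if needed."""
        if not 1 <= seq <= self.exists():
            raise IndexError(seq)
        if seq > self.old_count:
            return self.new_uids[seq - self.old_count - 1]
        if self._old is None:
            self._old = self.load_old_uids()  # lazy, happens at most once
        return self._old[seq - 1]
```

With this layout a SELECT can report the message count and serve requests for recent messages from the new part alone; only a request reaching into the old range, such as FETCH 1:* FLAGS, pays for loading the old data.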

_______________________________________________
Imap-protocol mailing list
Imap-protocol@u.washington.edu
http://mailman2.u.washington.edu/mailman/listinfo/imap-protocol