From kmail-devel Wed Jul 23 04:58:10 2003
From: Don Sanders
Date: Wed, 23 Jul 2003 04:58:10 +0000
To: kmail-devel
Subject: Re: CIA proposal (was: ClientInterface)
X-MARC-Message: https://marc.info/?l=kmail-devel&m=105893534123286

On Monday 21 July 2003 21:31, Marc Mutz wrote:
> Hi!
>
> > It's not watching the files that is the main problem. The problem
> > is retaining the integrity of objects that point into files which
> > can be changed by other processes.
>
> I was asked to provide a concept for a server-less solution to the
> CIA (Concurrent Index Access) problem. Here it is. It's based on
> the idea from LinuxTag that's described here as Step 3. It solves
> the memory load problems with index files by having them shared
> between KMail instances running on the same machine.
>
> Step 1: Require Maildir
> Rationale:
> - Changes in maildir folders can be efficiently monitored with
>   KDirWatch (which can use DNotify/FAM). In particular, changes
>   are monitored per-file. This is not the case for mbox.
> - In particular, changes to stati result in renaming the file.
>
> Step 2: Observe that the Index file format allows most changes
> (esp. all status updates) to be performed in-place.
>
> Step 3: Don't create an in-memory representation of index entries
> ("KMMsgInfo::kd") anymore. Instead, let every entry just be a
> pointer to the beginning of its mmap'd region in the index file.
> For this to be efficient, use Don's idea of storing those offsets
> in a file of their own. Every access to information stored in the
> index accesses the index file data directly.
> Rationale:
> - This allows most operations on existing messages to be
>   transparently shared across all KMail instances; and in an
>   extremely efficient way (memory-, cpu- and network-load-wise). This
>   holds esp. for the common operations of changing the various stati
>   of messages.
>
> Step 4: Define a protocol by which a given KMail instance can
> obtain write access to a given folder/index file.
>
> Sketch of the CIA/4 protocol:
>
> KMail instance A wants to obtain write access to folder F from B:
>
> 1. A tries to create folder-lockfile L_F
>    Success: Write own DCOP id into the lock file.
>             A now has (exclusive) write access to F.
> 2. A tries to read a DCOP ID of B from L_F
>    Success: A sends B the message
>             "Please release folder F"
>             Wait a random amount of time, then goto 1.
>    Failure: Wait a random amount of time, then goto 1.
>    Repeated failure: Tell user.
>
> Observe how B is not required to close the folder or to unmap the
> index file. It marks the folder read-only internally (or remaps the
> index to be ro).
>
> An instance holding the write lock is required to
>
> a. Release it as soon as practicable
> b. Release it as soon as possible if the above DCOP message arrives
> c. Never alter existing index entries' length
>
> If a change in entry length is necessary, mark the old one as
> deleted, rename the maildir file and add a new entry at the end.
> Other instances will see a delete and add through the dirwatching
> and can react accordingly.
>
> CIA/4 can be extended with a DCOP broadcast call to all KMail
> instances to close F, so that A can perform (index file) compaction
> after all instances replied with a "folder F closed" message.

What I like:

1) "the Index file format allows most changes (esp. all status updates) to be performed in-place." Deletion is the only exception, I think. I'm all for changing deletion so that an index entry is only marked as deleted, with a cleanup operation ('compaction') performed later. I'd prefer it if a timer were used to perform index file cleanup rather than waiting for exit, but that would add extra complexity (I omit the details of that complexity).

2) The KMail instances are talking to each other via DCOP, so in my opinion, rather than a client-only approach, it's a peer-to-peer network of servers. So I think there's some recognition here that IPC is sensible.

3) I think this is a realistic proposal for handling the problem of multiple clients accessing ~/Mail concurrently. In theory I think it could work, but I'm unsure whether the mmap will hold up.

Slightly pro:

1) 'Step 3: Don't create an in-memory representation of index entries ("KMMsgInfo::kd") anymore. Instead, let every entry just be a pointer to the beginning of its mmap'd region in the index file.'

Every entry already has a pointer to the beginning of its mmap'd region, so I read this as simply "drop the in-memory representation". I'm uncertain of the performance implications of this change; I guess they will be implementation specific.

Some fields, like the status field of a message, are always read. This is because when KMail opens a folder it looks at the status of every message, both to sync the count of unread messages and to find the next unread message. (IIRC, 1/3 of the time taken to show a folder is spent searching for the next unread message.) So I think it makes sense to always cache the status field. Given that, for optimum performance it makes sense to cache it while the index file is read in sequentially, rather than caching it later by accessing the file randomly in some arbitrary order. (I hope that makes sense; I'm skipping over some details here.) Better yet, if you are going to index the index file, please consider caching the status in the index of the index.

There might also be performance implications for sorting and the caching of sort keys. Specifically, kmheaders might be slow when a new message arrives or when the sort order is changed. But since the index file is now mmap'd, and therefore cached, I think it makes sense to experiment with dropping or reducing the in-memory representation.

Neutral/Aside:

1) Be careful with the inbox folder; I think there is a design flaw in filtering. We currently use an inbox folder as a temporary storage location for incoming mail, but I think it would make more sense to use a private folder.
I say this because a private folder for incoming mail could be modified quickly without every client needing to be informed of the changes.

2) I'm unsure how the index file mmap will hold up. If a client has to close the index file then I think the mmap will be lost, and I think losing the mmap while also having no in-memory representation of index entries would be critically bad, performance-wise. But if you can keep the mmap, then effectively all the caching of index entries has been pushed into the operating system kernel via the mmap. With luck the OS is designed in such a way that each client can share this read-only mmap cache, and the operating system kernel itself is being used as the KMail server, which would be cool. I do wonder, if multiple processes mmap the same file read-only, whether the memory backing the mmap is shared by all those processes.

My concerns:

1) This doesn't actually address the issue of external mail clients modifying ~/Mail, does it? Not even for Maildir, as this approach requires KMail to be running to detect changes in ~/Mail.

2) I'm concerned that this approach could be fragile. A client might die after deleting/inserting a Maildir message but before updating the index, or vice versa. Maybe this can be handled robustly, so this is just a concern, not a concrete criticism. But clearly this method does rely on mbox-style locking rather than the Maildir-style elimination of the need for locking.

3) If this approach works then it does address the criticism of having multiple clients modify index files, but there are still the memory costs associated with this approach. Each client will need its own kmmsgdict. IIRC each entry in the kmmsgdict maps a sernum -> (folder, index), which is an int -> (int, int) mapping, or 12 bytes. I currently have >500K messages, so that's >6MB per client instance. And then there's the full text index. A full text index is by its nature a large object, as it's a space/time tradeoff. For me it's >24MB.
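Here is a small C++ sketch that turns the kmmsgdict estimate above into numbers. The DictEntry struct and the two helper functions are hypothetical illustrations of the sernum -> (folder, index) mapping, not KMail's real KMMsgDict types. They count only the 12 bytes of payload per entry. A real container adds per-node overhead, so treat the results as a lower bound.

```cpp
#include <cstdint>

// Hypothetical flat layout of one kmmsgdict entry: an int -> (int, int)
// mapping, sernum -> (folder, index). Not the real KMMsgDict type; the
// real container adds per-node overhead, so these figures are a floor.
struct DictEntry {
    std::int32_t serNum;    // message serial number (the key)
    std::int32_t folderId;  // folder the message lives in
    std::int32_t index;     // position of the message within that folder
};

// Three 32-bit ints with 4-byte alignment: no padding on common ABIs.
static_assert(sizeof(DictEntry) == 12, "expected 12 bytes per entry");

// Minimum bytes one KMail instance needs for its private copy of the dict.
constexpr std::uint64_t dictBytesPerClient(std::uint64_t messages) {
    return messages * sizeof(DictEntry);
}

// In a client-only model every running instance holds its own copy, so
// the cost scales with the number of instances. A single server process
// would pay it once.
constexpr std::uint64_t dictBytesTotal(std::uint64_t messages,
                                       std::uint64_t clients) {
    return dictBytesPerClient(messages) * clients;
}
```

For 500,000 messages, dictBytesPerClient(500000) is 6,000,000 bytes. Three concurrent instances would need dictBytesTotal(500000, 3) = 18,000,000 bytes for their private copies.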

Besides folder files, there are also other files that need to be handled. The config file is a problem if each client should keep its config dialogs and general configuration info in sync. Another set of files is the list of mails that are kept on the server for each account.

In summary:

I like the idea of making all index file operations non-destructive. Specifically, this means making deletion merely mark an index entry as deleted and then performing an index compaction later. I also like the idea of experimenting with dropping the in-memory caching of index entries; I'd like to see profiling stats on doing that. I think both these tasks make sense to do even if a client/server approach is ultimately taken.

I think this CIA proposal addresses (or at least attempts to address) the key criticism I've made against the client-only model, which is having multiple clients work with the same index files. But I remain skeptical, especially about whether the mmap of index files can be retained. I think dropping both the mmap (or frequently re-mmapping) and the in-memory cache would be a bad idea, because of the performance implications.

I still think that overall a client/server approach makes the most sense. This is because I think there is a demand for KMail to provide database-like services to other (KDE) apps. Namely, I'm thinking of services to track messages (KMMsgDict) and search messages (KMMsgIndex). Implementing these services efficiently requires substantial amounts of memory, and it makes sense to concentrate the cost of that memory use in one process rather than duplicate it across multiple processes.

In conclusion, personally I'm OK with step 2 and am interested in the results of step 3 of this CIA proposal. I think both can be done in parallel with the Kernel/GUI separation of the client/server approach.
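The following is a minimal C++ sketch of the non-destructive index idea. It relies on several assumptions, so please read it as an illustration rather than a design:

* The entry layout is hypothetical: a 4-byte length, a status byte, a deleted byte, then the payload.
* The names IndexFile, StatusCache and scanStatuses are invented for illustration.
* A std::vector stands in for the mmap'd index file, and the offsets vector plays the role of the separate offsets file from step 3.
* None of this is KMail's actual index format or API.

The sketch covers:

* status updates done in place;
* deletion as a flag;
* rule (c) from the CIA/4 quote, where a length change becomes delete-plus-append;
* a compaction pass;
* the single sequential scan that caches statuses and finds the first unread message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry layout, NOT KMail's real index format:
//   [uint32 payload length][uint8 status][uint8 deleted flag][payload]
// `data` stands in for the mmap'd index file; `offsets` plays the role
// of the separate offsets file described in step 3.
enum Status : std::uint8_t { Unread = 0, Read = 1 };

struct IndexFile {
    static constexpr std::size_t kHeader = 6;  // 4 + 1 + 1 bytes
    std::vector<std::uint8_t> data;
    std::vector<std::size_t> offsets;

    // Appends a new entry at the end of the file and returns its number.
    std::size_t append(Status s, const std::string& payload) {
        const std::size_t off = data.size();
        const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
        data.resize(off + kHeader + len);
        std::memcpy(data.data() + off, &len, sizeof len);
        data[off + 4] = s;
        data[off + 5] = 0;  // live entry
        std::memcpy(data.data() + off + kHeader, payload.data(), len);
        offsets.push_back(off);
        return offsets.size() - 1;
    }

    std::uint32_t length(std::size_t i) const {
        std::uint32_t len;
        std::memcpy(&len, data.data() + offsets[i], sizeof len);
        return len;
    }
    Status status(std::size_t i) const { return Status(data[offsets[i] + 4]); }
    bool deleted(std::size_t i) const { return data[offsets[i] + 5] != 0; }

    // A status change never alters the entry length: one byte, in place.
    void setStatus(std::size_t i, Status s) {
        assert(i < offsets.size());
        data[offsets[i] + 4] = s;
    }

    // Deletion only sets a flag; the bytes stay until compact() runs.
    void markDeleted(std::size_t i) { data[offsets[i] + 5] = 1; }

    // Rule (c): overwrite in place if the length is unchanged, otherwise
    // mark the old entry deleted and append a replacement at the end.
    std::size_t replacePayload(std::size_t i, const std::string& payload) {
        if (payload.size() == length(i)) {
            std::memcpy(data.data() + offsets[i] + kHeader, payload.data(),
                        payload.size());
            return i;
        }
        const Status s = status(i);
        markDeleted(i);
        return append(s, payload);
    }

    // Compaction copies only live entries into a fresh buffer. Surviving
    // entries get new numbers and offsets.
    void compact() {
        IndexFile fresh;
        for (std::size_t i = 0; i < offsets.size(); ++i) {
            if (deleted(i)) continue;
            const char* p = reinterpret_cast<const char*>(
                data.data() + offsets[i] + kHeader);
            fresh.append(status(i), std::string(p, length(i)));
        }
        *this = std::move(fresh);
    }
};

// One sequential pass (file order, since entries are only ever appended)
// that caches each status byte and finds the first unread live entry.
struct StatusCache {
    std::vector<Status> statuses;
    std::size_t unreadCount = 0;
    long firstUnread = -1;  // -1: no unread message
};

StatusCache scanStatuses(const IndexFile& idx) {
    StatusCache c;
    for (std::size_t i = 0; i < idx.offsets.size(); ++i) {
        c.statuses.push_back(idx.status(i));
        if (idx.deleted(i) || idx.status(i) != Unread) continue;
        ++c.unreadCount;
        if (c.firstUnread < 0) c.firstUnread = static_cast<long>(i);
    }
    return c;
}
```

Note that compact() renumbers the surviving entries. That is consistent with the proposal's requirement that A wait until every instance has closed F before compacting.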

Also, perhaps similar to step 3 of the CIA proposal, I hope to do more profiling work on my idea of zero-copy display of messages (now that zero-copy parsing is implemented). I hope zero-copy display of messages will address what I expect to be the key bottleneck between client and server, which is gross duplication of attachment data.

Don.
_______________________________________________
KMail Developers mailing list
kmail@mail.kde.org
http://mail.kde.org/mailman/listinfo/kmail