List: kmail-devel
Subject: Re: CIA proposal (was: ClientInterface)
From: Don Sanders <sanders () kde ! org>
Date: 2003-07-23 4:58:10
On Monday 21 July 2003 21:31, Marc Mutz wrote:
> Hi!
>
> > It's not watching the files that is the main problem. The problem
> > is retaining the integrity of objects that point into files which
> > can be changed by other processes.
>
> I was asked to provide a concept for a server-less solution to the
> CIA (Concurrent Index Access) problem. Here it is. It's based on
> the idea from LinuxTag that's described here as Step 3. It solves
> the memory load problems with index files by having them shared
> between KMail instances running on the same machine.
>
> Step 1: Require Maildir
> Rationale:
> - Changes in maildir folders can be efficiently monitored with
> KDirWatch (which can use DNotify/FAM). In particular, changes
> are monitored per-file. This is not the case for mbox.
> - Specifically, status changes result in renaming the file.
>
> Step 2: Observe that the Index file format allows most changes
> (esp. all status updates) to be performed in-place.
>
>
> Step 3: Don't create an in-memory representation of index entries
> ("KMMsgInfo::kd") anymore. Instead, let every entry just be a
> pointer to the beginning of its mmapped region in the index file.
> For this to be efficient, use Don's idea of storing those offsets
> in a file of their own. Every access to information stored in the
> index accesses the index file data directly.
> Rationale:
> - This allows most operations on existing messages to be
> transparently shared across all KMail instances, and in an
> extremely efficient way (memory-, CPU- and network-load-wise). This
> holds esp. for the common operations of changing the various statuses
> of messages.
>
> Step 4: Define a protocol by which a given KMail instance can
> obtain write access to a given folder/index file.
>
> Sketch of the CIA/4 protocol:
>
> KMail instance A wants to obtain write access to folder F from B:
>
> 1. A tries to create folder-lockfile L_F
> Success: Write own DCOP id into the lock file.
> A has now (exclusive) write access to F.
> 2. A tries to read a DCOP ID of B from L_F
> Success: A sends B the message
> "Please release folder F"
> Wait a random amount of time, then goto 1.
> Failure: Wait a random amount of time, then goto 1.
> Repeated failure: Tell user.
>
> Observe how B is not required to close the folder or to unmap the
> index file. It marks the folder read-only internally (or remaps the
> index read-only).
>
> An instance holding the write lock is required to
>
> a. Release it as soon as practicable
> b. Release it as soon as possible if the above DCOP message arrives
> c. Never alter the length of existing index entries.
>
> If a change in entry length is necessary, mark the old one as
> deleted, rename the maildir file and add a new entry at the end.
> Other instances will see a delete and add through the dirwatching
> and can react accordingly.
>
> CIA/4 can be extended with a DCOP broadcast call to all KMail
> instances to close F, so that A can perform (index file) compaction
> after all instances replied with a "folder F closed" message.
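The lock acquisition in the CIA/4 sketch above could look roughly like this. It is a minimal sketch, assuming the lock file L_F is created atomically with O_CREAT | O_EXCL and that a hypothetical ask_to_release callback stands in for the DCOP "Please release folder F" call; the function names, retry count and back-off bound are illustrative, not part of the proposal. A lock left behind by an instance that crashed is not handled here.

```python
import os
import random
import time
from typing import Callable


def acquire_folder_lock(lock_path: str, my_id: str,
                        ask_to_release: Callable[[str], None],
                        attempts: int = 10, max_wait: float = 0.5) -> bool:
    """Try to become the single writer of a folder by creating its lock file."""
    for _ in range(attempts):
        try:
            # Step 1: O_CREAT | O_EXCL fails if the file already exists,
            # so at most one instance can create it.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            # Step 2: read the current owner's id and ask it to let go.
            try:
                with open(lock_path) as f:
                    owner = f.read().strip()
            except FileNotFoundError:
                owner = ""  # released in the meantime; just retry
            if owner:  # empty if the owner has not written its id yet
                ask_to_release(owner)  # hypothetical stand-in for the DCOP call
            time.sleep(random.uniform(0, max_wait))  # random back-off
            continue
        with os.fdopen(fd, "w") as f:
            f.write(my_id)  # we now hold exclusive write access to the folder
        return True
    return False  # repeated failure: the caller should tell the user


def release_folder_lock(lock_path: str) -> None:
    """Give up write access so another instance can take the lock."""
    os.remove(lock_path)
```

The random back-off keeps two instances that keep colliding from retrying in lockstep; the owner only has to honour the release request at its next convenient point, as rules a and b require.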
What I like:
1) "the Index file format allows most changes (esp. all status
updates) to be performed in-place." Deletion is the only exception, I
think. I'm all for changing deletion so that an index entry is only
marked as deleted, with a cleanup operation ('compaction') performed
later.
I'd prefer a timer to trigger index file cleanup rather than waiting
for exit, but that would add extra complexity (I'll omit the details
of that complexity).
2) The KMail instances talk to each other via DCOP, so rather than a
client-only approach, in my opinion it's a peer-to-peer network of
servers. So I think there's some recognition here that IPC is
sensible.
3) I think this is a realistic suggestion for handling the problem of
multiple clients accessing ~/Mail concurrently. In theory I think it
might work, but I'm unsure whether the mmap will hold up.
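Point 1 above, together with the proposal's rule that a length change means "mark the old entry deleted and add a new one at the end", amounts to tombstoning entries in place and compacting later. A minimal sketch over an in-memory bytearray, assuming a hypothetical entry layout (one flags byte, a two-byte big-endian length, then the payload) rather than KMail's real index format; all function names are illustrative.

```python
import struct

# Hypothetical entry layout (NOT KMail's real index format):
#   1 byte  flags (bit 0 set = entry deleted)
#   2 bytes payload length, big-endian
#   N bytes payload
HEADER = struct.Struct(">BH")
DELETED = 0x01


def append_entry(index: bytearray, payload: bytes) -> int:
    """Append an entry at the end of the index and return its offset."""
    offset = len(index)
    index += HEADER.pack(0, len(payload)) + payload
    return offset


def mark_deleted(index: bytearray, offset: int) -> None:
    """Tombstone an entry in place; nothing moves and no length changes."""
    index[offset] |= DELETED


def replace_entry(index: bytearray, offset: int, payload: bytes) -> int:
    """Overwrite in place if the length is unchanged, else tombstone + append."""
    _, length = HEADER.unpack_from(index, offset)
    if length == len(payload):
        start = offset + HEADER.size
        index[start:start + length] = payload
        return offset
    mark_deleted(index, offset)
    return append_entry(index, payload)


def compact(index: bytearray) -> tuple[bytearray, dict[int, int]]:
    """Drop tombstoned entries; return the new index and old->new offsets."""
    out, moved, pos = bytearray(), {}, 0
    while pos < len(index):
        flags, length = HEADER.unpack_from(index, pos)
        end = pos + HEADER.size + length
        if not flags & DELETED:
            moved[pos] = len(out)
            out += index[pos:end]
        pos = end
    return out, moved
```

Everything except compact() leaves existing offsets valid, which is what lets other instances keep their mappings; compact() is the one step that invalidates offsets, hence the returned map and the need to have all instances close the folder first.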
Slightly pro:
1) 'Step 3: Don't create an in-memory representation of index entries
("KMMsgInfo::kd") anymore. Instead, let every entry just be a
pointer to the beginning of its mmapped region in the index
file.'
Well, every entry already has a pointer to the beginning of its
mmapped region, so I read this as simply: drop the in-memory
representation.
I'm uncertain of the performance implications of this change; I guess
they will be implementation-specific. Some fields, like the status
field of a message, are always read. This is because when KMail opens
a folder it looks at the status of all messages to sync the count of
unread messages and to find the next unread message. (IIRC 1/3 of
the time to show a folder is spent searching for the next unread
message.)
So I think it makes sense to always cache the status field. Given
that, for optimum performance it makes sense to cache it while the
index file is being read sequentially, rather than caching it later by
accessing the file randomly in some arbitrary order (I hope that makes
sense; I'm skipping over some details here).
Or better yet, if you are going to index the index file, please
consider caching the status in the index of the index.
There might also be performance implications to do with sorting and
the caching of sort keys. Specifically, KMHeaders might be slow when a
new message arrives or when the sort order is changed.
But since the index file is now mmap'd, and therefore cached, I think
it makes sense to experiment with dropping/reducing the in-memory
representation.
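Step 3 of the proposal combined with the suggestion above of caching the status in the index of the index could look roughly like this. A minimal sketch, assuming two hypothetical layouts (not KMail's formats): the index file holds variable-length entries, each starting with a 4-byte status word, and a separate records file holds one (offset, status) pair per message. The class and constant names are illustrative. Messages are just positions in the mapped files; the unread count and next unread message come from one sequential pass over the records, and a status change rewrites both copies in place without changing any length.

```python
import mmap
import struct

# Hypothetical layouts, for illustration only (not KMail's real formats):
#   index file:   variable-length entries, each starting with a 4-byte
#                 big-endian status word, followed by the other fields
#   records file: the "index of the index", one record per message holding
#                 the entry's offset plus a cached copy of its status word
STATUS = struct.Struct(">I")
RECORD = struct.Struct(">II")  # (offset into index file, cached status)
UNREAD = 0x1                   # hypothetical status bit


class MappedIndex:
    """Entries are offsets into the mmapped index; no per-entry objects."""

    def __init__(self, index_path: str, records_path: str, writable: bool = False):
        self._writable = writable
        access = mmap.ACCESS_WRITE if writable else mmap.ACCESS_READ
        mode = "r+b" if writable else "rb"
        self._files = [open(index_path, mode), open(records_path, mode)]
        self.index = mmap.mmap(self._files[0].fileno(), 0, access=access)
        self.records = mmap.mmap(self._files[1].fileno(), 0, access=access)

    def __len__(self) -> int:
        return len(self.records) // RECORD.size

    def offset(self, i: int) -> int:
        return RECORD.unpack_from(self.records, i * RECORD.size)[0]

    def status(self, i: int) -> int:
        # Served from the cached copy: no random access into the index file.
        return RECORD.unpack_from(self.records, i * RECORD.size)[1]

    def index_status(self, i: int) -> int:
        # Step 3 style: read straight from the mapped index, no in-memory copy.
        return STATUS.unpack_from(self.index, self.offset(i))[0]

    def set_status(self, i: int, value: int) -> None:
        # Same-length, in-place update of both copies (Step 2).
        off = self.offset(i)
        STATUS.pack_into(self.index, off, value)
        RECORD.pack_into(self.records, i * RECORD.size, off, value)

    def unread_summary(self, start: int = 0) -> tuple[int, int]:
        """One sequential pass: (unread count, first unread at/after start or -1)."""
        count, first = 0, -1
        for i, (_off, status) in enumerate(RECORD.iter_unpack(self.records)):
            if status & UNREAD:
                count += 1
                if first < 0 and i >= start:
                    first = i
        return count, first

    def close(self) -> None:
        if self._writable:
            self.index.flush()
            self.records.flush()
        self.index.close()
        self.records.close()
        for f in self._files:
            f.close()
```

An instance opened read-only can still call status() and unread_summary(), but any set_status() call fails, which matches the proposal's idea of a folder being marked read-only for every instance except the lock holder.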
Neutral/Aside:
1) Be careful with the inbox folder; I think there is a design flaw in
filtering. We currently use the inbox folder as temporary storage for
incoming mail, but I think it would make more sense to use a private
folder, because a private folder for incoming mail could be modified
quickly without every client having to be informed of the changes.
2) I'm unsure how the index file mmap will hold up. If a client has to
close the index file then I think the mmap will be lost, and losing
the mmap while having no in-memory representation of index entries
would be critically bad, performance-wise. But if you can keep the
mmap, then effectively all the caching of index entries has been
pushed into the operating system kernel via the mmap. With luck the
OS is designed so that each client can share this read-only mmap
cache, and the kernel itself is then acting as the KMail server, which
would be cool.
I do wonder, if multiple processes mmap the same file read-only,
whether the memory backing the mmap is shared by all those processes.
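Whether a read-only mapping keeps up with another instance's in-place writes can at least be checked for visibility. A minimal sketch, assuming a POSIX system (it uses os.fork) and Python's mmap module, where ACCESS_READ and ACCESS_WRITE both create shared mappings: a forked child plays the instance holding the write lock and changes one status byte through its own mapping, and the parent's read-only view, which is never remapped, sees the change. The helper name is hypothetical. This checks visibility only; it does not measure whether the backing memory is shared (on Linux, shared file mappings are backed by the page cache, but the code does not demonstrate that).

```python
import mmap
import os


def child_sets_status(path: str, offset: int, value: bytes) -> bytes:
    """Map `path` read-only, let a forked child overwrite bytes in place through
    its own writable shared mapping, then return what the read-only view sees."""
    with open(path, "rb") as f:
        view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    pid = os.fork()
    if pid == 0:  # child: plays the instance holding the write lock
        code = 1
        try:
            with open(path, "r+b") as f:
                m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
                m[offset:offset + len(value)] = value  # same length, in place
                m.close()
            code = 0
        finally:
            os._exit(code)
    os.waitpid(pid, 0)
    seen = view[offset:offset + len(value)]  # parent never remapped
    view.close()
    return seen
```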
My concerns:
1) This doesn't actually address the issue of external mail clients
modifying ~/Mail, does it? Not even for Maildir, as this approach
requires KMail to be running to detect changes in ~/Mail.
2) I'm concerned that this approach could be fragile. A client might
die after deleting/inserting a Maildir message but before updating
the index, or vice versa. Maybe this can be handled robustly, so this
is just a concern, not a concrete criticism. But clearly this method
relies on mbox-style locking rather than the maildir-style elimination
of the need for locking.
3) If this approach works then it does address the criticism of having
multiple clients modify index files, but there are still the memory
costs associated with this approach. Each client will need its own
KMMsgDict. IIRC each entry in the KMMsgDict maps
sernum -> (folder, index), which is an int -> (int, int) mapping, or
12 bytes. I currently have >500K messages, so that's >6MB per client
instance.
And then there's the full text index. A full text index is by its
nature a large object, as it is a space/time tradeoff. For me it's
>24MB.
Besides folder files there are also other files that need to be
handled. The config file is a problem if it is desirable to have each
client keep its config dialogs and general configuration info in
sync.
Another set of files is the list of mails that are kept on the server
for each account.
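Step 1 of the quoted proposal and concern 1 above both rest on how Maildir records status: by the Maildir convention, flags live in the file name after ":2,", in ASCII order (D, F, P, R, S, T), so marking a message seen is a rename, which is exactly what a directory watcher such as KDirWatch can pick up. A minimal sketch, assuming the message already sits in cur/ (moving a message out of new/ is not shown); set_maildir_flag is a hypothetical helper name.

```python
import os


def set_maildir_flag(path: str, flag: str) -> str:
    """Add `flag` to a maildir message's info suffix by renaming the file.

    Returns the new path. A file without an info suffix gets ":2," added.
    """
    directory, name = os.path.split(path)
    base, sep, info = name.partition(":2,")
    flags = set(info) if sep else set()
    flags.add(flag)
    new_name = base + ":2," + "".join(sorted(flags))  # flags kept in ASCII order
    new_path = os.path.join(directory, new_name)
    if new_path != path:
        os.rename(path, new_path)  # the rename is the change a watcher sees
    return new_path
```

Because the status lives in the file name, an external client that changes it also leaves a visible rename behind, but only a running watcher (or a later rescan) will notice it.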
In summary:
I like the idea of having all index file operations non-destructive,
specifically this means making deletion merely mark an index entry as
deleted and then later performing an index compaction. I also like
the idea of experimenting with dropping the in-memory caching of
index entries, I'd like to see profiling stats on doing that. I think
both these tasks make sense to do even if a client/server approach is
ultimately taken.
I think this CIA proposal addresses (or at least attempts to address)
the key criticism I've made against the client-only model, which is
having multiple clients work with the same index files. But I remain
skeptical, especially as to whether the mmap of index files can be
retained. I think dropping both the mmap (or re-mmapping frequently)
and the in-memory cache would be a bad idea, because of the
performance implications.
I still think that overall a client/server approach makes the most
sense. This is because I think there is a demand for KMail to provide
database like services to other (KDE) apps. Namely I'm thinking of
services to track messages (KMMsgDict) and search messages
(KMMsgIndex). Implementing these services efficiently requires using
substantial amounts of memory, and it makes sense to concentrate the
cost of that memory use in one process rather than duplicate it
across multiple processes.
In conclusion, personally I'm OK with step 2, am interested in the
results of step 3 of this CIA proposal, and think they can be
performed in parallel with the Kernel/GUI separation of the
client/server approach. Also, perhaps similar to step 3 of the CIA
proposal, I hope to do more profiling work on my zero-copy display of
messages idea (now that zero-copy parsing is implemented). I hope
zero-copy display of messages will address what I expect to be the
key bottleneck between client and server, which is gross duplication
of attachment data.
Don.
_______________________________________________
KMail Developers mailing list
kmail@mail.kde.org
http://mail.kde.org/mailman/listinfo/kmail