List:       kmail-devel
Subject:    Bug#32526: kmail should have a "delete duplicate mail" option
From:       Malcolm <rannirl-kmail () otherkin ! net>
Date:       2001-09-28 14:29:59


> Please provide an option that seeks out and destroys duplicate mail in each
> folder (based on the Subject field, followed by a comparison of message
> bodies).

I know this has been closed, but I actually have an implementation for this. 
(I have to merge mailboxes fairly often, so I needed a solution myself).

Doing a subject then body compare is very inefficient, however, especially on 
large folders (I have some with up to 10,000 messages in them, which really 
shows up any slowness).

The method I used involves comparing the MD5s of the message ids (those are 
accessible from KMMsgBase, whereas the full message ids require you to 
actually load each message, not just the folder index). The MD5s are not 
guaranteed to be unique (I've found cases where they are not, just in my 
simple tests), so I use the timestamps to remove false positives (the chance 
of two messages having the same MD5 -and- precisely the same timestamp 
without being duplicates seems near enough to non-existent).
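
A minimal sketch of the idea above, in Python rather than KMail's C++ and 
not taken from the actual patch. IndexEntry, md5_of and find_duplicates are 
hypothetical names standing in for whatever the folder index exposes; the 
only assumption is that each index entry carries the MD5 of the message id 
and a timestamp, so no message has to be loaded.

```python
import hashlib
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical stand-in for one folder-index record."""
    msg_id_md5: str   # MD5 of the Message-ID, as stored in the index
    timestamp: int    # message date as seconds since the epoch
    subject: str


def md5_of(message_id: str) -> str:
    """Hex MD5 of a message id (only used here to build sample data)."""
    return hashlib.md5(message_id.encode("utf-8")).hexdigest()


def find_duplicates(entries):
    """Return positions of entries whose (MD5, timestamp) pair was seen before.

    The first occurrence of each pair is kept; later ones are reported.
    """
    seen = set()
    duplicates = []
    for position, entry in enumerate(entries):
        key = (entry.msg_id_md5, entry.timestamp)
        if key in seen:
            duplicates.append(position)
        else:
            seen.add(key)
    return duplicates


folder = [
    IndexEntry(md5_of("<a@example.org>"), 1001687399, "hello"),
    IndexEntry(md5_of("<b@example.org>"), 1001687400, "world"),
    IndexEntry(md5_of("<a@example.org>"), 1001687399, "hello"),
]
print(find_duplicates(folder))  # prints [2]
```

Keying on the pair means an MD5 match alone never marks a message as a 
duplicate; the timestamp has to agree as well.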

It's not perfect yet (mostly because I'm using a QDict for comparison speed 
and if two keys hash to the same value you only get one of them returned), 
but it does hit the vast majority of duplicates without any false positives.
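
To illustrate the limitation just mentioned, here is a small Python sketch 
(not the QDict code from the patch; both function names are hypothetical). 
It contrasts a lookup table that keeps a single timestamp per MD5 with one 
that keeps a list per MD5. The list variant is only one possible way around 
the problem, under the assumption that entries arrive as (md5, timestamp) 
pairs in folder order.

```python
from collections import defaultdict


def find_duplicates_single_slot(entries):
    """One timestamp per MD5: a newer entry overwrites the older one."""
    slot = {}
    duplicates = []
    for position, (md5_hex, timestamp) in enumerate(entries):
        if slot.get(md5_hex) == timestamp:
            duplicates.append(position)
        else:
            slot[md5_hex] = timestamp  # earlier timestamp is lost here
    return duplicates


def find_duplicates_bucketed(entries):
    """A list of timestamps per MD5: every earlier entry stays comparable."""
    buckets = defaultdict(list)
    duplicates = []
    for position, (md5_hex, timestamp) in enumerate(entries):
        if timestamp in buckets[md5_hex]:
            duplicates.append(position)
        else:
            buckets[md5_hex].append(timestamp)
    return duplicates


# Two distinct messages share an MD5, and each has one real duplicate.
entries = [("aa", 1), ("aa", 2), ("aa", 1), ("aa", 2)]
print(find_duplicates_single_slot(entries))  # prints []
print(find_duplicates_bucketed(entries))     # prints [2, 3]
```

Neither variant produces false positives; the single-slot table simply 
misses duplicates whenever two different messages share a key.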

If anyone is interested, I can supply a patch for it (currently against the 
1.3.1 release version).


-- 
"And what are you?" "Alive. Everything else is negotiable."
- Sheridan and Franklin, Babylon 5
_______________________________________________
Kmail Developers mailing list
Kmail@mail.kde.org
http://mail.kde.org/mailman/listinfo/kmail
