From kde-pim Thu Dec 27 21:06:01 2012
From: Allen Winter
Date: Thu, 27 Dec 2012 21:06:01 +0000
To: kde-pim
Subject: Re: [Kde-pim] Nepomukfeeder updates almost ready
Message-Id: <2076780.V3G6syO0Sk () dizzy>
X-MARC-Message: https://marc.info/?l=kde-pim&m=135664239631293

On Wednesday 26 December 2012 04:44:53 PM Christian Mollekopf wrote:
> Hey,
>
> I made another bunch of fixes, turned the finding of skipped items into a
> recurring task, and now turn the change-recorder off if the feeder is disabled
> entirely. In my testing so far this system behaves much better than what we
> used to have.
>
> I plan on committing this to 4.10 if no one objects within the next few days.
> (I'll write a mail to release-team first.)
>
> The code is here:
> http://quickgit.kde.org/?p=clones%2Fkdepim-runtime%2Fcmollekopf%2FpimRuntimeClone.git&a=shortlog&h=c2ca91566953c57af119634f65b5bd73bac7e7fa
>
> Cheers,
> Christian
>
>
> On Sunday 23 December 2012 17.54:18 Christian Mollekopf wrote:
> > Heya,
> >
> > To cut right to the chase: I revamped the feeders a bit, think it's much
> > better than what we had before, and would like to get it into 4.10. So feel
> > free to skip if you don't care.
> >
> > I moved to a recurring, query-based approach for the initial-indexing. That
> > means, instead of doing a single initial-indexing when the feeder is
> > executed the first time, and relying purely on updates from the
> > change-recorder afterwards, the initial-indexing is now more of a
> > maintenance task (which currently runs on every start) and queries for all
> > items not yet indexed.
> >
> > That is necessary, as the initial assumption that we can index items faster
> > than notifications come in didn't hold true, which resulted in the feeder
> > regularly being overloaded with stuff to index.
> >
> > The initial query approach resulted in n queries for n items, which is way
> > too slow to be feasible for all items (it is taking ages, literally).
> > The only alternative approach I found is this: we run two queries, one in
> > Akonadi and one in Nepomuk, each querying for *all* available items.
> > Comparing the two lists yields the list of items which have not been
> > indexed yet. Of course, that misses any changes to items which were indexed
> > before but have been modified since then, so it's not ideal either.
> > These queries are fairly efficient, as they result in a single SQL query per
> > db (as opposed to n), although with a huge result set. I could query my db
> > of ~100'000 items in ~20s (i7 processor).
> >
> > Since I figured changes on emails, which are mostly just flags, are
> > negligible, I switched the email initial-indexing to that new approach.
> >
> > Non-email items continue to be indexed as usual, meaning there is one query
> > per item, which allows us to detect modifications as well. That is slow as
> > usual, but since we usually have a lot more email items than non-email
> > items, it works well enough.
> >
> > Another important advantage is that we can now also skip large batches
> > of new/changed items, knowing they will be picked up by the
> > initial-indexing eventually. That also allows us to turn off the
> > change-recorder when the feeder is turned off (which is another problem if
> > we rely on the change-recorder too much).
> >
> > One remaining problem is that we get loads of notifications of changed/added
> > items, which I think are mostly due to sync-on-demand updates, updating the
> > cache (and not actual new emails or whatnot). I also often get flag change
> > notifications on my offline IMAP accounts, and I don't really know why
> > yet. That of course would lead to loads of items being indexed over and
> > over again, but that can be mitigated somewhat since we can now skip larger
> > batches of items.
> >
> > Besides, I made some performance improvements, such as the cache I mentioned
> > previously (200% performance boost), or that new items are now indexed
> > without any queries, which gives another boost of 10%-20% or so.
> >
> > Overall, I think we should get this into 4.10 as fast as possible. The patch
> > is somewhat large (and way too late in the process), but IMO the previous
> > feeders are broken enough to justify this. So what do you think? Should I
> > commit this to 4.10 in a couple of commits, or only master and then
> > backport it for 4.10.1?

Are there any objections to getting this work committed for 4.10?
It's awfully late in the release cycle to be pushing for this, but I will do so
if I get warm-fuzzies from a couple more folks that we need it.

Anyone want to chime in here? Please do so ASAP.

_______________________________________________
KDE PIM mailing list kde-pim@kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/
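
For readers skimming the thread, the two-query comparison described in the quoted message can be sketched roughly as below. This is only an illustration and not the feeder's actual code: the function name find_unindexed and the plain integer ids are assumptions made for the example, and the two input lists stand in for the results of the single "all items" query against each database.

```python
from typing import Iterable, Set


def find_unindexed(stored_ids: Iterable[int], indexed_ids: Iterable[int]) -> Set[int]:
    """Return the ids that exist in the store but are missing from the index.

    stored_ids:  hypothetical result of one "all items" query on the store side.
    indexed_ids: hypothetical result of one "all items" query on the index side.
    """
    # A set difference replaces the n per-item lookups with one comparison
    # of the two full result sets.
    return set(stored_ids) - set(indexed_ids)


if __name__ == "__main__":
    stored = [1, 2, 3, 4, 5]
    indexed = [1, 2, 4]
    # Items 3 and 5 have never been indexed, so they are reported.
    # An item such as 2 that was modified after indexing appears in both
    # lists and is therefore NOT reported, which is the limitation the
    # message mentions.
    print(sorted(find_unindexed(stored, indexed)))
```

The cost is holding both id lists in memory at once, which matches the "single query, huge result set" trade-off mentioned in the message.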