From kde-pim Wed Dec 26 15:44:53 2012
From: Christian Mollekopf
Date: Wed, 26 Dec 2012 15:44:53 +0000
To: kde-pim
Subject: Re: [Kde-pim] Nepomukfeeder updates almost ready
Message-Id: <2956891.9jqFOr5fc8 () myhost2>
X-MARC-Message: https://marc.info/?l=kde-pim&m=135653672101531

Hey,

I made another bunch of fixes, turned the finding of skipped items into a
recurring task, and now turn the change-recorder off if the feeder is
disabled entirely.

In my testing so far this system behaves much better than what we used to
have.

I plan on committing this to 4.10 if no one objects within the next few days.
(I'll write a mail to release-team first).

The code is here:
http://quickgit.kde.org/?p=clones%2Fkdepim-runtime%2Fcmollekopf%2FpimRuntimeClone.git&a=shortlog&h=c2ca91566953c57af119634f65b5bd73bac7e7fa

Cheers,
Christian

On Sunday 23 December 2012 17.54:18 Christian Mollekopf wrote:
> Heya,
>
> To cut right to the chase: I revamped the feeders a bit, think it's much
> better than what we had before, and would like to get it into 4.10. So feel
> free to skip if you don't care.
>
> I moved to a recurring, query-based approach for the initial-indexing. That
> means, instead of doing a single initial-indexing when the feeder is
> executed the first time, and relying purely on updates from the
> change-recorder afterwards, the initial-indexing is now more of a
> maintenance task (which is currently running on every start), and queries
> for all not yet indexed items.
>
> That is necessary, as the initial assumption that we can index items faster
> than notifications come in didn't hold true, which resulted in the feeder
> regularly being overloaded with stuff to index.
>
> The initial query approach resulted in n queries for n items, which is way
> too slow to be feasible for all items (it is taking ages, literally). The
> only alternative approach I found is: we run two queries, one in akonadi
> and one in nepomuk, each querying for *all* available items.
> Comparing the two lists yields the list of items which have not been
> indexed yet. Of course, that misses any changes on items which have been
> indexed before but have been modified since then, so it's not ideal either.
> These queries are fairly efficient as they result in a single SQL query per
> db (as opposed to n), although with a huge result set. I could query my db
> of ~100'000 items in ~20s (i7 processor).
>
> Since I figured changes on emails, which are mostly just flags, are
> negligible, I switched the email initial-indexing to that new approach.
>
> Non-email items continue to be indexed as usual, meaning there is one query
> per item, which allows us to detect modifications as well. That is slow as
> usual, but since we usually have a lot more email items than non-email
> items, it works well enough.
>
> Another important advantage is that we can now also skip large batches
> of new/changed items, knowing they will be picked up by the
> initial-indexing eventually. That also allows us to turn off the
> change-recorder when the feeder is turned off (which is another problem if
> we rely on the change-recorder too much).
>
> One remaining problem is that we get loads of notifications of changed/added
> items, which I think are mostly due to sync-on-demand updates, updating the
> cache (and not actual new emails or whatnot). I also often get flag change
> notifications on my offline imap accounts, and I don't really know why
> yet. That of course would lead to loads of items being indexed over and
> over again, but that can be mitigated somewhat since we now can skip larger
> batches of items.
>
> Besides, I made some performance improvements, such as the cache I mentioned
> previously (200% performance boost), or that new items are now indexed
> without any queries, which gives another boost of 10%-20% or so.
>
> Overall, I think we should get this into 4.10 as fast as possible.
> The patch is somewhat large (and way too late in the process), but IMO the
> previous feeders are broken enough to justify this. So what do you think?
> Should I commit this to 4.10 in a couple of commits, or only master and then
> backport it for 4.10.1?
>
> Cheers,
> Christian
> _______________________________________________
> KDE PIM mailing list kde-pim@kde.org
> https://mail.kde.org/mailman/listinfo/kde-pim
> KDE PIM home page at http://pim.kde.org/
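
For readers following the thread: the two-query comparison described in the
quoted mail boils down to a set difference between the ids known to akonadi
and the ids already present in nepomuk. The following is a minimal Python
sketch under stated assumptions, not the feeder's actual code. The function
name `find_unindexed` and the plain integer ids are hypothetical stand-ins
for whatever the two real queries return.

```python
from typing import Iterable, Set


def find_unindexed(akonadi_ids: Iterable[int],
                   nepomuk_ids: Iterable[int]) -> Set[int]:
    """Return the ids that exist in the store but not in the index.

    Hypothetical helper: both arguments stand in for the result of one
    "fetch all ids" query per database, so the comparison needs two
    queries in total instead of one query per item.
    """
    return set(akonadi_ids) - set(nepomuk_ids)


if __name__ == "__main__":
    stored = [1, 2, 3, 4, 5]   # assumed result of the akonadi query
    indexed = [2, 4]           # assumed result of the nepomuk query
    print(sorted(find_unindexed(stored, indexed)))
```

Note that an item which is already indexed but was modified afterwards has
the same id in both lists, so it never shows up in the difference. That is
the limitation the quoted mail mentions for this approach.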