This is an automatically generated e-mail. To reply, visit: https://git.reviewboard.kde.org/r/118231/

On May 22nd, 2014, 4 p.m. UTC, Vishesh Handa wrote:

src/pim/agent/collectionindexingjob.h (Diff revision 1)
66
    bool m_indexedUnindexed;
Whoa. This variable name is very very confusing. It's true, if there are indexed items which are unindexed? :O
Right, it's a lock to not repeatedly index not-yet-indexed items. It's necessary so we don't end up in an endless loop should we fail to index some items (for whatever reason).
I'll think of a better name.

On May 22nd, 2014, 4 p.m. UTC, Vishesh Handa wrote:

src/pim/agent/collectionindexingjob.cpp (Diff revision 1)
200
    if (!m_indexedItems.isEmpty()) {
I'm sorry. I'm a little confused as to how this could ever happen unless Akonadi is messed up and does not reliably send item removed notifications?
The whole point of this codepath is to recover when we miss something. There is no guarantee that we manage to process all signals (the akonadi server just fires a dbus signal and hopes for the best), so it's entirely possible that we end up in this state (and it has in fact happened to me). And if you clear the cache (for whatever reason), you definitely end up in this state.

On May 22nd, 2014, 4 p.m. UTC, Vishesh Handa wrote:

src/pim/agent/index.h (Diff revision 1)
33
class Index : public QObject
I'm not too happy with this classes name. How about "Indexer" instead? But then it would clash with "AbstractIndexer".
I'm open to other suggestions, but IMO Index is quite apt. The Index class encapsulates the "index" we're working against (which is actually consisting of many small indexes, but that's hidden). Indexer is imo suitable for the code we have that processes an item/collection. The idea with the "index" is that you "put an item in the index", and what internally happens or where exactly this information ends up is irrelevant. Or the other way around, you query the index (not the indexer).
So, what's wrong with Index? ;-)

On May 22nd, 2014, 4 p.m. UTC, Vishesh Handa wrote:

src/pim/agent/index.cpp (Diff revision 1)
128
void Index::reindex(const Akonadi::Item& item)
Is this required? From a Xapian point of view you're spending an extra amount of time first removing the data and then adding it back again.

If you just index the item, Xapian will internally do a diff on the terms that have changed, and then just update those.
The original code was like that AFAIK so I kept it that way, but I'll change it then.

On May 22nd, 2014, 4 p.m. UTC, Vishesh Handa wrote:

src/pim/agent/scheduler.h (Diff revision 1)
66
    QMap<Akonadi::Collection::Id, QSharedPointer<QQueue<Akonadi::Item::Id> > > m_queues;
This is confusing. Perhaps some more documentation?

Also, couldn't you just directly do a

QMap<Akonadi::Collection::ID, QQueue<Akonadi::Item::ID>>

I'm not sure what the shared pointer is doing.
I think I overlooked the [] operator that allows me to access the queue by reference, and I wasn't sure whether QMap copies the values internally, but I suppose it shouldn't.
The shared pointer is just for automatic lifetime management (no leaks guaranteed).

I suppose I can change that to just use a QQueue as you suggested.

On May 22nd, 2014, 4 p.m. UTC, Vishesh Handa wrote:

src/pim/agent/scheduler.cpp (Diff revision 1)
36
    m_processTimer.setInterval(100);
Why 100?
Random value for event compression.

On May 22nd, 2014, 4 p.m. UTC, Vishesh Handa wrote:

src/pim/agent/scheduler.cpp (Diff revision 1)
46
        scheduleCollection(Akonadi::Collection(col), true);
This is quite dangerous. If an email is not indexed it results in the entire collection being sync and all a full collection fetch job going on.

I'm very much against this. I get enough angry emails about how the baloo indexer is sucking all their cpu.
Either this or you only have half your emails indexed.
Note that this codepath is only triggered if we fail to index items in a timely fashion, and since indexing is fairly fast that should seldomly be the case, right?

On May 22nd, 2014, 4 p.m. UTC, Vishesh Handa wrote:

src/pim/agent/scheduler.cpp (Diff revision 1)
53
    group.writeEntry("initialIndexing", true);
So, even if someone switched off their system before the initial indexing was done, we mark the initial indexing as completed?
The not-yet completed collections are remembered as part of the "dirty collections" which should take care of this.

On May 22nd, 2014, 4 p.m. UTC, Vishesh Handa wrote:

src/pim/agent/scheduler.cpp (Diff revision 1)
97
void Scheduler::addItem(const Akonadi::Item &item)
So each time an item is added, you pass the parent collection to the CollectionIndexingJob and there you fetch the entire collection (+statistics) and do a query on Xapian to check how many items are already indexed?

Arguably, the xapian query would be quite fast, but still.
That could be optimized as we only require up-to date statistics for a full sync, but I really doubt that fetching a collection will have any noticeable performance impact. 

- Christian


On May 20th, 2014, 10:22 p.m. UTC, Christian Mollekopf wrote:

Review request for Baloo and Vishesh Handa.
By Christian Mollekopf.

Updated May 20, 2014, 10:22 p.m.

Repository: baloo

Description

A scheduler for baloo: 
* delays the indexing until no new item has been added for at least 5 seconds to avoid indexing during a collection sync.
* remembers if it failed to index something and triggers recovery path on next start.
* supports manual triggering of recovery path if required.

Testing

I'm running it for a while, and it reduced the stress that baloo imposed on my system and all my mails are indexed since I'm using it (wasn't the case before).

Diffs

  • src/pim/agent/CMakeLists.txt (e917915a3414738595caea5497859ef4810ec44c)
  • src/pim/agent/agent.h (1dbf0fc0a16d0615dbfa4878706359bb687facd0)
  • src/pim/agent/agent.cpp (8904d49d3579b58b634d2570fbcc8007e5ee41ed)
  • src/pim/agent/collectionindexingjob.h (PRE-CREATION)
  • src/pim/agent/collectionindexingjob.cpp (PRE-CREATION)
  • src/pim/agent/index.h (PRE-CREATION)
  • src/pim/agent/index.cpp (PRE-CREATION)
  • src/pim/agent/scheduler.h (PRE-CREATION)
  • src/pim/agent/scheduler.cpp (PRE-CREATION)

View Diff