
List:       kdepim-users
Subject:    Re: [kdepim-users] Tbird versus Kmail, performance
From:       Daniel Vrátil <dvratil () redhat ! com>
Date:       2014-11-14 14:37:41
Message-ID: 1952670.RfFaRKgg8D () thor

On Friday 14 of November 2014 09:12:12 Pablo Sanchez wrote:
> [ Comments below, in-line ]
> 
> On 11/14/2014 06:36 AM, René J.V. Bertin wrote:
> > [ trimmed ]
> > 
> > Which confirms there isn't much IO going on during these lengthy
> > operations.
> 
> Yes, in my (VM) tests, after ensuring MySQL's logging was disabled
> and increasing MySQL's memory to effectively cache the /akonadi/ tables,
> I found I was CPU bound with one processor.
> 
> My host O/S has eight (logical) CPUs so I gave the VM an additional
> four CPUs.  MySQL ate those CPUs too.  Baloo was eating some CPU but
> it is nice'd so not that big of a deal.
> 
> I dug in to MySQL to see what it was doing.  I didn't do anything very
> complicated to measure what SQL was running.  I repeatedly ran:
> 
>    show full processlist;
> 
> This shows the database connections and if they're running SQL, the
> SQL they're running.
> 
> From multiple connections, I'd see the following queries being run:
> 
>    SELECT count(DISTINCT PimItemTable.id)
>    FROM   PimItemTable
>           INNER JOIN PimItemFlagRelation
>           ON ( PimItemTable.id = PimItemFlagRelation.PimItem_id )
>    WHERE ( PimItemTable.collectionId = ?
>    AND   ( PimItemFlagRelation.Flag_id = ?
>                       OR
>            PimItemFlagRelation.Flag_id = ? ) )
> 
> and
> 
>    SELECT count(id), sum(size)
>    FROM   PimItemTable
>    WHERE  collectionId = ?;

IIRC this provides collection statistics (total messages, unread messages, 
...). It's calculated quite often, as almost any Item operation can affect 
the statistics, and unfortunately it's not cached, so when something changes, 
all clients request an update of the collection stats, which means they are 
actually calculated again for each client. Definitely something worth looking 
into.
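
A minimal sketch of what such caching could look like, purely as an 
illustration. It uses Python with an in-memory SQLite database as a stand-in 
for the MySQL backend; the CollectionStatsCache class and its invalidate() 
hook are hypothetical names, not Akonadi API.

```python
import sqlite3


class CollectionStatsCache:
    """Hypothetical cache: one query per collection until it is invalidated."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # collectionId -> (item count, total size)
        self.queries = 0   # how many times the database was actually hit

    def stats(self, collection_id):
        if collection_id not in self.cache:
            self.queries += 1
            row = self.conn.execute(
                "SELECT count(id), coalesce(sum(size), 0) "
                "FROM PimItemTable WHERE collectionId = ?",
                (collection_id,),
            ).fetchone()
            self.cache[collection_id] = (row[0], row[1])
        return self.cache[collection_id]

    def invalidate(self, collection_id):
        # To be called whenever an item in the collection is added,
        # changed or removed.
        self.cache.pop(collection_id, None)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PimItemTable "
    "(id INTEGER PRIMARY KEY, collectionId INTEGER, size INTEGER)"
)
conn.executemany(
    "INSERT INTO PimItemTable (collectionId, size) VALUES (?, ?)",
    [(1, 100), (1, 250), (2, 40)],
)

cache = CollectionStatsCache(conn)
# Three "clients" ask for the same collection; only the first one queries.
results = [cache.stats(1) for _ in range(3)]
```

The point of the sketch is only that N clients asking after one change cost 
one query instead of N, as long as every write path invalidates the entry.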

> I looked at the data distribution for both /collectionId/ and
> /Flag_id/ and there weren't many distinct values.
> 
> I tried to /cheat/ and create some /covering/ indexes to get slightly
> better performance but it was marginal.  This index helps the second
> query:
> 
>    CREATE INDEX idx01 ON PimItemTable (collectionId, id, size);

Do you have any numbers? We could probably add the index temporarily, and 
remove it once the caching and sharing of collection statistics is 
implemented, which should reduce the number of queries to the point where the 
index does not actually matter that much.
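
Before collecting timings, one cheap check is whether the planner picks up 
the index at all. The sketch below does that with Python and SQLite's 
EXPLAIN QUERY PLAN, which is only a stand-in: on MySQL the equivalent would 
be EXPLAIN and looking for "Using index" in the Extra column. The table 
contents are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PimItemTable "
    "(id INTEGER PRIMARY KEY, collectionId INTEGER, size INTEGER)"
)
# Made-up data: 1000 items spread over 5 collections.
conn.executemany(
    "INSERT INTO PimItemTable (collectionId, size) VALUES (?, ?)",
    [(i % 5, 1000 + i) for i in range(1000)],
)

QUERY = "SELECT count(id), sum(size) FROM PimItemTable WHERE collectionId = ?"


def plan(conn):
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    # The fourth column holds the human-readable plan detail.
    return " | ".join(row[3] for row in rows)


plan_before = plan(conn)   # full table scan
conn.execute("CREATE INDEX idx01 ON PimItemTable (collectionId, id, size)")
plan_after = plan(conn)    # answered from the index alone
```

This says nothing about how much faster the query gets on a real mailbox; it 
only confirms that the query can be served from idx01 without touching the 
table rows.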

> 
> I circled back to the queries and realized they're somewhat
> unbounded.  I'd have expected the query to use a `date/time' column to
> further restrict it.

Well, they have to be, as stats are always computed from the entire 
collection, not just part of it :-)

> For example, if we're checking for new data between `time x' and `time
> x + 1', we should ask the DBMS to only return the difference.
> Otherwise we're forcing the DBMS to do a lot of work, which in this
> case, translates itself to high CPU utilization (since all the data is
> in cache).
> 
> The rule is to limit the amount of data pulled from the DB.  This way
> the DB has less `data to traverse' .. which will likely translate to
> better scalability.

We only use the datetime limit for the Baloo indexer, as the timestamp stored 
in the Akonadi database does not represent the actual time when the item 
was retrieved from the remote backend. Due to all the async stuff, queueing 
and delayed processing, the timestamp stored in the database is usually 
"newer" than the actual revision on the remote server that we checked out. 
Using the timestamp would cause us to miss data on the next sync that arrived 
between the "actual last sync timestamp" and the "timestamp of first item 
synced in last sync".
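
To make the gap concrete, here is a toy timeline in Python. All numbers, the 
STORE_DELAY constant and the item names are invented for the illustration; 
it does not model the real Akonadi code paths.

```python
ACTUAL_SYNC_TIME = 100   # when the remote server was actually queried
STORE_DELAY = 5          # hypothetical queueing/processing delay

# (item name, time the item arrived on the remote server)
remote = [("a", 90), ("b", 95)]

# First sync: both items are fetched at t=100 but stored with a later,
# local timestamp.
local = {
    name: ACTUAL_SYNC_TIME + STORE_DELAY + i
    for i, (name, _) in enumerate(remote)
}
cutoff = min(local.values())   # "timestamp of first item synced in last sync"

# Item "c" arrives on the server after the real sync but before the cutoff.
remote.append(("c", 103))

# Second sync, naively filtered by the locally stored timestamp.
fetched = [name for name, arrived in remote if arrived >= cutoff]
missed = [
    name for name, _ in remote
    if name not in local and name not in fetched
]
```

Here the cutoff is 105 while the server was really queried at 100, so item 
"c" (arrived at 103) falls into the window and is never fetched.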

Most remote backends use different methods for incremental sync. On IMAP 
that's CONDSTORE/MODSEQ (RFC 4551), which is much more reliable than 
timestamps.
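
The idea behind MODSEQ can be sketched with a toy model in Python. 
ToyMailbox is a made-up class, not an IMAP client; it only mimics the 
server-side counter (HIGHESTMODSEQ) and a CHANGEDSINCE-style query, and it 
does not model removed messages.

```python
class ToyMailbox:
    """Toy stand-in for a CONDSTORE-capable mailbox (not a real IMAP client)."""

    def __init__(self):
        self.highest_modseq = 0
        self.messages = {}   # uid -> (flags, modseq of last change)

    def store(self, uid, flags):
        # Every change bumps the mailbox-wide counter and stamps the message.
        self.highest_modseq += 1
        self.messages[uid] = (flags, self.highest_modseq)

    def changed_since(self, modseq):
        # Roughly what FETCH ... (CHANGEDSINCE modseq) reports.
        return sorted(
            uid for uid, (_, m) in self.messages.items() if m > modseq
        )


box = ToyMailbox()
box.store(1, set())
box.store(2, set())

first = box.changed_since(0)      # initial sync sees everything
last_seen = box.highest_modseq    # client remembers the server's counter

box.store(3, set())               # a new message arrives
box.store(1, {"\\Seen"})          # a flag changes on an old message

second = box.changed_since(last_seen)   # only what changed since last sync
```

Because the counter lives on the server and is bumped atomically with each 
change, there is no window between "when we asked" and "when we stored" for 
a change to slip through.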

As of Akonadi 1.13 and KDE PIM 4.14 we don't even fetch all Items from Akonadi 
into ItemSync (we used to do that so that ItemSync could compare the new Items 
with the existing ones and decide which Items are new, which are changed and 
which were removed from the remote server). As of 4.14 the logic has moved to 
Akonadi Server, which can perform this much more efficiently without having to 
check out the entire database first.
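
The comparison itself boils down to a three-way diff. A generic sketch in 
Python, assuming each side can be reduced to a mapping from remote id to 
some revision marker; diff_items and the sample data are hypothetical and 
not the actual ItemSync or server implementation.

```python
def diff_items(local, remote):
    """Classify remote ids as new, changed or removed relative to local."""
    new = sorted(remote.keys() - local.keys())
    removed = sorted(local.keys() - remote.keys())
    changed = sorted(
        rid for rid in local.keys() & remote.keys()
        if local[rid] != remote[rid]
    )
    return new, changed, removed


# remote id -> revision marker
local_items = {"msg1": 1, "msg2": 1, "msg3": 2}
remote_items = {"msg2": 1, "msg3": 3, "msg4": 1}

new, changed, removed = diff_items(local_items, remote_items)
```

Doing this inside the server means only the ids and revisions need to be 
looked at, instead of shipping every Item to the client first.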

> Carrying the above through the first example, we might do something
> like this in the WHERE-clause:
> 
>    ::: Time 0 :::
>    --
>    -- DB Client sets /min_timestamp/ to some low value (e.g.
>    -- 1970 00:00:00)
>    --
>    WHERE ...
>    AND   some-timestamp BETWEEN min_timestamp AND curr_timestamp
> 
>    ::: Time 1 :::
>    --
>    -- The DB client knows the last time it checked (curr_timestamp)
>    -- and that becomes /last_time_we_checked/ and it re-computes
>    -- curr_timestamp.
>    --
>    WHERE ...
>    AND   some-timestamp BETWEEN last_time_we_checked AND curr_timestamp
> 
> etc.
> 
> Cheers,
> --
> Pablo Sanchez - Blueoak Database Engineering, Inc
> Ph:    819.459.1926         Blog:  http://pablo-blog.blueoakdb.com
> iNum:  883.5100.0990.1054
> 
> _______________________________________________
> KDE PIM users mailing list
> Subscription management: https://mail.kde.org/mailman/listinfo/kdepim-users

-- 
Daniel Vrátil | dvratil@redhat.com | dvratil on #kde-devel, #kontact, #akonadi
Software Engineer - KDE Desktop Team, Red Hat Inc.

GPG Key: 0xC59D614F6F4AE348
Fingerprint: 4EC1 86E3 C54E 0B39 5FDD B5FB C59D 614F 6F4A E348


