From kfm-devel  Tue Oct 23 12:38:43 2012
From: Mark <markg85 () gmail ! com>
Date: Tue, 23 Oct 2012 12:38:43 +0000
To: kfm-devel
Subject: Ideas for improving sorting. Goal: under 1 second for 200.000 files.
Message-Id: <CAPd6JnEjHCYK+qyZtVSmrRFqUo5iyb3k+f2L8k3BYnPc0jM==A () mail ! gmail ! com>
X-MARC-Message: https://marc.info/?l=kfm-devel&m=135099596318315

Hi,

Right now we have 2 bug reports for sorting and suggestions/patches on
how to improve them. [1, 2]

I tried a lot already to improve sorting and made it twice as fast in
my local experiments [2]. The patch for that is certainly unacceptable
and very very hacky! Yesterday i just sat down (had to since i was
having a bus ride anyway :p) and just think about the actual issue and
how it might be resolved.

Doing that i came to a whole different conclusion then before. It's
only theoretical at the moment, but i think this might actually work
and greatly speed things up.

1. Splitting out sorting for folders and the rest
-------------------
This one seemingly simple makes a noticeable dent in the time it takes
to sort. The lessThan function [3] (line 1320) is called for the
sorting algorithm in O(n log n). Optimizing anything in here will
result in a faster sort. File and Folder sorting is split out in
"specialized" lessThan functions (lessThanFolder and lessThanFiles)
then you can already prevent the "if (m_sortDirsFirst || m_sortRole ==
SizeRole) {" construction. I've tried this and it speeds up the
sortingquite a bit. It still remains slow though.

2. Copy KStringHandler::naturalCompare into dolphin for further in
depth optimizations
2.1 Unicode cache for isNull, isPunct, isSpace and is isDigit.
-------------------
KStrinHandler::naturalCompare is - by far - the biggest resource user.
It needs improvements and thus far i simply didn't know how to do
that. Yesterday i figured out a bunch of optimizations for this. What
i;d do is make a structure (class):
class CharUnicodeCache
{
public:
    bool isSpace;
    bool isPunct;
    bool isNull;
    bool isDigit;
}

Putting that in a list: QList<CharUnicodeCache> m_chatUnicodeCache;
Then in the naturalCompare it needs to fetch the cached values from
the list rather then using the function calls. I'm not quite sure
about this optimization since it's quite tricky and "might" not
improve performance that much. Note: the key is the numeric
representation of the hex value of the unicode character thus the
lookup is indeed in constant time and should be very rapid.

2.2 Get rid of localeAwareCompare - kind of
-------------------
localeAwareCompare is - by far - the biggest resource user in
naturalCompare so one way of making it a lot faster is by getting rid
of it all together :)  Sadly that's not entirely possible, but we can
use the stuff that Qt uses to make it faster and prevent string
allocations.
To follow this, here is the stack when localeAwareCompare is called:
- localeAwareCompare
- - localeAwareCompare_helper
- - - ... system dependent stuff. if nothing is found: ucstrcmp Lets
assume we go through ucstrcmp [4]
- - - - ucstrncmp [5]

The deepest functions for localeAwareCompare are ucstrcmp and
ucstrncmp. We can simply copy those 2 functions and make "our own"
localeAwareCompare that just passes around QChar pointers without
allocating anything. Doing this is likely to greatly speedup the new
"localeAwareCompare".

Note: the naturalCompare has a comment that complains about
insensitive comparison and having to do toLower to get around that. If
we copy the Qt functions we might as well copy the case insensitive
version [6] and [7] which gets rig of the toLower which is also quite
high on the function call cost list.

2.3 Pass around QChar* and get rid of QString
-------------------
Right now naturalCompare takes two QString arguments while it
internally grabs a QChar* from the string. I'm not sure here, but i
think you save some time if you prevent QString passing around and
simply pass around QChar* then i imagine that you have a lot less
unicode since the string arrays are already in unicode. That might be
another (small?) speed improvement i guess.

3. Multithreaded sorting
-------------------
I don't know if the current sorting algorithm is fit for
multithreading usage, but the issue we have in sorting is CPU bound.
If we add in threading it should speed up significantly.

Right now i'm implementing 2.1 in a local branch and 2.2 is next on my
list. I expect 2.2 to give the biggest speedboost.
I will report my findings in this thread once i have something.

It would be best if this sorting is greatly improved and added to
Dolphin's codebase. The above suggestions do add quite a bit of code
to dolphin, but it's all fairly self contained. I hope it's also low
maintenance, but that remains to be seen :)

If someone else has more suggestions/ideas on improving sorting, please do tell.

Having sorting done under 1 second on my CPU (AMD 1100XT) is my goal
for now. But i think it can be even faster then that.

[1] https://bugs.kde.org/show_bug.cgi?id=306290
[2] https://bugs.kde.org/show_bug.cgi?id=307585
[3] http://quickgit.kde.org/index.php?p=kde-baseapps.git&a=blob&h=61f512a8e5505a110a351525ee6592e8b2093f96&hb=b4c01c00a91bc1a206c9577adca2462845699ff9&f=dolphin%2Fsrc%2Fkitemviews%2Fkfileitemmodel.cpp
[4] http://qt.gitorious.org/qt/qtbase/blobs/master/src/corelib/tools/qstring.cpp#line202
[5] http://qt.gitorious.org/qt/qtbase/blobs/master/src/corelib/tools/qstring.cpp#line192
[6] http://qt.gitorious.org/qt/qtbase/blobs/master/src/corelib/tools/qstring.cpp#line128
[7] http://qt.gitorious.org/qt/qtbase/blobs/master/src/corelib/tools/qstring.cpp#line212