
List:       kde-bugs-dist
Subject:    [Bug 307585] Sorting is too cpu intensive for big folders
From:       Frank Reininghaus <frank78ac () googlemail ! com>
Date:       2012-09-30 9:15:35
Message-ID: bug-307585-17878-R0nE4A3mD1 () http ! bugs ! kde ! org/

https://bugs.kde.org/show_bug.cgi?id=307585

--- Comment #2 from Frank Reininghaus <frank78ac@googlemail.com> ---
Thanks for your analysis, Mark!

(In reply to comment #0)
> The current sorting algorithm can be found in kfileitemmodel.cpp in the
> function: KFileItemModel::insertItems.
> 
> In there it does something quite inefficient. It has 2 loops and has 2 lists.
> 1. the list of all items
> 2. the list of the resulting sorted items
> 
> In pseudo code it's doing something along the following lines:
> 
> - Loop over all items (list 1)
> - - per item loop over the second list (list 2) to find the last inserted
> item. No, this does not have to be the "last" item in the list. If it finds
> the last index it adds the current item after that.
> 
> This is very CPU-intensive, as the second list grows with every item added,
> so the inner loop gets longer and longer to iterate through.

I think that you refer to the loop

while (sourceIndex < sortedItems.count()) {
  ...
}

The run time of this part of the code should ideally be O(N), where N is the
maximum of the number of items that are inserted and the number of items that
are already in the model.

But I can see that even then there is a problem if the 100000 items are
inserted not in one go, but in many small groups. Then we have to run the
insertItems() function O(N) times and end up with O(N^2) run time behaviour,
which is very bad.
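To make that blow-up concrete, here is a small stand-alone cost model (this is
not code from kfileitemmodel.cpp; totalWork() is a hypothetical helper): if N
items arrive in batches of size g, and every insertItems() call makes one pass
over the whole model, the total work is the sum of the model sizes seen by the
individual calls, i.e. g + 2g + ... + N, which is roughly N^2 / (2g).

```cpp
// Hypothetical cost model, not Dolphin code: count the total number of
// steps if N items arrive in batches of size g and each insertItems()
// call walks the whole model once.
long long totalWork(long long n, long long g)
{
    long long work = 0;
    for (long long size = g; size <= n; size += g) {
        work += size; // one O(size) pass per batch
    }
    return work;
}
```

For n = 100000 a single batch costs 100000 steps, while batches of 100 cost
about 50 million steps, matching the O(N^2) estimate.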

The first part of the loop,

        while (targetIndex < m_itemData.count()) {
            if (!lessThan(m_itemData.at(targetIndex), sortedItems.at(sourceIndex))) {
                break;
            }
            ++targetIndex;
        }

could be replaced by a binary search, eliminating one source of O(N) behaviour.
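As a sketch of that idea (ItemData, lessThan() and findInsertIndex() are
simplified stand-ins for the real types and functions in kfileitemmodel.cpp,
not the actual Dolphin code), std::lower_bound finds the first position that
is not less than the new item in O(log N) comparisons:

```cpp
#include <algorithm>
#include <vector>

// Simplified stand-in for Dolphin's per-item data.
struct ItemData { int sortKey; };

// Stand-in for KFileItemModel's comparison function.
bool lessThan(const ItemData& a, const ItemData& b)
{
    return a.sortKey < b.sortKey;
}

// Binary search replacement for the linear scan: returns the index
// where newItem should be inserted to keep modelData sorted.
int findInsertIndex(const std::vector<ItemData>& modelData, const ItemData& newItem)
{
    auto it = std::lower_bound(modelData.begin(), modelData.end(), newItem, lessThan);
    return static_cast<int>(it - modelData.begin());
}
```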
There is still the adjustment of the indices in m_items after the main loop,
which we can't do much about, I'm afraid. But then there is the following line
inside the loop:

m_itemData.insert(targetIndex, sortedItems.at(sourceIndex));

This could actually be the worst part of it, because insertion into a list is
O(N), and it can happen O(N) times in a single call of a function that is
possibly called many times. To change this, one would not insert into
m_itemData directly, but rather merge the two sorted lists sortedItems and
m_itemData into a single new list (with space reserved for both lists
combined). Then we would only have to *append*, which is O(1), rather than
*insert*, which is O(N).
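A minimal sketch of such a merge (again with simplified stand-in types, not
the actual model code): since both inputs are already sorted, a single pass
with push_back() into a pre-reserved vector runs in O(N + M).

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-ins for Dolphin's item data and comparison.
struct ItemData { int sortKey; };

bool lessThan(const ItemData& a, const ItemData& b)
{
    return a.sortKey < b.sortKey;
}

// Merge two sorted lists into a new one using only appends.
// reserve() avoids reallocation; push_back() is amortized O(1).
std::vector<ItemData> mergeSorted(const std::vector<ItemData>& modelData,
                                  const std::vector<ItemData>& sortedItems)
{
    std::vector<ItemData> merged;
    merged.reserve(modelData.size() + sortedItems.size());

    std::size_t i = 0, j = 0;
    while (i < modelData.size() && j < sortedItems.size()) {
        if (lessThan(sortedItems[j], modelData[i])) {
            merged.push_back(sortedItems[j++]);
        } else {
            merged.push_back(modelData[i++]);
        }
    }
    // Append whatever is left in either list.
    while (i < modelData.size()) {
        merged.push_back(modelData[i++]);
    }
    while (j < sortedItems.size()) {
        merged.push_back(sortedItems[j++]);
    }
    return merged;
}
```

The standard library's std::merge from &lt;algorithm&gt; does the same thing;
the hand-written loop just makes the append-only structure explicit.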

> This sorting seems really odd and inefficient and could benefit greatly if
> done differently. That probably does require changing quite a bit of code in
> the kitemmodel.cpp class thus is no small task.

If my assumption about the bottleneck is right, it should be doable. I'll
attach a patch in a minute.

> Also, that isn't the only sorting done. A
> "KFileItemModelSortAlgorithm::sort" is also done and I have yet to figure
> out what it does exactly. That one also shows up very high in profiling
> output.

This function does the pre-sorting of the items to be inserted in insertItems()
and also performs the re-sorting if the sort role or sort order is changed.

-- 
You are receiving this mail because:
You are watching all bug changes.