[prev in list] [next in list] [prev in thread] [next in thread]
List: kfm-devel
Subject: Re: Review Request 115432: Only initialize the hash m_items in KFileItemModel if it is needed
From: "Frank Reininghaus" <frank78ac () googlemail ! com>
Date: 2014-02-03 9:50:02
Message-ID: 20140203095002.18034.33319 () probe ! kde ! org
[Download RAW message or body]
> On Feb. 2, 2014, 8 p.m., Mark Gaiser wrote:
> > Hi Frank,
> >
> > You're adding quite a bit of complexity.
> >
> > Is it an option to just remove m_items and change the storage of m_itemData? That \
> > is already one big list with all data in it. If you change that to a QHash<KUrl, \
> > KFileItem> and drop m_items you reduce quite a bit it seems. It will certainly be \
> > a bit bigger in memory, but you get rid of m_data completely so there might be no \
> > tradeoff memory wise (right?).
> > In terms of speed.. I don't know which one will be faster. I'm guessing a QList \
> > wins there.
>
> Frank Reininghaus wrote:
> Hi Mark, thanks for your comment. I'm not sure if I fully understand what you mean \
> - a QHash<KUrl, KFileItem> is an unordered container, but the files have to appear \
> in the view in a well-defined order, and we also need index-based access. We would \
> lose that if we only used the hash for storage, and then we woudl have to find some \
> other way to establish the "index -> KFileItem" mapping.
> Mark Gaiser wrote:
> Hi Frank,
>
> Ahh, i didn't think about that. In my models for Accretion i use a \
> QSortFilterProxyModel for the sorting and index which works very well, but that \
> isn't an option in Dolphin due to it's completely different model system.
> Frank Reininghaus wrote:
> I don't think that this has anything to do with the different model/view system. \
> Even if you use a QSortFilterProxyModel to sort a subclass of QAbstractItemModel, \
> then the "source model" must provide index-based access to the items in the model \
> (and keeping the contents (or pointers to the contents) in an ordered container \
> inside the source model is the easiest way to achieve that). Or are you saying that \
> Accretion keeps all data in a QHash and lets QSortFilterProxyModel handle the rest? \
> I would be very surprised if that was possible.
> Mark Gaiser wrote:
> "Or are you saying that Accretion keeps all data in a QHash and lets \
> QSortFilterProxyModel handle the rest? I would be very surprised if that was \
> possible."
> Exactly! well, nearly exactly. I actually store all entries in a QVector.
> Now you probably want to know how that is done.
> I'm going to skip class name reasoning and much of how it all works and jump right \
> in class names and small descriptions.
> When you open a folder it will be stored in a KDirectory class under \
> m_filteredEntries: \
> https://gitorious.org/kdirchainrebuild/master/source/kdirectoryprivate_p.cpp
> My models have a bit of a layered structure:
> DirListModel: https://gitorious.org/kdirchainrebuild/master/source/models/dirlistmodel.cpp \
> The model actually having all entries. The outside world (QML) never sees this \
> model.
> A DirGroupedModel (which inherits a QSortFilterProxyModel) \
> https://gitorious.org/kdirchainrebuild/master/source/models/dirgroupedproxymodel.h \
> is what is known to the outside world. This model knows the before mentioned \
> DirListModel. DirGroupedModel is "just" taking care of grouping a \
> folder by:
> - name
> - extension
> - type
> - ... much more possibilities exactly one must always be chosen. I also have a type \
> called None for no grouping but uses the same logic.
> Once you have chosen a type to group on then DirGroupedModel creates new \
> DirGroupedProxyModel \
> (https://gitorious.org/kdirchainrebuild/master/source/models/dirgroupedproxymodel.cpp) \
> instances for each unique mime (assuming we wanted mime types). The you end up with \
> sub models like so:
> - model with image/jpg items
> - model with inode type (folders)
> - ...
> Important here is that each sub model is put in DirGroupedProxyModel along with a \
> type to filter on (image/jpg, ...). DirGroupedProxyModel then takes care of the \
> filtering. Also important here is that the initial model with all items in it \
> (DirListModel) is passed in each sub model using setSourceModel. Passing the \
> initial model every time is something i still have to look at since it smells like \
> a possible bottleneck.
> This allows me to group by type and within each group sort in any way i want. I \
> first wanted this in Dolphin, but it seemed very difficult to patch dolphin up to \
> support sorting per group. It is far from easy and took me a good week to \
> understand and implement in Accretion. This structure allows me to do everything by \
> index and works very well. I guess QSortFilterProxyModel is doing the mapping from \
> it's new index to the index in the source model and is probably also redirecting \
> data(...) role calls back to the source model.
> It really works wonderful, you should try it sometime :)
> Oh and sorry for - again - being extremely off topic in a reviewrequest. If you \
> want to know more, feel free to send me a mail.
Right, so to summarize our discussion:
Mark: Use a QHash for everything!
Frank: That doesn't work. We need a container that keeps its items in a well-defined \
order and provides index-based access!
Mark: The QSortFilterProxyModel does the trick!
Frank: Really? Does QSortFilterProxyModel sort the items in your QHash?
Mark: Exactly! well, nearly exactly. I actually store all entries in a QVector.
QVector is a container that stores all items in a well-defined order and provides \
index-based access, so I have no further questions now ;-)
About the sorting/grouping stuff: This is really off-topic, but since you brought it \
up: it's certainly a nice idea, but to be honest, I'm not sure if this feature is \
useful enough for a large enough user group to justify adding not only an additional \
layer of indirection (in the form of another proxy model) between the view and the \
actual data, but also a considerable amount of visual and conceptual clutter to a \
default file manager. I could imagine that quite a few users will be confused if \
there is not only a "Sort by -> Name/Date/etc." menu, but also a "Group by -> \
Name/Date/etc." one. But let us please not continue that discussion in this review \
request.
- Frank
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/115432/#review48778
-----------------------------------------------------------
On Feb. 2, 2014, 7:03 p.m., Frank Reininghaus wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/115432/
> -----------------------------------------------------------
>
> (Updated Feb. 2, 2014, 7:03 p.m.)
>
>
> Review request for Dolphin.
>
>
> Bugs: 329494
> http://bugs.kde.org/show_bug.cgi?id=329494
>
>
> Repository: kde-baseapps
>
>
> Description
> -------
>
> KFileItemModel has a member
>
> QHash<KUrl, int> m_items
>
> which makes it possible to find the index of an item if the URL is known. m_items \
> is updated every time an item is added to/removed from the current directory, or if \
> the sort order changes. This is convenient in many situations, but sometimes, this \
> hash also causes problems:
>
> 1. In some corner cases, the model can get into a state where the indexes in \
> m_items are not correct. There were, e.g.,
> https://bugs.kde.org/show_bug.cgi?id=324371
> https://bugs.kde.org/show_bug.cgi?id=325359
>
> which I could reproduce by doing a search in Details View and expanding folders, \
> such that the same URL appeared twice in the view (but only once in m_items).
> But I think that there are other ways to make the relationship between m_items and \
> the model contents inconsistent. A crash that looks like it is most likely caused \
> by such problems is
> https://bugs.kde.org/show_bug.cgi?id=329494
>
>
> 2. Initializing m_items costs CPU cycles and memory, which is wasteful if m_items \
> is not used at all.
>
> Therefore, I propose to make the following changes:
>
> (a) Do not require that all URLs are in m_items any more. Instead, the new \
> requirement is: if there are N URLs in m_items, it is guaranteed that they are the \
> first N URLs in the model.
> (b) Replace most calls of m_items.value(url) by index(url) in KFileItemModel. Note \
> that some local variables 'index' had to be renamed for this to work.
> (c) In KFileItemModel::index(const KUrl&), check if the URL is in m_items already. \
> If that is not the case, add 1000 URLs to the hash, and check again. Repeat until \
> either the URL is found, or all URLs are in the hash (-1 is returned in the latter \
> case). Note that we always know where to continue adding URLs: if m_items.count() \
> is N, then "m_itemData.at(i)->item.url()" is the first URL that is not in m_items.
> BTW, my first version of index(const KUrl&) added URLs to the hash one by one, and \
> checked every time if the next URL is the right one by comparing the URLs with ==. \
> However, this was slow and required a lot more memory because comparing URLs with \
> the == operator triggers a parsing of the URL, which needs memory and CPU cycles. I \
> don't know if this is still the case in Qt 5, but in any case, adding new URLs in \
> larger batches and then checking if the hash contains the searched URL should have \
> a lower amortized cost in any case.
> (d) In insertItems() and removeItems(), clear the hash m_items. It will be \
> re-populated the next time index() is called. Note that Dolphin < 4.11 also cleared \
> the entire hash in these cases (it re-created the entire hash immediately though). \
> In contrast to that, 4.11 and 4.12 only update the URLs/indexes starting from the \
> first added/removed item. This can be problematic though - in bug 329494, we get a \
> crash when trying to remove one of the removed URLs from m_items.
>
> Diffs
> -----
>
> dolphin/src/kitemviews/kfileitemmodel.h 0224290
> dolphin/src/kitemviews/kfileitemmodel.cpp b3c56e1
>
> Diff: https://git.reviewboard.kde.org/r/115432/diff/
>
>
> Testing
> -------
>
> I could not find any regressions. Even if I could never reproduce the crash in bug \
> 329494, I'm confident that the crash is fixed because the line where it tries to \
> access the URL of a removed KFileItem and then crashes is removed (and there is no \
> other access to the KFileItem in that function).
> Memory usage when loading a directory with 100,000 files is reduced by about 5 MiB \
> in all view modes (from about 150 to 145) if previews are disabled. (If previews \
> are enabled, KFileItemModelRolesUpdater calls the index(KUrl&) method, which fills \
> m_items, and memory usage is as before).
> The time for loading a directory with 100,000 files is reduced by about 200 ms on \
> my system (to about 4.3 seconds in Icons View, 3.7 in Compact and 2.8 in Details).
>
> Thanks,
>
> Frank Reininghaus
>
>
[Attachment #3 (text/html)]
<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 \
solid;"> <tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="https://git.reviewboard.kde.org/r/115432/">https://git.reviewboard.kde.org/r/115432/</a>
</td>
</tr>
</table>
<br />
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <p style="margin-top: 0;">On February 2nd, 2014, 8 p.m. UTC, <b>Mark \
Gaiser</b> wrote:</p> <blockquote style="margin-left: 1em; border-left: 2px solid \
#d0d0d0; padding-left: 10px;"> <pre style="white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">Hi Frank,
You're adding quite a bit of complexity.
Is it an option to just remove m_items and change the storage of m_itemData? That is \
already one big list with all data in it. If you change that to a QHash<KUrl, \
KFileItem> and drop m_items you reduce quite a bit it seems. It will certainly be \
a bit bigger in memory, but you get rid of m_data completely so there might be no \
tradeoff memory wise (right?).
In terms of speed.. I don't know which one will be faster. I'm guessing a \
QList wins there.</pre> </blockquote>
<p>On February 2nd, 2014, 9:57 p.m. UTC, <b>Frank Reininghaus</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Hi Mark, thanks for your \
comment. I'm not sure if I fully understand what you mean - a QHash<KUrl, \
KFileItem> is an unordered container, but the files have to appear in the view in \
a well-defined order, and we also need index-based access. We would lose that if we \
only used the hash for storage, and then we woudl have to find some other way to \
establish the "index -> KFileItem" mapping.</pre> </blockquote>
<p>On February 2nd, 2014, 10:14 p.m. UTC, <b>Mark Gaiser</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Hi Frank,
Ahh, i didn't think about that. In my models for Accretion i use a \
QSortFilterProxyModel for the sorting and index which works very well, but that \
isn't an option in Dolphin due to it's completely different model \
system.</pre> </blockquote>
<p>On February 2nd, 2014, 10:20 p.m. UTC, <b>Frank Reininghaus</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I don't think that \
this has anything to do with the different model/view system. Even if you use a \
QSortFilterProxyModel to sort a subclass of QAbstractItemModel, then the "source \
model" must provide index-based access to the items in the model (and keeping \
the contents (or pointers to the contents) in an ordered container inside the source \
model is the easiest way to achieve that). Or are you saying that Accretion keeps all \
data in a QHash and lets QSortFilterProxyModel handle the rest? I would be very \
surprised if that was possible.</pre> </blockquote>
<p>On February 2nd, 2014, 11:56 p.m. UTC, <b>Mark Gaiser</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">"Or are you saying \
that Accretion keeps all data in a QHash and lets QSortFilterProxyModel handle the \
rest? I would be very surprised if that was possible."
Exactly! well, nearly exactly. I actually store all entries in a QVector.
Now you probably want to know how that is done.
I'm going to skip class name reasoning and much of how it all works and jump \
right in class names and small descriptions.
When you open a folder it will be stored in a KDirectory class under \
m_filteredEntries: https://gitorious.org/kdirchainrebuild/master/source/kdirectoryprivate_p.cpp
My models have a bit of a layered structure:
DirListModel: https://gitorious.org/kdirchainrebuild/master/source/models/dirlistmodel.cpp \
The model actually having all entries. The outside world (QML) never sees this model.
A DirGroupedModel (which inherits a QSortFilterProxyModel) \
https://gitorious.org/kdirchainrebuild/master/source/models/dirgroupedproxymodel.h is \
what is known to the outside world. This model knows the before mentioned \
DirListModel. DirGroupedModel is "just" taking care of grouping a folder \
by:
- name
- extension
- type
- ... much more possibilities exactly one must always be chosen. I also have a type \
called None for no grouping but uses the same logic.
Once you have chosen a type to group on then DirGroupedModel creates new \
DirGroupedProxyModel \
(https://gitorious.org/kdirchainrebuild/master/source/models/dirgroupedproxymodel.cpp) \
instances for each unique mime (assuming we wanted mime types). The you end up with \
sub models like so:
- model with image/jpg items
- model with inode type (folders)
- ...
Important here is that each sub model is put in DirGroupedProxyModel along with a \
type to filter on (image/jpg, ...). DirGroupedProxyModel then takes care of the \
filtering. Also important here is that the initial model with all items in it \
(DirListModel) is passed in each sub model using setSourceModel. Passing the initial \
model every time is something i still have to look at since it smells like a possible \
bottleneck.
This allows me to group by type and within each group sort in any way i want. I first \
wanted this in Dolphin, but it seemed very difficult to patch dolphin up to support \
sorting per group. It is far from easy and took me a good week to understand and \
implement in Accretion. This structure allows me to do everything by index and works \
very well. I guess QSortFilterProxyModel is doing the mapping from it's new index \
to the index in the source model and is probably also redirecting data(...) role \
calls back to the source model.
It really works wonderful, you should try it sometime :)
Oh and sorry for - again - being extremely off topic in a reviewrequest. If you want \
to know more, feel free to send me a mail.</pre> </blockquote>
</blockquote>
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Right, so to summarize \
our discussion:
Mark: Use a QHash for everything!
Frank: That doesn't work. We need a container that keeps its items in a \
well-defined order and provides index-based access!
Mark: The QSortFilterProxyModel does the trick!
Frank: Really? Does QSortFilterProxyModel sort the items in your QHash?
Mark: Exactly! well, nearly exactly. I actually store all entries in a QVector.
QVector is a container that stores all items in a well-defined order and provides \
index-based access, so I have no further questions now ;-)
About the sorting/grouping stuff: This is really off-topic, but since you brought it \
up: it's certainly a nice idea, but to be honest, I'm not sure if this \
feature is useful enough for a large enough user group to justify adding not only an \
additional layer of indirection (in the form of another proxy model) between the view \
and the actual data, but also a considerable amount of visual and conceptual clutter \
to a default file manager. I could imagine that quite a few users will be confused if \
there is not only a "Sort by -> Name/Date/etc." menu, but also a \
"Group by -> Name/Date/etc." one. But let us please not continue that \
discussion in this review request.</pre> <br />
<p>- Frank</p>
<br />
<p>On February 2nd, 2014, 7:03 p.m. UTC, Frank Reininghaus wrote:</p>
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('https://git.reviewboard.kde.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black \
solid;"> <tr>
<td>
<div>Review request for Dolphin.</div>
<div>By Frank Reininghaus.</div>
<p style="color: grey;"><i>Updated Feb. 2, 2014, 7:03 p.m.</i></p>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Bugs: </b>
<a href="http://bugs.kde.org/show_bug.cgi?id=329494">329494</a>
</div>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt;">Repository: </b>
kde-baseapps
</div>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0"> <tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">KFileItemModel has a member
QHash<KUrl, int> m_items
which makes it possible to find the index of an item if the URL is known. m_items is \
updated every time an item is added to/removed from the current directory, or if the \
sort order changes. This is convenient in many situations, but sometimes, this hash \
also causes problems:
1. In some corner cases, the model can get into a state where the indexes in m_items \
are not correct. There were, e.g.,
https://bugs.kde.org/show_bug.cgi?id=324371
https://bugs.kde.org/show_bug.cgi?id=325359
which I could reproduce by doing a search in Details View and expanding folders, such \
that the same URL appeared twice in the view (but only once in m_items).
But I think that there are other ways to make the relationship between m_items and \
the model contents inconsistent. A crash that looks like it is most likely caused by \
such problems is
https://bugs.kde.org/show_bug.cgi?id=329494
2. Initializing m_items costs CPU cycles and memory, which is wasteful if m_items is \
not used at all.
Therefore, I propose to make the following changes:
(a) Do not require that all URLs are in m_items any more. Instead, the new \
requirement is: if there are N URLs in m_items, it is guaranteed that they are the \
first N URLs in the model.
(b) Replace most calls of m_items.value(url) by index(url) in KFileItemModel. Note \
that some local variables 'index' had to be renamed for this to work.
(c) In KFileItemModel::index(const KUrl&), check if the URL is in m_items \
already. If that is not the case, add 1000 URLs to the hash, and check again. Repeat \
until either the URL is found, or all URLs are in the hash (-1 is returned in the \
latter case). Note that we always know where to continue adding URLs: if \
m_items.count() is N, then "m_itemData.at(i)->item.url()" is the first \
URL that is not in m_items.
BTW, my first version of index(const KUrl&) added URLs to the hash one by one, \
and checked every time if the next URL is the right one by comparing the URLs with \
==. However, this was slow and required a lot more memory because comparing URLs with \
the == operator triggers a parsing of the URL, which needs memory and CPU cycles. I \
don't know if this is still the case in Qt 5, but in any case, adding new URLs in \
larger batches and then checking if the hash contains the searched URL should have a \
lower amortized cost in any case.
(d) In insertItems() and removeItems(), clear the hash m_items. It will be \
re-populated the next time index() is called. Note that Dolphin < 4.11 also \
cleared the entire hash in these cases (it re-created the entire hash immediately \
though). In contrast to that, 4.11 and 4.12 only update the URLs/indexes starting \
from the first added/removed item. This can be problematic though - in bug 329494, we \
get a crash when trying to remove one of the removed URLs from m_items.</pre> </td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: \
1px solid #b8b5a0"> <tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">I could not find any regressions. Even if I could never reproduce the \
crash in bug 329494, I'm confident that the crash is fixed because the line where \
it tries to access the URL of a removed KFileItem and then crashes is removed (and \
there is no other access to the KFileItem in that function).
Memory usage when loading a directory with 100,000 files is reduced by about 5 MiB in \
all view modes (from about 150 to 145) if previews are disabled. (If previews are \
enabled, KFileItemModelRolesUpdater calls the index(KUrl&) method, which fills \
m_items, and memory usage is as before).
The time for loading a directory with 100,000 files is reduced by about 200 ms on my \
system (to about 4.3 seconds in Icons View, 3.7 in Compact and 2.8 in Details).</pre> \
</td> </tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>dolphin/src/kitemviews/kfileitemmodel.h <span style="color: \
grey">(0224290)</span></li>
<li>dolphin/src/kitemviews/kfileitemmodel.cpp <span style="color: \
grey">(b3c56e1)</span></li>
</ul>
<p><a href="https://git.reviewboard.kde.org/r/115432/diff/" style="margin-left: \
3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic