[prev in list] [next in list] [prev in thread] [next in thread]
List: kfm-devel
Subject: Re: Review Request 112725: Save memory and time by lazy-loading the item data if possible
From: "Mark Gaiser" <markg85 () gmail ! com>
Date: 2013-09-16 18:27:29
Message-ID: 20130916182729.2757.82975 () vidsolbach ! de
[Download RAW message or body]
> On Sept. 14, 2013, 11 p.m., Mark Gaiser wrote:
> > Hi Frank,
> >
> > I wonder who said that having folders with 100.000 entries is insane? ;) Either \
> > You or David said that to me when i first brought up massive folders and \
> > optimization ideas for it. It's very nice to see that you're now optimizing for \
> > those cases as well! Good job and keep hunting those resources.
> > Now for the more constructive feedback - or rather, an idea where you might find \
> > much more memory being wasted where i haven't found a good solution yet. With \
> > massive folders the UDSEntry object is stored for each object. In your case \
> > (since you're working from the higher level classes) the UDSEntry is stored \
> > within KFileItem. I think you can save a lot of memory in that class by making it \
> > smarter or different. KFileItems also has a bunch of private members to store the \
> > same info from UDSEntry thus i can imagine that quite a lot of data is being \
> > duplicated there.
> > Good luck in your resource hunting :)
>
> Frank Reininghaus wrote:
> I think that both David and I were a bit skeptical about the "500,000 files in one \
> folder" thing ;-) I still think that optimizing for this case might not be the best \
> thing to do if it causes a very significant cost in terms of code complexity. \
> However, there are far more opportunities for small changes like the one I've \
> proposed here than I first thought, which do not require massive code \
> modifications, but still reduce our resource usage in all cases (also if there are \
> far less files in one folder).
> And I admit that opening large folders is indeed the best way to find out about \
> such opportunities, and also to measure how much we gain by a change, so your input \
> in the "massive folder" area was very useful, more useful than I first thought!
> About the UDSEntry issue: Yes, I've also noticed that this is one of the major \
> consumers of memory, but improving this is probably not quite trivial, so I've \
> focused on things where we can gain something with less effort inside Dolphin for \
> the time being. In the long term, a way to have most information about files (i.e., \
> everything that is not needed for sorting) lazy-loaded probably makes sense.
Hi Frank,
I'm just glad you now do realise the importance of benchmarking with massive folders \
:) Ok, granted, i had to take some "who in his sane mind uses such big folders" \
comments but other then that it turned out wonderfully nice if you ask me. Actually, \
my initial benchmarks where with "just" 100.000 files, but i soon made enough \
experimental performance improvements to raise the bar to 500.000. At that level you \
really begin to see massive memory duplication. I'm playing with compression ideas, \
but haven't been got anything working in that area.
Lazy loading is awesome, but if you can compress the data enough that you still have \
all details without lazy loading is even better i think. For example, doing a \
directory listing on any given folder gives at least the following \
UDSEntry values:
- Device ID
- Inode
- User
- Group
That are the same for all files! Ok, you can have folders where you have files from \
multiple users and if you have folders then you also have different inode numbers so \
a simple static cache won't work. But this is an area where a _lot_ can be gained. \
Since we're going off-topic here, posting this to kfm-devel as well.
- Mark
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/112725/#review40039
-----------------------------------------------------------
On Sept. 14, 2013, 2:08 p.m., Frank Reininghaus wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/112725/
> -----------------------------------------------------------
>
> (Updated Sept. 14, 2013, 2:08 p.m.)
>
>
> Review request for Dolphin.
>
>
> Description
> -------
>
> This patch tries to prevent calling KFileItemModel::retrieveData() when possible in \
> order to save memory and time when loading huge directories.
>
> There are changes in the following members of KFileItemModel:
>
> * data() and setData() are modified such that the QHash "values" for an item is \
> initialized with retrieveData() if that has not happened already.
> * indexForKeyboardSearch() uses the "text" from the KFileItem itself, rather than \
> looking than calling data() (which can be expensive if called for many items \
> because data() might call retrieveData()).
> * isExpandable() calls data(), rather than accessing the QHash "values" directly. \
> If this hash was not initialized it, isExpandable() would always return the default \
> value "false".
> * createItemDataList() does not call retrieveData() for each item any more. \
> However, if the sort role is such that sortRoleCompare() would try to access the \
> QHash "values" when sorting, retrieveData() is called to ensure that sorting works \
> correctly.
>
> Things that could still be improved:
>
> * Currently, the layouting in Compact View calls KFileItemModel::data() for every \
> item. In the case that only the "Name" is shown, this could be prevented (similar \
> to the approach that is used in Icons View, see \
> https://git.reviewboard.kde.org/r/112253/
> * When changing the roles (e.g., by adding or removing columns in Details View) or \
> changing the sort role, retrieveData() is currently called for all items. Moreover, \
> a changedItems signal is issued for all items, such that KFileItemModelRolesUpdater \
> tries to load icons/previews for all items (like in Dolphin < 4.11).
>
> Diffs
> -----
>
> dolphin/src/kitemviews/kfileitemmodel.cpp 51ff142
>
> Diff: http://git.reviewboard.kde.org/r/112725/diff/
>
>
> Testing
> -------
>
> I opened a directory with 100,000 files in Icons View and Details View with a \
> release build, and measured the memory usage with KSysGuard. Moreover, I measured \
> the time required to load the directory using the following patch for DolphinView: \
> http://paste.kde.org/pae0a64c1/
> The time required for the initial loading is not very meaningful because it varies \
> strongly. We can get a better idea of how much time is spent inside Dolphin for \
> loading if all items are cached, so I re-loaded the folder 8 of times by pressing \
> Alt+Up and then Alt+Left.
> In Icons View, the memory consumption is reduced from 198.2 MB to 164.6 MB (-17%), \
> and the average time from 2898 ms to 2814 ms (-2.9%).
> In Details View, the memory consumption is reduced from 225.1 MB to 166.6 MB \
> (-26%), and the average time from 1539 ms to 1076 ms (-30%).
> The savings are larger in Details View because retrieveData() stores more \
> information about the items in the QHash "values" (like the "expanded parents \
> count").
> I could not see any significant differences in Compace View (as expected).
>
>
> Thanks,
>
> Frank Reininghaus
>
>
[Attachment #3 (text/html)]
<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 \
solid;"> <tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="http://git.reviewboard.kde.org/r/112725/">http://git.reviewboard.kde.org/r/112725/</a>
</td>
</tr>
</table>
<br />
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <p style="margin-top: 0;">On September 14th, 2013, 11 p.m. UTC, <b>Mark \
Gaiser</b> wrote:</p> <blockquote style="margin-left: 1em; border-left: 2px solid \
#d0d0d0; padding-left: 10px;"> <pre style="white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">Hi Frank,
I wonder who said that having folders with 100.000 entries is insane? ;) Either You \
or David said that to me when i first brought up massive folders and optimization \
ideas for it. It's very nice to see that you're now optimizing for those \
cases as well! Good job and keep hunting those resources.
Now for the more constructive feedback - or rather, an idea where you might find much \
more memory being wasted where i haven't found a good solution yet. With massive \
folders the UDSEntry object is stored for each object. In your case (since you're \
working from the higher level classes) the UDSEntry is stored within KFileItem. I \
think you can save a lot of memory in that class by making it smarter or different. \
KFileItems also has a bunch of private members to store the same info from UDSEntry \
thus i can imagine that quite a lot of data is being duplicated there.
Good luck in your resource hunting :)</pre>
</blockquote>
<p>On September 16th, 2013, 12:33 p.m. UTC, <b>Frank Reininghaus</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I think that both David \
and I were a bit skeptical about the "500,000 files in one folder" thing \
;-) I still think that optimizing for this case might not be the best thing to do if \
it causes a very significant cost in terms of code complexity. However, there are far \
more opportunities for small changes like the one I've proposed here than I first \
thought, which do not require massive code modifications, but still reduce our \
resource usage in all cases (also if there are far less files in one folder).
And I admit that opening large folders is indeed the best way to find out about such \
opportunities, and also to measure how much we gain by a change, so your input in the \
"massive folder" area was very useful, more useful than I first thought!
About the UDSEntry issue: Yes, I've also noticed that this is one of the major \
consumers of memory, but improving this is probably not quite trivial, so I've \
focused on things where we can gain something with less effort inside Dolphin for the \
time being. In the long term, a way to have most information about files (i.e., \
everything that is not needed for sorting) lazy-loaded probably makes sense.</pre> \
</blockquote>
</blockquote>
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Hi Frank,
I'm just glad you now do realise the importance of benchmarking with massive \
folders :) Ok, granted, i had to take some "who in his sane mind uses such big \
folders" comments but other then that it turned out wonderfully nice if you ask \
me. Actually, my initial benchmarks where with "just" 100.000 files, but i \
soon made enough experimental performance improvements to raise the bar to 500.000. \
At that level you really begin to see massive memory duplication. I'm playing \
with compression ideas, but haven't been got anything working in that area.
Lazy loading is awesome, but if you can compress the data enough that you still have \
all details without lazy loading is even better i think. For example, doing a \
directory listing on any given folder gives at least the following \
UDSEntry values:
- Device ID
- Inode
- User
- Group
That are the same for all files! Ok, you can have folders where you have files from \
multiple users and if you have folders then you also have different inode numbers so \
a simple static cache won't work. But this is an area where a _lot_ can be \
gained. Since we're going off-topic here, posting this to kfm-devel as well. \
</pre> <br />
<p>- Mark</p>
<br />
<p>On September 14th, 2013, 2:08 p.m. UTC, Frank Reininghaus wrote:</p>
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('http://git.reviewboard.kde.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black \
solid;"> <tr>
<td>
<div>Review request for Dolphin.</div>
<div>By Frank Reininghaus.</div>
<p style="color: grey;"><i>Updated Sept. 14, 2013, 2:08 p.m.</i></p>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0"> <tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">This patch tries to prevent calling KFileItemModel::retrieveData() when \
possible in order to save memory and time when loading huge directories.
There are changes in the following members of KFileItemModel:
* data() and setData() are modified such that the QHash "values" for an \
item is initialized with retrieveData() if that has not happened already.
* indexForKeyboardSearch() uses the "text" from the KFileItem itself, \
rather than looking than calling data() (which can be expensive if called for many \
items because data() might call retrieveData()).
* isExpandable() calls data(), rather than accessing the QHash "values" \
directly. If this hash was not initialized it, isExpandable() would always return the \
default value "false".
* createItemDataList() does not call retrieveData() for each item any more. However, \
if the sort role is such that sortRoleCompare() would try to access the QHash \
"values" when sorting, retrieveData() is called to ensure that sorting \
works correctly.
Things that could still be improved:
* Currently, the layouting in Compact View calls KFileItemModel::data() for every \
item. In the case that only the "Name" is shown, this could be prevented \
(similar to the approach that is used in Icons View, see \
https://git.reviewboard.kde.org/r/112253/
* When changing the roles (e.g., by adding or removing columns in Details View) or \
changing the sort role, retrieveData() is currently called for all items. Moreover, a \
changedItems signal is issued for all items, such that KFileItemModelRolesUpdater \
tries to load icons/previews for all items (like in Dolphin < 4.11). </pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: \
1px solid #b8b5a0"> <tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">I opened a directory with 100,000 files in Icons View and Details View \
with a release build, and measured the memory usage with KSysGuard. Moreover, I \
measured the time required to load the directory using the following patch for \
DolphinView: http://paste.kde.org/pae0a64c1/
The time required for the initial loading is not very meaningful because it varies \
strongly. We can get a better idea of how much time is spent inside Dolphin for \
loading if all items are cached, so I re-loaded the folder 8 of times by pressing \
Alt+Up and then Alt+Left.
In Icons View, the memory consumption is reduced from 198.2 MB to 164.6 MB (-17%), \
and the average time from 2898 ms to 2814 ms (-2.9%).
In Details View, the memory consumption is reduced from 225.1 MB to 166.6 MB (-26%), \
and the average time from 1539 ms to 1076 ms (-30%).
The savings are larger in Details View because retrieveData() stores more information \
about the items in the QHash "values" (like the "expanded parents \
count").
I could not see any significant differences in Compace View (as expected).</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>dolphin/src/kitemviews/kfileitemmodel.cpp <span style="color: \
grey">(51ff142)</span></li>
</ul>
<p><a href="http://git.reviewboard.kde.org/r/112725/diff/" style="margin-left: \
3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic