[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kfm-devel
Subject:    Re: Review Request 112725: Save memory and time by lazy-loading the item data if possible
From:       "Mark Gaiser" <markg85 () gmail ! com>
Date:       2013-09-16 18:27:29
Message-ID: 20130916182729.2757.82975 () vidsolbach ! de
[Download RAW message or body]

> On Sept. 14, 2013, 11 p.m., Mark Gaiser wrote:
> > Hi Frank,
> > 
> > I wonder who said that having folders with 100.000 entries is insane? ;) Either \
> > You or David said that to me when i first brought up massive folders and \
> > optimization ideas for it. It's very nice to see that you're now optimizing for \
> > those cases as well! Good job and keep hunting those resources. 
> > Now for the more constructive feedback - or rather, an idea where you might find \
> > much more memory being wasted where i haven't found a good solution yet. With \
> > massive folders the UDSEntry object is stored for each object. In your case \
> > (since you're working from the higher level classes) the UDSEntry is stored \
> > within KFileItem. I think you can save a lot of memory in that class by making it \
> > smarter or different. KFileItems also has a bunch of private members to store the \
> > same info from UDSEntry thus i can imagine that quite a lot of data is being \
> > duplicated there. 
> > Good luck in your resource hunting :)
> 
> Frank Reininghaus wrote:
> I think that both David and I were a bit skeptical about the "500,000 files in one \
> folder" thing ;-) I still think that optimizing for this case might not be the best \
> thing to do if it causes a very significant cost in terms of code complexity. \
> However, there are far more opportunities for small changes like the one I've \
> proposed here than I first thought, which do not require massive code \
> modifications, but still reduce our resource usage in all cases (also if there are \
> far less files in one folder). 
> And I admit that opening large folders is indeed the best way to find out about \
> such opportunities, and also to measure how much we gain by a change, so your input \
> in the "massive folder" area was very useful, more useful than I first thought! 
> About the UDSEntry issue: Yes, I've also noticed that this is one of the major \
> consumers of memory, but improving this is probably not quite trivial, so I've \
> focused on things where we can gain something with less effort inside Dolphin for \
> the time being. In the long term, a way to have most information about files (i.e., \
> everything that is not needed for sorting) lazy-loaded probably makes sense.

Hi Frank,

I'm just glad you now do realise the importance of benchmarking with massive folders \
:) Ok, granted, i had to take some "who in his sane mind uses such big folders" \
comments but other then that it turned out wonderfully nice if you ask me. Actually, \
my initial benchmarks where with "just" 100.000 files, but i soon made enough \
experimental performance improvements to raise the bar to 500.000. At that level you \
really begin to see massive memory duplication. I'm playing with compression ideas, \
but haven't been got anything working in that area.

Lazy loading is awesome, but if you can compress the data enough that you still have \
all details without lazy loading is even better i think. For example, doing a \
                directory listing on any given folder gives at least the following \
                UDSEntry values:
- Device ID
- Inode
- User
- Group

That are the same for all files! Ok, you can have folders where you have files from \
multiple users and if you have folders then you also have different inode numbers so \
a simple static cache won't work. But this is an area where a _lot_ can be gained. \
Since we're going off-topic here, posting this to kfm-devel as well.


- Mark


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/112725/#review40039
-----------------------------------------------------------


On Sept. 14, 2013, 2:08 p.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/112725/
> -----------------------------------------------------------
> 
> (Updated Sept. 14, 2013, 2:08 p.m.)
> 
> 
> Review request for Dolphin.
> 
> 
> Description
> -------
> 
> This patch tries to prevent calling KFileItemModel::retrieveData() when possible in \
> order to save memory and time when loading huge directories. 
> 
> There are changes in the following members of KFileItemModel:
> 
> * data() and setData() are modified such that the QHash "values" for an item is \
> initialized with retrieveData() if that has not happened already. 
> * indexForKeyboardSearch() uses the "text" from the KFileItem itself, rather than \
> looking than calling data() (which can be expensive if called for many items \
> because data() might call retrieveData()). 
> * isExpandable() calls data(), rather than accessing the QHash "values" directly. \
> If this hash was not initialized it, isExpandable() would always return the default \
> value "false". 
> * createItemDataList() does not call retrieveData() for each item any more. \
> However, if the sort role is such that sortRoleCompare() would try to access the \
> QHash "values" when sorting, retrieveData() is called to ensure that sorting works \
> correctly. 
> 
> Things that could still be improved:
> 
> * Currently, the layouting in Compact View calls KFileItemModel::data() for every \
> item. In the case that only the "Name" is shown, this could be prevented (similar \
> to the approach that is used in Icons View, see \
> https://git.reviewboard.kde.org/r/112253/ 
> * When changing the roles (e.g., by adding or removing columns in Details View) or \
> changing the sort role, retrieveData() is currently called for all items. Moreover, \
> a changedItems signal is issued for all items, such that KFileItemModelRolesUpdater \
> tries to load icons/previews for all items (like in Dolphin < 4.11). 
> 
> Diffs
> -----
> 
> dolphin/src/kitemviews/kfileitemmodel.cpp 51ff142 
> 
> Diff: http://git.reviewboard.kde.org/r/112725/diff/
> 
> 
> Testing
> -------
> 
> I opened a directory with 100,000 files in Icons View and Details View with a \
> release build, and measured the memory usage with KSysGuard. Moreover, I measured \
> the time required to load the directory using the following patch for DolphinView: \
> http://paste.kde.org/pae0a64c1/ 
> The time required for the initial loading is not very meaningful because it varies \
> strongly. We can get a better idea of how much time is spent inside Dolphin for \
> loading if all items are cached, so I re-loaded the folder 8 of times by pressing \
> Alt+Up and then Alt+Left. 
> In Icons View, the memory consumption is reduced from 198.2 MB to 164.6 MB (-17%), \
> and the average time from 2898 ms to 2814 ms (-2.9%). 
> In Details View, the memory consumption is reduced from 225.1 MB to 166.6 MB \
> (-26%), and the average time from 1539 ms to 1076 ms (-30%). 
> The savings are larger in Details View because retrieveData() stores more \
> information about the items in the QHash "values" (like the "expanded parents \
> count"). 
> I could not see any significant differences in Compace View (as expected).
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
> 


[Attachment #3 (text/html)]

<html>
 <body>
  <div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
   <table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 \
solid;">  <tr>
     <td>
      This is an automatically generated e-mail. To reply, visit:
      <a href="http://git.reviewboard.kde.org/r/112725/">http://git.reviewboard.kde.org/r/112725/</a>
  </td>
    </tr>
   </table>
   <br />





<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <p style="margin-top: 0;">On September 14th, 2013, 11 p.m. UTC, <b>Mark \
Gaiser</b> wrote:</p>  <blockquote style="margin-left: 1em; border-left: 2px solid \
#d0d0d0; padding-left: 10px;">  <pre style="white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">Hi Frank,

I wonder who said that having folders with 100.000 entries is insane? ;) Either You \
or David said that to me when i first brought up massive folders and optimization \
ideas for it. It&#39;s very nice to see that you&#39;re now optimizing for those \
cases as well! Good job and keep hunting those resources.

Now for the more constructive feedback - or rather, an idea where you might find much \
more memory being wasted where i haven&#39;t found a good solution yet. With massive \
folders the UDSEntry object is stored for each object. In your case (since you&#39;re \
working from the higher level classes) the UDSEntry is stored within KFileItem. I \
think you can save a lot of memory in that class by making it smarter or different. \
KFileItems also has a bunch of private members to store the same info from UDSEntry \
thus i can imagine that quite a lot of data is being duplicated there.

Good luck in your resource hunting :)</pre>
 </blockquote>




 <p>On September 16th, 2013, 12:33 p.m. UTC, <b>Frank Reininghaus</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I think that both David \
and I were a bit skeptical about the &quot;500,000 files in one folder&quot; thing \
;-) I still think that optimizing for this case might not be the best thing to do if \
it causes a very significant cost in terms of code complexity. However, there are far \
more opportunities for small changes like the one I&#39;ve proposed here than I first \
thought, which do not require massive code modifications, but still reduce our \
resource usage in all cases (also if there are far less files in one folder).

And I admit that opening large folders is indeed the best way to find out about such \
opportunities, and also to measure how much we gain by a change, so your input in the \
&quot;massive folder&quot; area was very useful, more useful than I first thought!

About the UDSEntry issue: Yes, I&#39;ve also noticed that this is one of the major \
consumers of memory, but improving this is probably not quite trivial, so I&#39;ve \
focused on things where we can gain something with less effort inside Dolphin for the \
time being. In the long term, a way to have most information about files (i.e., \
everything that is not needed for sorting) lazy-loaded probably makes sense.</pre>  \
</blockquote>








</blockquote>

<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Hi Frank,

I&#39;m just glad you now do realise the importance of benchmarking with massive \
folders :) Ok, granted, i had to take some &quot;who in his sane mind uses such big \
folders&quot; comments but other then that it turned out wonderfully nice if you ask \
me. Actually, my initial benchmarks where with &quot;just&quot; 100.000 files, but i \
soon made enough experimental performance improvements to raise the bar to 500.000. \
At that level you really begin to see massive memory duplication. I&#39;m playing \
with compression ideas, but haven&#39;t been got anything working in that area.

Lazy loading is awesome, but if you can compress the data enough that you still have \
all details without lazy loading is even better i think. For example, doing a \
                directory listing on any given folder gives at least the following \
                UDSEntry values:
- Device ID
- Inode
- User
- Group

That are the same for all files! Ok, you can have folders where you have files from \
multiple users and if you have folders then you also have different inode numbers so \
a simple static cache won&#39;t work. But this is an area where a _lot_ can be \
gained. Since we&#39;re going off-topic here, posting this to kfm-devel as well. \
</pre> <br />










<p>- Mark</p>


<br />
<p>On September 14th, 2013, 2:08 p.m. UTC, Frank Reininghaus wrote:</p>








<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('http://git.reviewboard.kde.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black \
solid;">  <tr>
  <td>

<div>Review request for Dolphin.</div>
<div>By Frank Reininghaus.</div>


<p style="color: grey;"><i>Updated Sept. 14, 2013, 2:08 p.m.</i></p>






<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
 <table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">This patch tries to prevent calling KFileItemModel::retrieveData() when \
possible in order to save memory and time when loading huge directories.


There are changes in the following members of KFileItemModel:

* data() and setData() are modified such that the QHash &quot;values&quot; for an \
item is initialized with retrieveData() if that has not happened already.

* indexForKeyboardSearch() uses the &quot;text&quot; from the KFileItem itself, \
rather than looking than calling data() (which can be expensive if called for many \
items because data() might call retrieveData()).

* isExpandable() calls data(), rather than accessing the QHash &quot;values&quot; \
directly. If this hash was not initialized it, isExpandable() would always return the \
default value &quot;false&quot;.

* createItemDataList() does not call retrieveData() for each item any more. However, \
if the sort role is such that sortRoleCompare() would try to access the QHash \
&quot;values&quot; when sorting, retrieveData() is called to ensure that sorting \
works correctly.


Things that could still be improved:

* Currently, the layouting in Compact View calls KFileItemModel::data() for every \
item. In the case that only the &quot;Name&quot; is shown, this could be prevented \
(similar to the approach that is used in Icons View, see \
https://git.reviewboard.kde.org/r/112253/

* When changing the roles (e.g., by adding or removing columns in Details View) or \
changing the sort role, retrieveData() is currently called for all items. Moreover, a \
changedItems signal is issued for all items, such that KFileItemModelRolesUpdater \
tries to load icons/previews for all items (like in Dolphin &lt; 4.11). </pre>
  </td>
 </tr>
</table>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: \
1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">I opened a directory with 100,000 files in Icons View and Details View \
with a release build, and measured the memory usage with KSysGuard. Moreover, I \
measured the time required to load the directory using the following patch for \
DolphinView: http://paste.kde.org/pae0a64c1/

The time required for the initial loading is not very meaningful because it varies \
strongly. We can get a better idea of how much time is spent inside Dolphin for \
loading if all items are cached, so I re-loaded the folder 8 of times by pressing \
Alt+Up and then Alt+Left.

In Icons View, the memory consumption is reduced from 198.2 MB to 164.6 MB (-17%), \
and the average time from 2898 ms to 2814 ms (-2.9%).

In Details View, the memory consumption is reduced from 225.1 MB to 166.6 MB (-26%), \
and the average time from 1539 ms to 1076 ms (-30%).

The savings are larger in Details View because retrieveData() stores more information \
about the items in the QHash &quot;values&quot; (like the &quot;expanded parents \
count&quot;).

I could not see any significant differences in Compace View (as expected).</pre>
  </td>
 </tr>
</table>




<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">

 <li>dolphin/src/kitemviews/kfileitemmodel.cpp <span style="color: \
grey">(51ff142)</span></li>

</ul>

<p><a href="http://git.reviewboard.kde.org/r/112725/diff/" style="margin-left: \
3em;">View Diff</a></p>







  </td>
 </tr>
</table>








  </div>
 </body>
</html>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic