[prev in list] [next in list] [prev in thread] [next in thread]
List: kde-core-devel
Subject: Re: Review Request 113355: Reduce UDSEntry memory usage with implicit sharing
From: "Frank Reininghaus" <frank78ac () googlemail ! com>
Date: 2013-11-03 19:00:39
Message-ID: 20131103190039.14091.2952 () vidsolbach ! de
[Download RAW message or body]
> On Oct. 31, 2013, 2:41 p.m., Frank Reininghaus wrote:
> > I see now that I have tried to put too much stuff into a single patch - it's too \
> > hard to digest and to understand, and the number of possibilities to modify \
> > different aspects of UDSEntry in a different way is just too large for me to test \
> > all of them.
> > Still, I consider it a bug that every KDE application that deals directly or \
> > indirectly with UDSEntry wastes ~half a kilobyte of memory for every single file, \
> > and considering that this can be fixed by modifying just a single .cpp file, I \
> > would very much like to see at least a small improvement in the not too distant \
> > future.
> > Maybe the best approach is to discard this review request, and open a new one \
> > that just adds the unit test (it never hurts to have more of them IMHO) and makes \
> > use of the implicit sharing of QStrings. That would at least bring us some of the \
> > memory savings, and it would be a rather straightforward change that would \
> > hopefully still be considered safe enough to include in kdelibs 4.12.
> > @dfaure: I took from our discussion on the mailing list that you agree with the \
> > idea to reduce UDSEntry's memory usage. What do you think?
>
> Mark Gaiser wrote:
> It's not _that_ bad.. The added unittests are big. The actual performance improving \
> patch isn't that big. Out of that the load(...) function is the real complex one. I \
> think you should certainly commit this - as it is - to frameworks. Not so sure for \
> 4.12 though. The 4.12 version "should" only get bug fixes. While this is certainly \
> an improvement, it's not a patch that fixes any bug. On the other hand, memory \
> improvements are always welcome if i recall correctly.
> The actual performance improving patch isn't that big.
But still, it contains two different changes, which are independent of each other, \
and which I should have split up into two patches from the beginning. I've split out \
the least intrusive part into https://git.reviewboard.kde.org/r/113591/
- Frank
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/113355/#review42745
-----------------------------------------------------------
On Oct. 26, 2013, 5:56 p.m., Frank Reininghaus wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/113355/
> -----------------------------------------------------------
>
> (Updated Oct. 26, 2013, 5:56 p.m.)
>
>
> Review request for kdelibs.
>
>
> Repository: kdelibs
>
>
> Description
> -------
>
> This patch is based on some discussions on the kfm-devel mailing list, see \
> http://lists.kde.org/?t=137935784800003&r=1&w=2
> Mark found out that KIO's UDSEntry class is one of the major consumers of memory in \
> applications which use KIO to list directories with a large number of files, and I \
> found a way use implicit sharing to drastically reduce the amount of memory it \
> needs. Many thanks to Milian for his great blog post \
> http://milianw.de/blog/katekdevelop-sprint-vienna-2012-take-1, without which I \
> would probably not have had such ideas.
>
> 1. The problem
>
> The UDSEntry keeps all sorts of information about files which can be stored in a \
> string (name, user, group, etc.) or in a long long (file size, modification time, \
> etc.). All these data can be accessed with a uint key, and UDSEntry returns the \
> corresponding QString or long long in O(1) time. Internally, it stores the data in \
> a QHash<uint, Field>, where Field is a struct that has a QString and a long long \
> member.
> The problem is that QHash needs quite a lot of memory to provide the O(1) access, \
> see http://doc.qt.digia.com/qq/qq19-containers.html for details, and that the \
> minimum capacity of a QHash seems to be 17, even though the number of entries in a \
> UDSEntry is often 8 in the rather typical standard kio_file use case.
>
> 2. Proposed solution
>
> (a) We can store the "Fields" in a QVector<Field>, which needs only very little \
> overhead compared to the memory that the actual "Fields" need. When loading a \
> UDSEntry from a QDataStream, we just append all "Fields" to this QVector one by \
> one. Moreover, we need a QHash<uint, int>, which maps each key to the index of its \
> Field in the QVector. This restructuring alone does not reduce the memory usage, of \
> course.
> (b) The key observation is that the QDataStream, which KIO::ListJob reads the \
> UDSEntries from, typically provides the different "Fields" in exactly the same \
> order. This means that the QHash<uint, int> is usually exactly the same for every \
> UDSEntry, and we can make use of implicit sharing to store only one copy of this \
> QHash. I've modified
> void UDSEntryPrivate::load(QDataStream &s, UDSEntry &a)
>
> such that it remembers the most recent QHash<uint, int> and just adds an implicitly \
> shared copy of it to "a" if the order of the Fields has not changed.
> (c) Moreover, some of the QString Fields in the UDSEntries in one directory are \
> often the same, like, e.g., the user and the group. The "load" function also \
> remembers the most recently read values for each Field in a static QVector<QString> \
> and just puts an implicitly shared copy into the UDSEntry if possible.
>
> 3. Possible disadvantages
>
> (a) When using the "remove" member, the new version of UDSEntry does not remove the \
> Field from the QVector<Field>. This means that removing and adding a "Field" \
> repeatedly would let the memory usage grow indefinitely. According to David \
> (http://lists.kde.org/?l=kfm-devel&m=138052519927973&w=2), this doesn't matter \
> though because no known user of UDSEntry uses its remove() member. Maybe we should \
> remove remove (pun stolen grom David) in the frameworks branch then?
> (b) In principle, the old version of UDSEntryPrivate::load(QDataStream&, UDSEntry&) \
> was reentrant. This is not the case for my changed version. Reentrancy could be \
> restored rather easily by protecting the access to the static data with a mutex, \
> but given that most of KIO is not supposed to be used from outside the main thread \
> AFAIK, I don't know if this is necessary.
>
> 4. Changes since the first version of the patch which I posted in \
> http://lists.kde.org/?l=kfm-devel&m=137962995805432&w=2
> (a) Implemented the minor changes suggested by David in \
> http://lists.kde.org/?l=kfm-devel&m=137975442807965&w=2
> (b) Added a unit test to verify that storing and loading UDSEntries from a stream \
> works even if the order of the fields is permuted, and some fields are removed or \
> added in between.
> (c) Fixed a bug which was uncovered by the test: \
> cachedUdsFields.erase(cachedUdsFields.begin() + i, cachedUdsFields.end()) instead \
> of cachedUdsFields.erase(cachedUdsFields.begin() + i)
> (d) Use QVector::reserve to reserve the appropriate size for the QVector<Field>. \
> Saves some time when loading the UDSEntry, and reduces the memory usage further.
> (e) Changed the type of the loop variable from quint32 to int to fix some compiler \
> warnings.
>
> Diffs
> -----
>
> kio/kio/udsentry.h e1c8b05
> kio/kio/udsentry.cpp 1e1f503
> kio/tests/CMakeLists.txt 1019312
> kio/tests/simplelistjobtest.cpp PRE-CREATION
> kio/tests/udsentrybenchmark.cpp PRE-CREATION
> kio/tests/udsentrytest.h PRE-CREATION
> kio/tests/udsentrytest.cpp PRE-CREATION
>
> Diff: http://git.reviewboard.kde.org/r/113355/diff/
>
>
> Testing
> -------
>
> Old and new unit tests pass. The memory usage of Dolphin when loading a directory \
> with 100,000 files in Icons View is reduced from 165.4 MB to 113.3 MB. Any \
> application that uses a file dialog, a KDirLister, or anything else that uses a \
> KIO::ListJob to list directory contents should benefit from similar savings.
>
> Thanks,
>
> Frank Reininghaus
>
>
[Attachment #3 (text/html)]
<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 \
solid;"> <tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="http://git.reviewboard.kde.org/r/113355/">http://git.reviewboard.kde.org/r/113355/</a>
</td>
</tr>
</table>
<br />
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <p style="margin-top: 0;">On October 31st, 2013, 2:41 p.m. UTC, <b>Frank \
Reininghaus</b> wrote:</p> <blockquote style="margin-left: 1em; border-left: 2px \
solid #d0d0d0; padding-left: 10px;"> <pre style="white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">I see now that I have tried to put too much stuff into a single patch - \
it's too hard to digest and to understand, and the number of possibilities to \
modify different aspects of UDSEntry in a different way is just too large for me to \
test all of them.
Still, I consider it a bug that every KDE application that deals directly or \
indirectly with UDSEntry wastes ~half a kilobyte of memory for every single file, and \
considering that this can be fixed by modifying just a single .cpp file, I would very \
much like to see at least a small improvement in the not too distant future.
Maybe the best approach is to discard this review request, and open a new one that \
just adds the unit test (it never hurts to have more of them IMHO) and makes use of \
the implicit sharing of QStrings. That would at least bring us some of the memory \
savings, and it would be a rather straightforward change that would hopefully still \
be considered safe enough to include in kdelibs 4.12.
@dfaure: I took from our discussion on the mailing list that you agree with the idea \
to reduce UDSEntry's memory usage. What do you think?</pre> </blockquote>
<p>On November 2nd, 2013, 4:47 p.m. UTC, <b>Mark Gaiser</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">It's not _that_ \
bad.. The added unittests are big. The actual performance improving patch isn't \
that big. Out of that the load(...) function is the real complex one. I think you \
should certainly commit this - as it is - to frameworks. Not so sure for 4.12 though. \
The 4.12 version "should" only get bug fixes. While this is certainly an \
improvement, it's not a patch that fixes any bug. On the other hand, memory \
improvements are always welcome if i recall correctly.</pre> </blockquote>
</blockquote>
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">> The actual \
performance improving patch isn't that big.
But still, it contains two different changes, which are independent of each other, \
and which I should have split up into two patches from the beginning. I've split \
out the least intrusive part into https://git.reviewboard.kde.org/r/113591/</pre> <br \
/>
<p>- Frank</p>
<br />
<p>On October 26th, 2013, 5:56 p.m. UTC, Frank Reininghaus wrote:</p>
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('http://git.reviewboard.kde.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black \
solid;"> <tr>
<td>
<div>Review request for kdelibs.</div>
<div>By Frank Reininghaus.</div>
<p style="color: grey;"><i>Updated Oct. 26, 2013, 5:56 p.m.</i></p>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt;">Repository: </b>
kdelibs
</div>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0"> <tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">This patch is based on some discussions on the kfm-devel mailing list, \
see http://lists.kde.org/?t=137935784800003&r=1&w=2
Mark found out that KIO's UDSEntry class is one of the major consumers of memory \
in applications which use KIO to list directories with a large number of files, and I \
found a way use implicit sharing to drastically reduce the amount of memory it needs. \
Many thanks to Milian for his great blog post \
http://milianw.de/blog/katekdevelop-sprint-vienna-2012-take-1, without which I would \
probably not have had such ideas.
1. The problem
The UDSEntry keeps all sorts of information about files which can be stored in a \
string (name, user, group, etc.) or in a long long (file size, modification time, \
etc.). All these data can be accessed with a uint key, and UDSEntry returns the \
corresponding QString or long long in O(1) time. Internally, it stores the data in a \
QHash<uint, Field>, where Field is a struct that has a QString and a long long \
member.
The problem is that QHash needs quite a lot of memory to provide the O(1) access, see \
http://doc.qt.digia.com/qq/qq19-containers.html for details, and that the minimum \
capacity of a QHash seems to be 17, even though the number of entries in a UDSEntry \
is often 8 in the rather typical standard kio_file use case.
2. Proposed solution
(a) We can store the "Fields" in a QVector<Field>, which needs only \
very little overhead compared to the memory that the actual "Fields" need. \
When loading a UDSEntry from a QDataStream, we just append all "Fields" to \
this QVector one by one. Moreover, we need a QHash<uint, int>, which maps each \
key to the index of its Field in the QVector. This restructuring alone does not \
reduce the memory usage, of course.
(b) The key observation is that the QDataStream, which KIO::ListJob reads the \
UDSEntries from, typically provides the different "Fields" in exactly the \
same order. This means that the QHash<uint, int> is usually exactly the same \
for every UDSEntry, and we can make use of implicit sharing to store only one copy of \
this QHash. I've modified
void UDSEntryPrivate::load(QDataStream &s, UDSEntry &a)
such that it remembers the most recent QHash<uint, int> and just adds an \
implicitly shared copy of it to "a" if the order of the Fields has not \
changed.
(c) Moreover, some of the QString Fields in the UDSEntries in one directory are often \
the same, like, e.g., the user and the group. The "load" function also \
remembers the most recently read values for each Field in a static \
QVector<QString> and just puts an implicitly shared copy into the UDSEntry if \
possible.
3. Possible disadvantages
(a) When using the "remove" member, the new version of UDSEntry does not \
remove the Field from the QVector<Field>. This means that removing and adding a \
"Field" repeatedly would let the memory usage grow indefinitely. According \
to David (http://lists.kde.org/?l=kfm-devel&m=138052519927973&w=2), this \
doesn't matter though because no known user of UDSEntry uses its remove() member. \
Maybe we should remove remove (pun stolen grom David) in the frameworks branch then?
(b) In principle, the old version of UDSEntryPrivate::load(QDataStream&, \
UDSEntry&) was reentrant. This is not the case for my changed version. Reentrancy \
could be restored rather easily by protecting the access to the static data with a \
mutex, but given that most of KIO is not supposed to be used from outside the main \
thread AFAIK, I don't know if this is necessary.
4. Changes since the first version of the patch which I posted in \
http://lists.kde.org/?l=kfm-devel&m=137962995805432&w=2
(a) Implemented the minor changes suggested by David in \
http://lists.kde.org/?l=kfm-devel&m=137975442807965&w=2
(b) Added a unit test to verify that storing and loading UDSEntries from a stream \
works even if the order of the fields is permuted, and some fields are removed or \
added in between.
(c) Fixed a bug which was uncovered by the test: \
cachedUdsFields.erase(cachedUdsFields.begin() + i, cachedUdsFields.end()) instead of \
cachedUdsFields.erase(cachedUdsFields.begin() + i)
(d) Use QVector::reserve to reserve the appropriate size for the \
QVector<Field>. Saves some time when loading the UDSEntry, and reduces the \
memory usage further.
(e) Changed the type of the loop variable from quint32 to int to fix some compiler \
warnings. </pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: \
1px solid #b8b5a0"> <tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">Old and new unit tests pass. The memory usage of Dolphin when loading a \
directory with 100,000 files in Icons View is reduced from 165.4 MB to 113.3 MB. Any \
application that uses a file dialog, a KDirLister, or anything else that uses a \
KIO::ListJob to list directory contents should benefit from similar savings.</pre> \
</td> </tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>kio/kio/udsentry.h <span style="color: grey">(e1c8b05)</span></li>
<li>kio/kio/udsentry.cpp <span style="color: grey">(1e1f503)</span></li>
<li>kio/tests/CMakeLists.txt <span style="color: grey">(1019312)</span></li>
<li>kio/tests/simplelistjobtest.cpp <span style="color: \
grey">(PRE-CREATION)</span></li>
<li>kio/tests/udsentrybenchmark.cpp <span style="color: \
grey">(PRE-CREATION)</span></li>
<li>kio/tests/udsentrytest.h <span style="color: grey">(PRE-CREATION)</span></li>
<li>kio/tests/udsentrytest.cpp <span style="color: grey">(PRE-CREATION)</span></li>
</ul>
<p><a href="http://git.reviewboard.kde.org/r/113355/diff/" style="margin-left: \
3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic