[prev in list] [next in list] [prev in thread] [next in thread]
List: kfm-devel
Subject: Re: Review Request 112979: Make use of multi-threading in KItemListSizeHintResolver
From: "Frank Reininghaus" <frank78ac () googlemail ! com>
Date: 2013-12-17 17:40:06
Message-ID: 20131217174006.11877.34731 () vidsolbach ! de
[Download RAW message or body]
> On Sept. 30, 2013, 10:04 p.m., Frank Reininghaus wrote:
> > Thanks for looking into this! The layouting procedure is indeed one of the major bottlenecks when \
> > entering a directory, and I was also thinking that making use of multiple CPU cores is the easiest \
> > way for short-term improvements without major modifications in the layouting code. Your approach \
> > looks good to me - I think I would have also tried something like that.
> > I'm a bit surprised/disappointed though that it saves only 1/3 of the time. In can confirm that \
> > observation on my system (with a rather old dual-core AMD 64X2 5000+ CPU, which has to my knowledge \
> > no "Turbo boost" feature if only one core is busy, which might otherwise have been an explanation for \
> > this). I can currently see two possible explanations for this:
> > (a) Could it be that QtConcurrent::blockingMap() adds a lot of overhead to each function call? Maybe \
> > it's designed for the case that each invocation of the function is expensive (in which case the \
> > overhead wouldn't matter much). However, in our case, a single call of "itemSizeHint" is not \
> > extremely expensive - it's the large number of calls that is the problem.
> > But I don't know if that assumption is correct. Maybe QtConcurrent::blockingMap() is clever enough to \
> > split the input list into only a few large chunks.
> >
> > (b) Maybe some parts of the font/text code in Qt, which is called by \
> > KStandardItemListWidgetInformant::itemSizeHint(), are protected by a mutex internally. Then there \
> > could be several causes for performance loss. If a large part of the time is spent under mutex \
> > protection, then the gain from parallelization is severely limited by Amdahl's Law. Moreover, if the \
> > mutex is locked/released frequently by several competing threads, then the performance will suffer \
> > greatly because of mutex contention.
> >
> > An interesting experiment would be to test your patch on a system with more than 2 physical cores and \
> > check how much it improves the performance. If anyone with such a system is willing to try this:
> > * Uncomment the line "// #define KITEMLISTVIEWLAYOUTER_DEBUG" in \
> > dolphin/src/kitemviews/private/kitemlistviewlayouter.cpp
> > * Either switch on all debug output with kdebugdialog or change the kDebug() in this file to qDebug()
> > * Open a folder with many files (100,000 or more) in Icons View (possibly reload it a couple of times \
> > with F5).
> > * Apply Emmanuel's patch and repeat the procedure.
> >
> > If mutex contention is indeed an issue here, then I would expect that 4 or more cores will not bring \
> > a much larger performance improvement (compared to the 1/3 that we get with 2 cores), or that it \
> > would even get worse.
> >
> > Another idea that I had recently (not tested yet, and orthogonal to your patch in the sense that it \
> > might improve the performance even in the single-thread case, and that it could be combined with a \
> > QtConcurrent approach to bring further improvements):
> > Much of the code inside KStandardItemListWidgetInformant::itemSizeHint() does exactly the same every \
> > time. In Icons View, this is everything except for the calculation of the number of lines for the \
> > item text. To improve this, one could replace the function
> > QSizeF KStandardItemListWidgetInformant::itemSizeHint(int index, const KItemListView* view) const
> >
> > by
> >
> > void KStandardItemListWidgetInformant::itemSizeHint(QVector<QSizeF>& sizeHints, const KItemListView* \
> > view) const
> > or something like that, do most of the things only once, and then calculate only the number of lines \
> > in a loop.
> > A rough draft:
> >
> > void KStandardItemListWidgetInformant::itemSizeHint(QVector<QSizeF> sizeHints, const KItemListView* \
> > view) const {
> > //...
> >
> > switch (static_cast<const KStandardItemListView*>(view)->itemLayout()) {
> > case KStandardItemListWidget::IconsLayout: {
> >
> > // Stuff that applies to all items, like...
> > const qreal itemWidth = view->itemSize().width();
> > const qreal maxWidth = itemWidth - 2 * option.padding;
> > const qreal additionalRolesHeight = additionalRolesCount * option.fontMetrics.lineSpacing();
> >
> > for (int i = 0; i < sizeHints.count(); ++i) {
> > if (!sizeHints.at(i).isEmpty()) {
> > continue;
> > }
> >
> > const QString text = KStringHandler::preProcessWrap(itemText(index, view));
> >
> > // Calculate the number of lines and the "textHeight"
> > //...
> >
> > const qreal totalHeight = qMin(textHeight + additionalRolesHeight, maxTextHeight);
> > sizeHints[i] = QSizeF(itemWitdth, totalHeight);
> > }
> >
> > // ...
> > }
> > }
> >
> >
> >
> > Maybe it's worth trying how much time that saves, and how it can be combined with your QtConcurrent \
> > idea?
>
> Emmanuel Pescosta wrote:
> > Maybe it's worth trying how much time that saves, and how it can be combined with your QtConcurrent \
> > idea?
>
> Sounds great :)
> I'll test it!
I did have a chance to test your patch on a computer with 4 CPU cores, and restricted the number of \
threads by calling
QThreadPool::globalInstance()->setMaxThreadCount(number);
before the QtConcurrent call, with values for 'number' between 1 and 4. When using more than 2 threads, \
the time needed for the layout is not reduced further, but increases slightly. Moreover, the total CPU \
time required as reported by 'time dolphin /path/to/large/folder' increases significantly when using more \
than 1 thread. So it appears that contention is really a problem here, and the 'size hint' calculation \
using Qt's text engine stuff is not suitable for parallel computation.
Have you tried the "calculate all size hints in all function" approach yet?
- Frank
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/112979/#review41057
-----------------------------------------------------------
On Sept. 28, 2013, 7:07 p.m., Emmanuel Pescosta wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/112979/
> -----------------------------------------------------------
>
> (Updated Sept. 28, 2013, 7:07 p.m.)
>
>
> Review request for Dolphin.
>
>
> Repository: kde-baseapps
>
>
> Description
> -------
>
> Make use of multi-threading in KItemListSizeHintResolver.
>
> We need the item size hints of all items in KItemListLayouter::doLayout(),
> so instead of resolving it sequentially during layout calculation,
> we can resolve all item size hints in parallel before we start the
> layout calculation.
>
>
> Diffs
> -----
>
> dolphin/src/kitemviews/private/kitemlistsizehintresolver.h 486f9b6
> dolphin/src/kitemviews/private/kitemlistsizehintresolver.cpp 0e2286b
> dolphin/src/kitemviews/private/kitemlistviewlayouter.h 306fcd3
> dolphin/src/kitemviews/private/kitemlistviewlayouter.cpp d6e78ae
>
> Diff: http://git.reviewboard.kde.org/r/112979/diff/
>
>
> Testing
> -------
>
> Everything works fine.
>
> It is about 1/3 faster on my machine (tested with 100k items in item view mode).
>
>
> Thanks,
>
> Emmanuel Pescosta
>
>
[Attachment #3 (text/html)]
<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 solid;">
<tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="http://git.reviewboard.kde.org/r/112979/">http://git.reviewboard.kde.org/r/112979/</a>
</td>
</tr>
</table>
<br />
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<p style="margin-top: 0;">On September 30th, 2013, 10:04 p.m. UTC, <b>Frank Reininghaus</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: \
-o-pre-wrap; word-wrap: break-word;">Thanks for looking into this! The layouting procedure is indeed one \
of the major bottlenecks when entering a directory, and I was also thinking that making use of multiple \
CPU cores is the easiest way for short-term improvements without major modifications in the layouting \
code. Your approach looks good to me - I think I would have also tried something like that.
I'm a bit surprised/disappointed though that it saves only 1/3 of the time. In can confirm that \
observation on my system (with a rather old dual-core AMD 64X2 5000+ CPU, which has to my knowledge no \
"Turbo boost" feature if only one core is busy, which might otherwise have been an explanation \
for this). I can currently see two possible explanations for this:
(a) Could it be that QtConcurrent::blockingMap() adds a lot of overhead to each function call? Maybe \
it's designed for the case that each invocation of the function is expensive (in which case the \
overhead wouldn't matter much). However, in our case, a single call of "itemSizeHint" is \
not extremely expensive - it's the large number of calls that is the problem.
But I don't know if that assumption is correct. Maybe QtConcurrent::blockingMap() is clever enough to \
split the input list into only a few large chunks.
(b) Maybe some parts of the font/text code in Qt, which is called by \
KStandardItemListWidgetInformant::itemSizeHint(), are protected by a mutex internally. Then there could \
be several causes for performance loss. If a large part of the time is spent under mutex protection, then \
the gain from parallelization is severely limited by Amdahl's Law. Moreover, if the mutex is \
locked/released frequently by several competing threads, then the performance will suffer greatly because \
of mutex contention.
An interesting experiment would be to test your patch on a system with more than 2 physical cores and \
check how much it improves the performance. If anyone with such a system is willing to try this:
* Uncomment the line "// #define KITEMLISTVIEWLAYOUTER_DEBUG" in \
dolphin/src/kitemviews/private/kitemlistviewlayouter.cpp
* Either switch on all debug output with kdebugdialog or change the kDebug() in this file to qDebug()
* Open a folder with many files (100,000 or more) in Icons View (possibly reload it a couple of times \
with F5).
* Apply Emmanuel's patch and repeat the procedure.
If mutex contention is indeed an issue here, then I would expect that 4 or more cores will not bring a \
much larger performance improvement (compared to the 1/3 that we get with 2 cores), or that it would even \
get worse.
Another idea that I had recently (not tested yet, and orthogonal to your patch in the sense that it might \
improve the performance even in the single-thread case, and that it could be combined with a QtConcurrent \
approach to bring further improvements):
Much of the code inside KStandardItemListWidgetInformant::itemSizeHint() does exactly the same every \
time. In Icons View, this is everything except for the calculation of the number of lines for the item \
text. To improve this, one could replace the function
QSizeF KStandardItemListWidgetInformant::itemSizeHint(int index, const KItemListView* view) const
by
void KStandardItemListWidgetInformant::itemSizeHint(QVector<QSizeF>& sizeHints, const \
KItemListView* view) const
or something like that, do most of the things only once, and then calculate only the number of lines in a \
loop.
A rough draft:
void KStandardItemListWidgetInformant::itemSizeHint(QVector<QSizeF> sizeHints, const KItemListView* \
view) const {
//...
switch (static_cast<const KStandardItemListView*>(view)->itemLayout()) {
case KStandardItemListWidget::IconsLayout: {
// Stuff that applies to all items, like...
const qreal itemWidth = view->itemSize().width();
const qreal maxWidth = itemWidth - 2 * option.padding;
const qreal additionalRolesHeight = additionalRolesCount * option.fontMetrics.lineSpacing();
for (int i = 0; i < sizeHints.count(); ++i) {
if (!sizeHints.at(i).isEmpty()) {
continue;
}
const QString text = KStringHandler::preProcessWrap(itemText(index, view));
// Calculate the number of lines and the "textHeight"
//...
const qreal totalHeight = qMin(textHeight + additionalRolesHeight, maxTextHeight);
sizeHints[i] = QSizeF(itemWitdth, totalHeight);
}
// ...
}
}
Maybe it's worth trying how much time that saves, and how it can be combined with your QtConcurrent \
idea? </pre>
</blockquote>
<p>On October 4th, 2013, noon UTC, <b>Emmanuel Pescosta</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: \
-o-pre-wrap; word-wrap: break-word;">> Maybe it's worth trying how much time that saves, and how \
it can be combined with your QtConcurrent idea?
Sounds great :)
I'll test it!</pre>
</blockquote>
</blockquote>
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: \
-o-pre-wrap; word-wrap: break-word;">I did have a chance to test your patch on a computer with 4 CPU \
cores, and restricted the number of threads by calling
QThreadPool::globalInstance()->setMaxThreadCount(number);
before the QtConcurrent call, with values for 'number' between 1 and 4. When using more than 2 \
threads, the time needed for the layout is not reduced further, but increases slightly. Moreover, the \
total CPU time required as reported by 'time dolphin /path/to/large/folder' increases \
significantly when using more than 1 thread. So it appears that contention is really a problem here, and \
the 'size hint' calculation using Qt's text engine stuff is not suitable for parallel \
computation.
Have you tried the "calculate all size hints in all function" approach yet?</pre>
<br />
<p>- Frank</p>
<br />
<p>On September 28th, 2013, 7:07 p.m. UTC, Emmanuel Pescosta wrote:</p>
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" style="background-image: \
url('http://git.reviewboard.kde.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black solid;"> <tr>
<td>
<div>Review request for Dolphin.</div>
<div>By Emmanuel Pescosta.</div>
<p style="color: grey;"><i>Updated Sept. 28, 2013, 7:07 p.m.</i></p>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt;">Repository: </b>
kde-baseapps
</div>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid \
#b8b5a0"> <tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Make use of multi-threading in \
KItemListSizeHintResolver.
We need the item size hints of all items in KItemListLayouter::doLayout(),
so instead of resolving it sequentially during layout calculation,
we can resolve all item size hints in parallel before we start the
layout calculation.</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Everything works fine.
It is about 1/3 faster on my machine (tested with 100k items in item view mode).</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>dolphin/src/kitemviews/private/kitemlistsizehintresolver.h <span style="color: \
grey">(486f9b6)</span></li>
<li>dolphin/src/kitemviews/private/kitemlistsizehintresolver.cpp <span style="color: \
grey">(0e2286b)</span></li>
<li>dolphin/src/kitemviews/private/kitemlistviewlayouter.h <span style="color: \
grey">(306fcd3)</span></li>
<li>dolphin/src/kitemviews/private/kitemlistviewlayouter.cpp <span style="color: \
grey">(d6e78ae)</span></li>
</ul>
<p><a href="http://git.reviewboard.kde.org/r/112979/diff/" style="margin-left: 3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic