[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kfm-devel
Subject:    Re: Review Request 112979: Make use of multi-threading in KItemListSizeHintResolver
From:       "Frank Reininghaus" <frank78ac () googlemail ! com>
Date:       2013-12-17 17:40:06
Message-ID: 20131217174006.11877.34731 () vidsolbach ! de
[Download RAW message or body]

> On Sept. 30, 2013, 10:04 p.m., Frank Reininghaus wrote:
> > Thanks for looking into this! The layouting procedure is indeed one of \
> > the major bottlenecks when entering a directory, and I was also \
> > thinking that making use of multiple CPU cores is the easiest way for \
> > short-term improvements without major modifications in the layouting \
> > code. Your approach looks good to me - I think I would have also tried \
> > something like that. 
> > I'm a bit surprised/disappointed though that it saves only 1/3 of the \
> > time. In can confirm that observation on my system (with a rather old \
> > dual-core AMD 64X2 5000+ CPU, which has to my knowledge no "Turbo \
> > boost" feature if only one core is busy, which might otherwise have \
> > been an explanation for this). I can currently see two possible \
> > explanations for this: 
> > (a) Could it be that QtConcurrent::blockingMap() adds a lot of overhead \
> > to each function call? Maybe it's designed for the case that each \
> > invocation of the function is expensive (in which case the overhead \
> > wouldn't matter much). However, in our case, a single call of \
> > "itemSizeHint" is not extremely expensive - it's the large number of \
> > calls that is the problem. 
> > But I don't know if that assumption is correct. Maybe \
> > QtConcurrent::blockingMap() is clever enough to split the input list \
> > into only a few large chunks. 
> > 
> > (b) Maybe some parts of the font/text code in Qt, which is called by \
> > KStandardItemListWidgetInformant::itemSizeHint(), are protected by a \
> > mutex internally. Then there could be several causes for performance \
> > loss. If a large part of the time is spent under mutex protection, then \
> > the gain from parallelization is severely limited by Amdahl's Law. \
> > Moreover, if the mutex is locked/released frequently by several \
> > competing threads, then the performance will suffer greatly because of \
> > mutex contention. 
> > 
> > An interesting experiment would be to test your patch on a system with \
> > more than 2 physical cores and check how much it improves the \
> > performance. If anyone with such a system is willing to try this: 
> > * Uncomment the line "// #define KITEMLISTVIEWLAYOUTER_DEBUG" in \
> >                 dolphin/src/kitemviews/private/kitemlistviewlayouter.cpp
> >                 
> > * Either switch on all debug output with kdebugdialog or change the \
> >                 kDebug() in this file to qDebug()
> > * Open a folder with many files (100,000 or more) in Icons View \
> >                 (possibly reload it a couple of times with F5).
> > * Apply Emmanuel's patch and repeat the procedure.
> > 
> > If mutex contention is indeed an issue here, then I would expect that 4 \
> > or more cores will not bring a much larger performance improvement \
> > (compared to the 1/3 that we get with 2 cores), or that it would even \
> > get worse. 
> > 
> > Another idea that I had recently (not tested yet, and orthogonal to \
> > your patch in the sense that it might improve the performance even in \
> > the single-thread case, and that it could be combined with a \
> > QtConcurrent approach to bring further improvements): 
> > Much of the code inside \
> > KStandardItemListWidgetInformant::itemSizeHint() does exactly the same \
> > every time. In Icons View, this is everything except for the \
> > calculation of the number of lines for the item text. To improve this, \
> > one could replace the function 
> > QSizeF KStandardItemListWidgetInformant::itemSizeHint(int index, const \
> > KItemListView* view) const 
> > by
> > 
> > void KStandardItemListWidgetInformant::itemSizeHint(QVector<QSizeF>& \
> > sizeHints, const KItemListView* view) const 
> > or something like that, do most of the things only once, and then \
> > calculate only the number of lines in a loop. 
> > A rough draft:
> > 
> > void KStandardItemListWidgetInformant::itemSizeHint(QVector<QSizeF> \
> > sizeHints, const KItemListView* view) const {
> > //...
> > 
> > switch (static_cast<const KStandardItemListView*>(view)->itemLayout()) \
> > { case KStandardItemListWidget::IconsLayout: {
> > 
> > // Stuff that applies to all items, like...
> > const qreal itemWidth = view->itemSize().width();
> > const qreal maxWidth = itemWidth - 2 * option.padding;
> > const qreal additionalRolesHeight = additionalRolesCount * \
> > option.fontMetrics.lineSpacing(); 
> > for (int i = 0; i < sizeHints.count(); ++i) {
> > if (!sizeHints.at(i).isEmpty()) {
> > continue;
> > }
> > 
> > const QString text = KStringHandler::preProcessWrap(itemText(index, \
> > view)); 
> > // Calculate the number of lines and the "textHeight"
> > //...
> > 
> > const qreal totalHeight = qMin(textHeight + additionalRolesHeight, \
> > maxTextHeight); sizeHints[i] = QSizeF(itemWitdth, totalHeight);
> > }
> > 
> > // ...
> > }
> > }
> > 
> > 
> > 
> > Maybe it's worth trying how much time that saves, and how it can be \
> > combined with your QtConcurrent idea? 
> 
> Emmanuel Pescosta wrote:
> > Maybe it's worth trying how much time that saves, and how it can be \
> > combined with your QtConcurrent idea?
> 
> Sounds great :)
> I'll test it!

I did have a chance to test your patch on a computer with 4 CPU cores, and \
restricted the number of threads by calling

QThreadPool::globalInstance()->setMaxThreadCount(number);

before the QtConcurrent call, with values for 'number' between 1 and 4. \
When using more than 2 threads, the time needed for the layout is not \
reduced further, but increases slightly. Moreover, the total CPU time \
required as reported by 'time dolphin /path/to/large/folder' increases \
significantly when using more than 1 thread. So it appears that contention \
is really a problem here, and the 'size hint' calculation using Qt's text \
engine stuff is not suitable for parallel computation.

Have you tried the "calculate all size hints in all function" approach yet?


- Frank


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/112979/#review41057
-----------------------------------------------------------


On Sept. 28, 2013, 7:07 p.m., Emmanuel Pescosta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/112979/
> -----------------------------------------------------------
> 
> (Updated Sept. 28, 2013, 7:07 p.m.)
> 
> 
> Review request for Dolphin.
> 
> 
> Repository: kde-baseapps
> 
> 
> Description
> -------
> 
> Make use of multi-threading in KItemListSizeHintResolver.
> 
> We need the item size hints of all items in \
> KItemListLayouter::doLayout(), so instead of resolving it sequentially \
> during layout calculation,  we can resolve all item size hints in \
> parallel before we start the  layout calculation.
> 
> 
> Diffs
> -----
> 
> dolphin/src/kitemviews/private/kitemlistsizehintresolver.h 486f9b6 
> dolphin/src/kitemviews/private/kitemlistsizehintresolver.cpp 0e2286b 
> dolphin/src/kitemviews/private/kitemlistviewlayouter.h 306fcd3 
> dolphin/src/kitemviews/private/kitemlistviewlayouter.cpp d6e78ae 
> 
> Diff: http://git.reviewboard.kde.org/r/112979/diff/
> 
> 
> Testing
> -------
> 
> Everything works fine.
> 
> It is about 1/3 faster on my machine (tested with 100k items in item view \
> mode). 
> 
> Thanks,
> 
> Emmanuel Pescosta
> 
> 


[Attachment #3 (text/html)]

<html>
 <body>
  <div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
   <table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px \
#c9c399 solid;">  <tr>
     <td>
      This is an automatically generated e-mail. To reply, visit:
      <a href="http://git.reviewboard.kde.org/r/112979/">http://git.reviewboard.kde.org/r/112979/</a>
  </td>
    </tr>
   </table>
   <br />





<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; \
padding-left: 10px;">  <p style="margin-top: 0;">On September 30th, 2013, \
10:04 p.m. UTC, <b>Frank Reininghaus</b> wrote:</p>  <blockquote \
style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; \
white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">Thanks for looking into this! The layouting procedure is \
indeed one of the major bottlenecks when entering a directory, and I was \
also thinking that making use of multiple CPU cores is the easiest way for \
short-term improvements without major modifications in the layouting code. \
Your approach looks good to me - I think I would have also tried something \
like that.

I&#39;m a bit surprised/disappointed though that it saves only 1/3 of the \
time. In can confirm that observation on my system (with a rather old \
dual-core AMD 64X2 5000+ CPU, which has to my knowledge no &quot;Turbo \
boost&quot; feature if only one core is busy, which might otherwise have \
been an explanation for this). I can currently see two possible \
explanations for this:

(a) Could it be that QtConcurrent::blockingMap() adds a lot of overhead to \
each function call? Maybe it&#39;s designed for the case that each \
invocation of the function is expensive (in which case the overhead \
wouldn&#39;t matter much). However, in our case, a single call of \
&quot;itemSizeHint&quot; is not extremely expensive - it&#39;s the large \
number of calls that is the problem.

But I don&#39;t know if that assumption is correct. Maybe \
QtConcurrent::blockingMap() is clever enough to split the input list into \
only a few large chunks.


(b) Maybe some parts of the font/text code in Qt, which is called by \
KStandardItemListWidgetInformant::itemSizeHint(), are protected by a mutex \
internally. Then there could be several causes for performance loss. If a \
large part of the time is spent under mutex protection, then the gain from \
parallelization is severely limited by Amdahl&#39;s Law. Moreover, if the \
mutex is locked/released frequently by several competing threads, then the \
performance will suffer greatly because of mutex contention.


An interesting experiment would be to test your patch on a system with more \
than 2 physical cores and check how much it improves the performance. If \
anyone with such a system is willing to try this:

* Uncomment the line &quot;// #define KITEMLISTVIEWLAYOUTER_DEBUG&quot; in \
                dolphin/src/kitemviews/private/kitemlistviewlayouter.cpp
* Either switch on all debug output with kdebugdialog or change the \
                kDebug() in this file to qDebug()
* Open a folder with many files (100,000 or more) in Icons View (possibly \
                reload it a couple of times with F5).
* Apply Emmanuel&#39;s patch and repeat the procedure.

If mutex contention is indeed an issue here, then I would expect that 4 or \
more cores will not bring a much larger performance improvement (compared \
to the 1/3 that we get with 2 cores), or that it would even get worse.


Another idea that I had recently (not tested yet, and orthogonal to your \
patch in the sense that it might improve the performance even in the \
single-thread case, and that it could be combined with a QtConcurrent \
approach to bring further improvements):

Much of the code inside KStandardItemListWidgetInformant::itemSizeHint() \
does exactly the same every time. In Icons View, this is everything except \
for the calculation of the number of lines for the item text. To improve \
this, one could replace the function

QSizeF KStandardItemListWidgetInformant::itemSizeHint(int index, const \
KItemListView* view) const

by

void KStandardItemListWidgetInformant::itemSizeHint(QVector&lt;QSizeF&gt;&amp; \
sizeHints, const KItemListView* view) const

or something like that, do most of the things only once, and then calculate \
only the number of lines in a loop.

A rough draft:

void KStandardItemListWidgetInformant::itemSizeHint(QVector&lt;QSizeF&gt; \
sizeHints, const KItemListView* view) const {
    //...

    switch (static_cast&lt;const \
KStandardItemListView*&gt;(view)-&gt;itemLayout()) {  case \
KStandardItemListWidget::IconsLayout: {

        // Stuff that applies to all items, like...
        const qreal itemWidth = view-&gt;itemSize().width();
        const qreal maxWidth = itemWidth - 2 * option.padding;
        const qreal additionalRolesHeight = additionalRolesCount * \
option.fontMetrics.lineSpacing();

        for (int i = 0; i &lt; sizeHints.count(); ++i) {
            if (!sizeHints.at(i).isEmpty()) {
                continue;
            }

            const QString text = \
KStringHandler::preProcessWrap(itemText(index, view));

            // Calculate the number of lines and the &quot;textHeight&quot;
            //...

            const qreal totalHeight = qMin(textHeight + \
additionalRolesHeight, maxTextHeight);  sizeHints[i] = QSizeF(itemWitdth, \
totalHeight);  }

    // ...
    }
}

 

Maybe it&#39;s worth trying how much time that saves, and how it can be \
combined with your QtConcurrent idea? </pre>
 </blockquote>




 <p>On October 4th, 2013, noon UTC, <b>Emmanuel Pescosta</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; \
padding-left: 10px;">  <pre style="white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">&gt; Maybe it&#39;s worth trying how much time that saves, and \
how it can be combined with your QtConcurrent idea?

Sounds great :)
I&#39;ll test it!</pre>
 </blockquote>








</blockquote>

<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I did have a \
chance to test your patch on a computer with 4 CPU cores, and restricted \
the number of threads by calling

QThreadPool::globalInstance()-&gt;setMaxThreadCount(number);

before the QtConcurrent call, with values for &#39;number&#39; between 1 \
and 4. When using more than 2 threads, the time needed for the layout is \
not reduced further, but increases slightly. Moreover, the total CPU time \
required as reported by &#39;time dolphin /path/to/large/folder&#39; \
increases significantly when using more than 1 thread. So it appears that \
contention is really a problem here, and the &#39;size hint&#39; \
calculation using Qt&#39;s text engine stuff is not suitable for parallel \
computation.

Have you tried the &quot;calculate all size hints in all function&quot; \
approach yet?</pre> <br />










<p>- Frank</p>


<br />
<p>On September 28th, 2013, 7:07 p.m. UTC, Emmanuel Pescosta wrote:</p>








<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('http://git.reviewboard.kde.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px \
black solid;">  <tr>
  <td>

<div>Review request for Dolphin.</div>
<div>By Emmanuel Pescosta.</div>


<p style="color: grey;"><i>Updated Sept. 28, 2013, 7:07 p.m.</i></p>









<div style="margin-top: 1.5em;">
 <b style="color: #575012; font-size: 10pt;">Repository: </b>
kde-baseapps
</div>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description \
</h1>  <table width="100%" bgcolor="#ffffff" cellspacing="0" \
cellpadding="10" style="border: 1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">Make use of multi-threading in KItemListSizeHintResolver.

We need the item size hints of all items in KItemListLayouter::doLayout(),
so instead of resolving it sequentially during layout calculation, 
we can resolve all item size hints in parallel before we start the 
layout calculation.</pre>
  </td>
 </tr>
</table>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing \
</h1> <table width="100%" bgcolor="#ffffff" cellspacing="0" \
cellpadding="10" style="border: 1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">Everything works fine.

It is about 1/3 faster on my machine (tested with 100k items in item view \
mode).</pre>  </td>
 </tr>
</table>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> \
</h1> <ul style="margin-left: 3em; padding-left: 0;">

 <li>dolphin/src/kitemviews/private/kitemlistsizehintresolver.h <span \
style="color: grey">(486f9b6)</span></li>

 <li>dolphin/src/kitemviews/private/kitemlistsizehintresolver.cpp <span \
style="color: grey">(0e2286b)</span></li>

 <li>dolphin/src/kitemviews/private/kitemlistviewlayouter.h <span \
style="color: grey">(306fcd3)</span></li>

 <li>dolphin/src/kitemviews/private/kitemlistviewlayouter.cpp <span \
style="color: grey">(d6e78ae)</span></li>

</ul>

<p><a href="http://git.reviewboard.kde.org/r/112979/diff/" \
style="margin-left: 3em;">View Diff</a></p>







  </td>
 </tr>
</table>








  </div>
 </body>
</html>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic