[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kde-core-devel
Subject:    Re: Review Request: KImageCache optimization
From:       "Michael Pyne" <mpyne () kde ! org>
Date:       2012-02-24 1:16:41
Message-ID: 20120224011641.25188.53974 () vidsolbach ! de
[Download RAW message or body]

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/104052/#review10858
-----------------------------------------------------------


The big issue is a race condition between finding an entry and its eventual=
 use later... this might be suitable for limited use-cases where end-user c=
ode can buy into that arrangement, but is not suitable in general for KShar=
edDataCache or KImageCache.

To answer a different comment, the cache is always fixed-size because of th=
e design of the underlying data structure (currently a fixed-size hash tabl=
e). The entire structure is pre-allocated on disk for architectures that su=
pport posix_fallocate (IIRC) to avoid crashes for OS'es that overcommit dis=
k space.


kdecore/util/kshareddatacache.h
<http://git.reviewboard.kde.org/r/104052/#comment8785>

    As you note here in the comment, there is a race condition in general b=
etween unlocking the cache and actually finally copying the data to the eve=
ntual destination.
    =

    Although I deeply appreciate the need for core code like KSharedDataCac=
he to be as efficient as possible, this can't be done by adding even more s=
egfaults (however rare).
    =

    One possible alternative that might be possible and keep your core idea=
 is to possibly find a way to invalidate those "fromRawData"'d QByteArrays =
if the cache (or better yet, the affected data pages) is modified in betwee=
n, similar to how QPointer<T> knows when the QObject it's pointing to has b=
een deleted. Obviously this is easier said than done for a low-level class =
like QByteArray.



kdecore/util/kshareddatacache.cpp
<http://git.reviewboard.kde.org/r/104052/#comment8786>

    The cache is unlocked here but temp is still pointing into its internal=
s (as mentioned above)... it's possible to extract the lock out to wrap aro=
und this copy too but then I'm not sure how that would make anything differ=
ent, you'd still be performing the copy.



kdeui/util/kimagecache.cpp
<http://git.reviewboard.kde.org/r/104052/#comment8787>

    Is this still needed? (I'm assuming not. ;)



kdeui/util/kimagecache.cpp
<http://git.reviewboard.kde.org/r/104052/#comment8788>

    Like some of the other reviewers I would prefer to maintain at least so=
me compression of the image (since in general CPU is nowadays cheaper than =
memory and memory bandwidth), in addition to the concern with indexed/palet=
ted images (from the Qt docs it seems the color table will still be left em=
pty here).
    =

    Of course, the reason the QImage must be constructed that way is mostly=
 related to being able to use the rawData QByteArray directly, which we can=
't really do without the cache locked... assuming we can find a way around =
that this construct might still be useful for QImages which we verify have =
no color table.


- Michael Pyne


On Feb. 23, 2012, 7:23 p.m., Mark Gaiser wrote:
> =

> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/104052/
> -----------------------------------------------------------
> =

> (Updated Feb. 23, 2012, 7:23 p.m.)
> =

> =

> Review request for kdelibs, David Faure and Michael Pyne.
> =

> =

> Description
> -------
> =

> I was running KWin through callgrind to see where possible bottlenecks ar=
e. I wasn't expecting much since it improved greatly during the 4.8 dev cyc=
le, however one stood out. The saving of PNG images was taking about 1/5th =
of the time in KWin that i could see directly. That looked like something i=
 might be able to optimize.
> =

> What this patch is doing is storing the actual image bits to prevent savi=
ng a PNG image to the mmapped cache. That was a hot code path in time (cycl=
es), not even in calls. I've also reduced the amount of memory copies to a =
bare minimum by adding a rawFind function to KSharedDataCache which fills a=
 QByteArray::fromRawData thus preventing a expensive memory copy. The rawFi=
nd is used for looking up an image and fetching it's data without copying i=
t. That is done because QImage seems to make a copy itself internally. I do=
n't have any performance measurements, however, prior to this patch my kwin=
 test was using up ~5.000.000.000 cycles. After this patch it's using up 1.=
370.000.000. I don't have raw performance numbers to see if the cache itsel=
f is actually faster, it certainly has become a lot cheaper to use the cach=
e. Logic wise i would say creating a QImage from the cached data should be =
way faster now since there is no step involved anymore in decoding the imag=
e. Storing is certainly an order of magnitude faster.
> =

> Benchmark numbers. insert(write) and find(read)
> =

> -- Before patch --
> READ : 0.019 msecs per iteration (total: 79, iterations: 4096)
> WRITE: 0.010 msecs per iteration (total: 88, iterations: 8192)
> =

> -- After patch --
> READ : 0.019 msecs per iteration (total: 79, iterations: 4096)
> WRITE: 0.0026 msecs per iteration (total: 87, iterations: 32768)
> =

> Reading is equal in speed, writing is ~5x faster after the patch.
> =

> Special thanks go to David Faure for helping me a great deal with this.
> =

> =

> Diffs
> -----
> =

>   kdecore/util/kshareddatacache.h 339cecc =

>   kdecore/util/kshareddatacache.cpp 9fe3995 =

>   kdeui/tests/CMakeLists.txt 63788f6 =

>   kdeui/tests/kimagecachetests.h PRE-CREATION =

>   kdeui/tests/kimagecachetests.cpp PRE-CREATION =

>   kdeui/util/kimagecache.cpp a5bbbe1 =

> =

> Diff: http://git.reviewboard.kde.org/r/104052/diff/
> =

> =

> Testing
> -------
> =

> I've also written a bunch of test cases (greatly improved by David Faure)=
 to see if i didn't break anything. According to the test (which is also co=
mparing the actual image bits) it's all passing just fine.
> =

> =

> Thanks,
> =

> Mark Gaiser
> =

>


[Attachment #3 (text/html)]

<html>
 <body>
  <div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
   <table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 \
solid;">  <tr>
     <td>
      This is an automatically generated e-mail. To reply, visit:
      <a href="http://git.reviewboard.kde.org/r/104052/">http://git.reviewboard.kde.org/r/104052/</a>
  </td>
    </tr>
   </table>
   <br />





 <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">The big issue is a race \
condition between finding an entry and its eventual use later... this might be \
suitable for limited use-cases where end-user code can buy into that arrangement, but \
is not suitable in general for KSharedDataCache or KImageCache.

To answer a different comment, the cache is always fixed-size because of the design \
of the underlying data structure (currently a fixed-size hash table). The entire \
structure is pre-allocated on disk for architectures that support posix_fallocate \
(IIRC) to avoid crashes for OS&#39;es that overcommit disk space.</pre>  <br />





<div>




<table width="100%" border="0" bgcolor="white" style="border: 1px solid #C0C0C0; \
border-collapse: collapse; margin: 2px padding: 2px;">  <thead>
  <tr>
   <th colspan="4" bgcolor="#F0F0F0" style="border-bottom: 1px solid #C0C0C0; \
font-size: 9pt; padding: 4px 8px; text-align: left;">  <a \
href="http://git.reviewboard.kde.org/r/104052/diff/2/?file=50862#file50862line144" \
style="color: black; font-weight: bold; text-decoration: \
underline;">kdecore/util/kshareddatacache.h</a>  <span style="font-weight: normal;">

     (Diff revision 2)

    </span>
   </th>
  </tr>
 </thead>

 <tbody style="background-color: #e4d9cb; padding: 4px 8px; text-align: center;">
  <tr>

   <td colspan="4"><pre style="font-size: 8pt; line-height: 140%; margin: 0; \
">public:</pre></td>

  </tr>
 </tbody>




 
 



 <tbody>

  <tr>
    <th bgcolor="#b1ebb0" style="border-right: 1px solid #C0C0C0;" \
align="right"><font size="2"></font></th>  <td bgcolor="#c5ffc4" width="50%"><pre \
style="font-size: 8pt; line-height: 140%; margin: 0; "></pre></td>  <th \
bgcolor="#b1ebb0" style="border-left: 1px solid #C0C0C0; border-right: 1px solid \
#C0C0C0;" align="right"><font size="2">144</font></th>  <td bgcolor="#c5ffc4" \
width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; "><span \
class="cm">     * internal copy. You must use the contents of the destination \
bytearray before</span></pre></td>  </tr>

 </tbody>

</table>

<pre style="margin-left: 2em; white-space: pre-wrap; white-space: -moz-pre-wrap; \
white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">As you note \
here in the comment, there is a race condition in general between unlocking the cache \
and actually finally copying the data to the eventual destination.

Although I deeply appreciate the need for core code like KSharedDataCache to be as \
efficient as possible, this can&#39;t be done by adding even more segfaults (however \
rare).

One possible alternative that might be possible and keep your core idea is to \
possibly find a way to invalidate those &quot;fromRawData&quot;&#39;d QByteArrays if \
the cache (or better yet, the affected data pages) is modified in between, similar to \
how QPointer&lt;T&gt; knows when the QObject it&#39;s pointing to has been deleted. \
Obviously this is easier said than done for a low-level class like QByteArray.</pre> \
</div> <br />

<div>




<table width="100%" border="0" bgcolor="white" style="border: 1px solid #C0C0C0; \
border-collapse: collapse; margin: 2px padding: 2px;">  <thead>
  <tr>
   <th colspan="4" bgcolor="#F0F0F0" style="border-bottom: 1px solid #C0C0C0; \
font-size: 9pt; padding: 4px 8px; text-align: left;">  <a \
href="http://git.reviewboard.kde.org/r/104052/diff/2/?file=50863#file50863line1491" \
style="color: black; font-weight: bold; text-decoration: \
underline;">kdecore/util/kshareddatacache.cpp</a>  <span style="font-weight: \
normal;">

     (Diff revision 2)

    </span>
   </th>
  </tr>
 </thead>

 <tbody style="background-color: #e4d9cb; padding: 4px 8px; text-align: center;">
  <tr>

   <td colspan="4"><pre style="font-size: 8pt; line-height: 140%; margin: 0; ">bool \
KSharedDataCache::insert(const QString &amp;key, const QByteArray \
&amp;data)</pre></td>

  </tr>
 </tbody>




 
 



 <tbody>

  <tr>
    <th bgcolor="#b1ebb0" style="border-right: 1px solid #C0C0C0;" \
align="right"><font size="2"></font></th>  <td bgcolor="#c5ffc4" width="50%"><pre \
style="font-size: 8pt; line-height: 140%; margin: 0; "></pre></td>  <th \
bgcolor="#b1ebb0" style="border-left: 1px solid #C0C0C0; border-right: 1px solid \
#C0C0C0;" align="right"><font size="2">1491</font></th>  <td bgcolor="#c5ffc4" \
width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; ">            \
<span class="o">*</span><span class="n">destination</span> <span class="o">=</span> \
<span class="n">QByteArray</span><span class="p">(</span><span \
class="n">temp</span><span class="p">.</span><span class="n">constData</span><span \
class="p">(),</span> <span class="n">temp</span><span class="p">.</span><span \
class="n">size</span><span class="p">());</span></pre></td>  </tr>

 </tbody>

</table>

<pre style="margin-left: 2em; white-space: pre-wrap; white-space: -moz-pre-wrap; \
white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">The cache \
is unlocked here but temp is still pointing into its internals (as mentioned \
above)... it&#39;s possible to extract the lock out to wrap around this copy too but \
then I&#39;m not sure how that would make anything different, you&#39;d still be \
performing the copy.</pre> </div>
<br />

<div>




<table width="100%" border="0" bgcolor="white" style="border: 1px solid #C0C0C0; \
border-collapse: collapse; margin: 2px padding: 2px;">  <thead>
  <tr>
   <th colspan="4" bgcolor="#F0F0F0" style="border-bottom: 1px solid #C0C0C0; \
font-size: 9pt; padding: 4px 8px; text-align: left;">  <a \
href="http://git.reviewboard.kde.org/r/104052/diff/2/?file=50867#file50867line28" \
style="color: black; font-weight: bold; text-decoration: \
underline;">kdeui/util/kimagecache.cpp</a>  <span style="font-weight: normal;">

     (Diff revision 2)

    </span>
   </th>
  </tr>
 </thead>

 <tbody style="background-color: #e4d9cb; padding: 4px 8px; text-align: center;">
  <tr>

   <td colspan="4"><pre style="font-size: 8pt; line-height: 140%; margin: 0; \
"></pre></td>

  </tr>
 </tbody>




 
 



 <tbody>

  <tr>
    <th bgcolor="#b1ebb0" style="border-right: 1px solid #C0C0C0;" \
align="right"><font size="2"></font></th>  <td bgcolor="#c5ffc4" width="50%"><pre \
style="font-size: 8pt; line-height: 140%; margin: 0; "></pre></td>  <th \
bgcolor="#b1ebb0" style="border-left: 1px solid #C0C0C0; border-right: 1px solid \
#C0C0C0;" align="right"><font size="2">28</font></th>  <td bgcolor="#c5ffc4" \
width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; "><span \
class="cp">#include &lt;QDebug&gt;</span></pre></td>  </tr>

 </tbody>

</table>

<pre style="margin-left: 2em; white-space: pre-wrap; white-space: -moz-pre-wrap; \
white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Is this \
still needed? (I&#39;m assuming not. ;)</pre> </div>
<br />

<div>




<table width="100%" border="0" bgcolor="white" style="border: 1px solid #C0C0C0; \
border-collapse: collapse; margin: 2px padding: 2px;">  <thead>
  <tr>
   <th colspan="4" bgcolor="#F0F0F0" style="border-bottom: 1px solid #C0C0C0; \
font-size: 9pt; padding: 4px 8px; text-align: left;">  <a \
href="http://git.reviewboard.kde.org/r/104052/diff/2/?file=50867#file50867line103" \
style="color: black; font-weight: bold; text-decoration: \
underline;">kdeui/util/kimagecache.cpp</a>  <span style="font-weight: normal;">

     (Diff revision 2)

    </span>
   </th>
  </tr>
 </thead>

 <tbody style="background-color: #e4d9cb; padding: 4px 8px; text-align: center;">
  <tr>

   <td colspan="4"><pre style="font-size: 8pt; line-height: 140%; margin: 0; \
">KImageCache::~KImageCache()</pre></td>

  </tr>
 </tbody>





 
 


 <tbody>

  <tr>
    <th bgcolor="#ebb1ba" style="border-right: 1px solid #C0C0C0;" \
align="right"><font size="2">101</font></th>  <td bgcolor="#ffc5ce" width="50%"><pre \
style="font-size: 8pt; line-height: 140%; margin: 0; ">    <span \
class="n">image</span><span class="p">.</span><span class="n">save</span><span \
class="p">(</span><span class="o">&amp;</span><span class="n">buffer</span><span \
class="p">,</span> <span class="s">&quot;PNG&quot;</span><span \
class="p">);</span></pre></td>  <th bgcolor="#ebb1ba" style="border-left: 1px solid \
#C0C0C0; border-right: 1px solid #C0C0C0;" align="right"><font size="2"></font></th>  \
<td bgcolor="#ffc5ce" width="50%"><pre style="font-size: 8pt; line-height: 140%; \
margin: 0; "></pre></td>  </tr>

 </tbody>

</table>

<pre style="margin-left: 2em; white-space: pre-wrap; white-space: -moz-pre-wrap; \
white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Like some \
of the other reviewers I would prefer to maintain at least some compression of the \
image (since in general CPU is nowadays cheaper than memory and memory bandwidth), in \
addition to the concern with indexed/paletted images (from the Qt docs it seems the \
color table will still be left empty here).

Of course, the reason the QImage must be constructed that way is mostly related to \
being able to use the rawData QByteArray directly, which we can&#39;t really do \
without the cache locked... assuming we can find a way around that this construct \
might still be useful for QImages which we verify have no color table.</pre> </div>
<br />



<p>- Michael</p>


<br />
<p>On February 23rd, 2012, 7:23 p.m., Mark Gaiser wrote:</p>






<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('http://git.reviewboard.kde.org/media/rb/images/review_request_box_top_bg.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black \
solid;">  <tr>
  <td>

<div>Review request for kdelibs, David Faure and Michael Pyne.</div>
<div>By Mark Gaiser.</div>


<p style="color: grey;"><i>Updated Feb. 23, 2012, 7:23 p.m.</i></p>






<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
 <table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">I was running KWin through callgrind to see where possible bottlenecks \
are. I wasn&#39;t expecting much since it improved greatly during the 4.8 dev cycle, \
however one stood out. The saving of PNG images was taking about 1/5th of the time in \
KWin that i could see directly. That looked like something i might be able to \
optimize.

What this patch is doing is storing the actual image bits to prevent saving a PNG \
image to the mmapped cache. That was a hot code path in time (cycles), not even in \
calls. I&#39;ve also reduced the amount of memory copies to a bare minimum by adding \
a rawFind function to KSharedDataCache which fills a QByteArray::fromRawData thus \
preventing a expensive memory copy. The rawFind is used for looking up an image and \
fetching it&#39;s data without copying it. That is done because QImage seems to make \
a copy itself internally. I don&#39;t have any performance measurements, however, \
prior to this patch my kwin test was using up ~5.000.000.000 cycles. After this patch \
it&#39;s using up 1.370.000.000. I don&#39;t have raw performance numbers to see if \
the cache itself is actually faster, it certainly has become a lot cheaper to use the \
cache. Logic wise i would say creating a QImage from the cached data should be way \
faster now since there is no step involved anymore in decoding the image. Storing is \
certainly an order of magnitude faster.

Benchmark numbers. insert(write) and find(read)

-- Before patch --
READ : 0.019 msecs per iteration (total: 79, iterations: 4096)
WRITE: 0.010 msecs per iteration (total: 88, iterations: 8192)

-- After patch --
READ : 0.019 msecs per iteration (total: 79, iterations: 4096)
WRITE: 0.0026 msecs per iteration (total: 87, iterations: 32768)

Reading is equal in speed, writing is ~5x faster after the patch.

Special thanks go to David Faure for helping me a great deal with this.</pre>
  </td>
 </tr>
</table>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: \
1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">I&#39;ve also written a bunch of test cases (greatly improved by David \
Faure) to see if i didn&#39;t break anything. According to the test (which is also \
comparing the actual image bits) it&#39;s all passing just fine.</pre>  </td>
 </tr>
</table>




<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">

 <li>kdecore/util/kshareddatacache.h <span style="color: grey">(339cecc)</span></li>

 <li>kdecore/util/kshareddatacache.cpp <span style="color: \
grey">(9fe3995)</span></li>

 <li>kdeui/tests/CMakeLists.txt <span style="color: grey">(63788f6)</span></li>

 <li>kdeui/tests/kimagecachetests.h <span style="color: \
grey">(PRE-CREATION)</span></li>

 <li>kdeui/tests/kimagecachetests.cpp <span style="color: \
grey">(PRE-CREATION)</span></li>

 <li>kdeui/util/kimagecache.cpp <span style="color: grey">(a5bbbe1)</span></li>

</ul>

<p><a href="http://git.reviewboard.kde.org/r/104052/diff/" style="margin-left: \
3em;">View Diff</a></p>




  </td>
 </tr>
</table>








  </div>
 </body>
</html>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic