
List:       kde-devel
Subject:    Re: Review Request 116692: Lower memory usage of akonadi_baloo_indexer with frequent commits
From:       "Aaron J. Seigo" <aseigo () kde ! org>
Date:       2014-07-10 16:14:58
Message-ID: 20140710161458.19519.54312 () probe ! kde ! org

--===============6699442720999278145==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/116692/
-----------------------------------------------------------

(Updated July 10, 2014, 4:14 p.m.)


Status
------

This change has been discarded.


Review request for Akonadi and Baloo.


Repository: baloo


Description
-------

Baloo uses Xapian to store the processed results of the data fed to it by
Akonadi. It processes all the data it is sent to index, and only once that is
complete is the data committed to the Xapian database. From
http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#acbea2163142de795024880a7123bc693
we see: "For efficiency reasons, when performing multiple updates to a database
it is best (indeed, almost essential) to make as many modifications as memory
will permit in a single pass through the database. To ensure this, Xapian
batches up modifications." This means that *all* the data to be stored in the
Xapian database first ends up in RAM. When indexing large mailboxes (or any
other large chunk of data) this results in a very large amount of memory
allocation. In one test with 100k mails in a maildir folder, this resulted in
1.5GB of RAM used. In normal daily usage with maildir I find that it easily
balloons to several hundred megabytes within days. This makes the Baloo indexer
unusable on systems with smaller amounts of memory (e.g. mobile devices, which
typically have only 512MB-2GB of RAM).

Making this even worse, the indexer is long-lived *and* the default glibc
allocator is unable to return the used memory to the OS (probably due to memory
fragmentation, though I have not confirmed this). With other allocators, memory
still balloons temporarily during processing, but once that is done the memory
is released and returned to the OS. As such, this is not a memory leak, but it
behaves like one on systems with the default glibc allocator:
akonadi_baloo_indexer takes increasingly large amounts of memory that never get
returned to the OS. (This is actually how I noticed the problem in the first
place.)

The approach used to address this problem is to periodically commit data to the
Xapian database. This happens uniformly and transparently to the
AbstractIndexer subclasses. The exact behavior is controlled by the
s_maxUncommittedItems constant, which is set arbitrarily to 100: once an
indexer reaches 100 uncommitted changes, the results are committed immediately.
Caveats:

* This is not a guaranteed fix for the memory fragmentation issue experienced
with glibc: memory can still grow slowly over time, as each smaller commit
leaves some fraction of its memory un-releasable due to fragmentation. It has
helped with day-to-day usage here, but in the "100k mails in a maildir
structure" test memory did still balloon upwards.

* It makes indexing non-atomic from Akonadi's perspective: data fed to
akonadi_baloo_indexer to be indexed may show up in chunks and, if the indexer
crashes, may even be only partially added to the database.
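
For readers without the diff at hand, the periodic-commit idea described above
can be sketched roughly as follows. This is a minimal illustration, not the
code from the patch: FakeDatabase is a hypothetical stand-in for a
write-batching store such as Xapian::WritableDatabase, PeriodicCommitter is a
hypothetical name for the shared base-class logic, and the threshold is passed
as a constructor argument here rather than being the s_maxUncommittedItems
constant.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a write-batching store; it only records
// how much has been committed and how often.
class FakeDatabase {
public:
    // Buffers a document in memory until the next commit().
    void add(const std::string &doc) { m_pending.push_back(doc); }

    // Counts the pending documents as committed and drops the batch.
    void commit() {
        if (m_pending.empty())
            return;
        m_committed += m_pending.size();
        ++m_commitCount;
        m_pending.clear();
    }

    std::size_t pending() const { return m_pending.size(); }
    std::size_t committed() const { return m_committed; }
    std::size_t commitCount() const { return m_commitCount; }

private:
    std::vector<std::string> m_pending;
    std::size_t m_committed = 0;
    std::size_t m_commitCount = 0;
};

// Commits on behalf of its users once the number of uncommitted
// changes reaches the threshold (100 by default, chosen arbitrarily).
class PeriodicCommitter {
public:
    explicit PeriodicCommitter(FakeDatabase &db, std::size_t maxUncommitted = 100)
        : m_db(db), m_maxUncommitted(maxUncommitted) {
        assert(maxUncommitted > 0);
    }

    // Called once per indexed item; triggers a commit at the threshold.
    void index(const std::string &doc) {
        m_db.add(doc);
        if (++m_uncommitted >= m_maxUncommitted)
            commit();
    }

    // Commits whatever is left, e.g. at the end of a run.
    void commit() {
        m_db.commit();
        m_uncommitted = 0;
    }

private:
    FakeDatabase &m_db;
    std::size_t m_maxUncommitted;
    std::size_t m_uncommitted = 0;
};
```

With this shape, the number of documents held in memory is bounded by the
threshold instead of by the size of the data set, which is also exactly why a
crash between two commits can leave a data set only partially indexed.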

Alternative approaches (not necessarily mutually exclusive with this patch or
each other):

* Send smaller data sets from Akonadi to akonadi_baloo_indexer for processing.
This would allow akonadi_baloo_indexer to retain the atomic commit approach
while avoiding the worst of the Xapian memory usage; it would not address the
issue of memory fragmentation.
* Restart the akonadi_baloo_indexer process from time to time; this would
resolve the fragmentation-over-time issue but not the massive memory usage due
to atomically indexing large datasets.
* Improve Xapian's chert backend (to become default in 1.4) so that it does not
fragment memory so much; this would not address the issue of massive memory
usage due to atomically indexing large datasets.
* Use an allocator other than glibc's; this would not address the issue of
massive memory usage due to atomically indexing large datasets.
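
To make the first alternative above more concrete, here is a rough sketch of
splitting a large set of items into smaller batches on the sending side, so
that each batch could still be indexed and committed atomically. The function
name splitIntoBatches and the batch size are assumptions made for
illustration; nothing like this exists in Akonadi or Baloo as far as this
review request is concerned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Splits `items` into consecutive batches of at most `batchSize`
// elements. Order is preserved; the last batch may be smaller.
template <typename T>
std::vector<std::vector<T>> splitIntoBatches(const std::vector<T> &items,
                                             std::size_t batchSize) {
    assert(batchSize > 0);
    std::vector<std::vector<T>> batches;
    for (std::size_t start = 0; start < items.size(); start += batchSize) {
        const std::size_t end = std::min(start + batchSize, items.size());
        // Copy the half-open range [start, end) into a new batch.
        batches.emplace_back(
            items.begin() + static_cast<std::ptrdiff_t>(start),
            items.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```

Each batch would then be handed to the indexer as its own unit of work. As
noted in the list above, this bounds peak memory per batch but does nothing
about fragmentation accumulating over time.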


Diffs
-----

  src/pim/agent/emailindexer.cpp 05f80cf 
  src/pim/agent/abstractindexer.h 8ae6f5c 
  src/pim/agent/abstractindexer.cpp fa9e96f 
  src/pim/agent/akonotesindexer.h 83f36b7 
  src/pim/agent/akonotesindexer.cpp ac3e66c 
  src/pim/agent/contactindexer.h 49dfdeb 
  src/pim/agent/contactindexer.cpp a5a6865 
  src/pim/agent/emailindexer.h 9a5e5cf 

Diff: https://git.reviewboard.kde.org/r/116692/diff/


Testing
-------

I have been running with the patch for a couple of days, and one other person
on IRC has tested an earlier (but functionally equivalent) version. Rather than
reaching the common 250MB+ during regular usage, the indexer now idles at ~20MB
(up from ~7MB when first started, so some fragmentation remains as noted in the
description, but with far better long-term results).


Thanks,

Aaron J. Seigo


--===============6699442720999278145==--




