[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kde-devel
Subject:    Re: Review Request 120438: improve baloo file indexing overhead by ~2.5x
From:       "Vishesh Handa" <me () vhanda ! in>
Date:       2014-10-14 10:49:17
Message-ID: 20141014104917.21223.83601 () probe ! kde ! org
[Download RAW message or body]

--===============1829551522856473575==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit



> On Sept. 30, 2014, 4:25 p.m., Vishesh Handa wrote:
> > Hey Aaron. Thanks for putting this up for review. Could you please split this \
> > patch into multiple reviews? I see a mix of this patch, xapian->sqlite changes, \
> > new API, coding styles changes, and some improvements in the FileMapping class. 
> > The indexing progress seems to be now tracked in sqlite instead of Xapian. You \
> > mentioned it is more performant. Do you know why that is? Also, I'd love to know \
> > how these benchmarks were measured. I don't see any tests for benchmarks. Perhaps \
> > you forgot to add them to this patch? 
> > And finally, from what I understand this removed the need for the whole binary \
> > search when a file fails to index. Could you please add a test in order to make \
> > sure the extractor crashing is handled properly? Some tests for the file indexing \
> > queue would also be needed. 
> > How do you handle when a file takes too long to index? The FileIndexingJob had a \
> > timer for 5 minutes.
> 
> Aaron J. Seigo wrote:
> Sorry I haven't responded sooner but I was away for a week and have been in a job \
> transition since then, and that's kept me occupied the last while. I'll hopefully \
> have some time again to hack on personal / fun things opening up by week's end. 
> "Could you please split this patch into multiple reviews? I see a mix of this \
> patch, xapian->sqlite changes, new API, coding styles changes, and some \
> improvements in the FileMapping class." 
> The code style fixes are limited two small places. The only one that is entirely \
> separate from functional changes is in indexingqueue.cpp. If you really want those \
> bits reverted, I can do so, but I'm not going to submit a separate review for them \
> on their own. Realistically, it's 6 curly braces (by my count) ...  
> The rest of the changes belong together. The xapian->sqlite changes are necessary \
> for the cross-process queue syncronization, the new API is the cross-process \
> communication and the other changes all support those changes. Example: the sqlite \
> db schema versioning is needed to perform the upgrade to the schema; without that \
> ther would be code there to check if there are specific columns or not, and that \
> would then get replaced by a more sane versioning (such as what is in this patch). \
> There's really not much to break up further. 
> "The indexing progress seems to be now tracked in sqlite instead of Xapian."
> 
> Correct. The booleans stored in the xapian db as Z1 and Z2 were pretty obscure \
> (what does 'Z1' mean?) and the sqlite db already has a listing of the files being \
> indexed so is a natural place to track progress asyncronously between processes \
> without requiring the xapian db to commit continuously. 
> "You mentioned it is more performant. Do you know why that is?"
> 
> Yes (see above)
> 
> "Also, I'd love to know how these benchmarks were measured."
> 
> The schedulertest. Handy as both an integration test and a small benchmark. \
> Extending it to create more than 20 files per run is pretty simple. 
> "Could you please add a test in order to make sure the extractor crashing is \
> handled properly?" 
> Sure. (Figuring out how to simulate it crashing, since it really shouldn't ever do \
> that, will be fun and interesting ... but I'm sure it can be done) 
> "Some tests for the file indexing queue would also be needed."
> 
> What needs testing beyond what schedulertest already tests?
> 
> "How do you handle when a file takes too long to index?"
> 
> It doesn't currently; I'll add that.

Of course, creating a new review for the coding style changes would be too much. But \
it should go in a separate commit. I would not like your entire branch to be merged, \
rather one commit at a time.

Could you please explain "xapian->sqlite changes are necessary for cross-process \
queue synchronization". I'm afraid I really don't follow. Also, I don't follow why \
they are not functionally seperate. 

Regarding the booleans. Z was a random letter chosen to represent the indexing level. \
One could have chosen a longer string, but that would occupy more space. Is there a \
problem with having to commit changes in Xapian? We are indexing the file and making \
changes over there, is not more sensible to just club them together? Unless, there \
are provable speed and io improvments by storing them in sqlite.

Regarding the Scheduler Test: Each change should ideally be tested independently so \
that we know what parts are actually improving the system and by how much. Also, the \
scheduler test isn't ideal since it contains artificial delays in the queues. We have \
them because indexing everything too fast is a problem.

Regarding Test Crashing - The previous FileIndexingQueue test already had code for \
that.

Test coverage. At minimum, the following tests would be required -
1. Extractor Client - Handling crashing and timeouts. Maybe tests for the FileIQ \
would also be required since it is now updating the 2. Benchmarks for the \
FileIndexingQueue (both io and cpu) 3. Benchmarks for the Scheduler (both io and cpu)
4. The BasicIndexingJob test should still be there - I understand that you've moved \
some part of the code to the BasicIQ, please add relevant tests.

I would be very interested in seeing both the IO and CPU benchmarks. More in the IO \
ones since the speed benefits while nice to have, do not amount to much if we have \
artifical delays.


> On Sept. 30, 2014, 4:25 p.m., Vishesh Handa wrote:
> > .gitignore, line 1
> > <https://git.reviewboard.kde.org/r/120438/diff/1/?file=315778#file315778line1>
> > 
> > Is this required?
> 
> Aaron J. Seigo wrote:
> no; doesn't have to be there. happy to leave it out.

Please do. These kind of things belong in a global gitignore file.


> On Sept. 30, 2014, 4:25 p.m., Vishesh Handa wrote:
> > src/file/lib/extractorclient.h, line 30
> > <https://git.reviewboard.kde.org/r/120438/diff/1/?file=315808#file315808line30>
> > 
> > This add new public API.
> 
> Aaron J. Seigo wrote:
> Which makes sense if it is used by the Baloo widgets repository, unless they can \
> share code privately. Otherwise, yes, it's fine to make this private if nothing \
> else is going to use it.

Baloo Widgets no longer uses the extractor. This can be made private.


- Vishesh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/120438/#review67687
-----------------------------------------------------------


On Sept. 30, 2014, 12:47 p.m., Aaron J. Seigo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/120438/
> -----------------------------------------------------------
> 
> (Updated Sept. 30, 2014, 12:47 p.m.)
> 
> 
> Review request for Baloo.
> 
> 
> Repository: baloo
> 
> 
> Description
> -------
> 
> Current Baloo's file indexer takes batches of files by its command line and \
> processes them async. There are several iterations over and creation of various \
> collection classes. This patch set does away with all of this. 
> Instead, the worker process now is directed over stdin and responds over stdout \
> (old school UNIX!). This is done using a very simple protocol which is fully \
> documented in docs/ and is encapsulated in a convenience class so as to make future \
> maintenance relatively painless in the advent that the protocol is adjusted or an \
> entirely new system is put inplace. IOW, the next me should have an easier time \
> improving on what is there. This allows the worker process to be long(er)-lived and \
> moves more of the relevant bookkeeping to the same place (the worker process). The \
> resulting code is about the same size but cleaner and much more performant. 
> The tracking of file progress is also moved to the (less expensive, easier to \
> update incrementally) sqlite db where file data is already kept. It also versions \
> the sqlite schema so that runtime upgrades can be made (as required here) both now \
> and in the future without user intervention. 
> Finally, a tiny (93 LOC, including header) executable that wraps the protocol \
> encapsulation class is provided for 100% backwards compatibility of *all* command \
> line options that currently exist. 
> This makes it a 100% drop-in replacement.
> 
> Performance of the affected unit tests is ~2.5x better on my laptop: 7.6 v 3.0 \
> seconds. 
> Known weaknesses:
> 
> * still isn't hardened against multiple processes indexing simultaneously. the \
>                 added sqlite columns are a step in this direction, howeer
> * the DBus signal that was being emitted was removed and has not been replaced with \
> anything yet. That is because the affected class is in baloo-widgets (which happens \
> to be the only user of the relevant *public* class in the baloo file library) and I \
> have not yet worked on that, pending this patch set being approved in principle. I \
> am actually of the suspicion that keeping that data synchronized at all times is \
> not needed .. but if it is, there are much better ways than DBus. Note that the \
> DBus overhead was applied to *every single file indexed* even if Dolphin (the only \
> user of this feature) wasn't running or had the metadata frame open. 
> The base idea behind this patch is to make file indexing as blazing fast as \
> possible, which means using as few resources, CPU & memory, as possible. That ought \
> to be the #1 priority for Baloo when it comes to indexing files, both to keep it \
> away from user's notice on desktop class hardware as well as extend its relevance \
> to smaller devices. The code in this patch is at least as clear (imho more so, \
> actually) than what it replaces and was written for maintainability. 
> 
> Diffs
> -----
> 
> src/file/extractor/app.cpp PRE-CREATION 
> src/file/extractor/baloo_file_extractor.h a8cff51ae763c36fb2934c80c04cbf0f710a01c7 
> src/file/extractor/baloo_file_extractor.cpp \
> b2e7f97f2f0537807c314e2aca008d67711a69ce  src/file/extractor/extractorworker.cpp \
> e710908dc8b02b7edc9f40e44b9d5d5883986846  src/file/extractor/main.cpp \
> 606a1eb9d42b7079fee497a3fdffbe2327512c13  src/file/fileindexingjob.h PRE-CREATION 
> src/file/fileindexingjob.cpp PRE-CREATION 
> src/file/fileindexingqueue.h 5b1c612feb84046acf36b3dec3bf890da2cbf650 
> src/file/fileindexingqueue.cpp c6479cd25308bee9bb6fe6e6791391e6ca83bddf 
> src/file/indexingqueue.cpp c8ae4d9d1436ebe222a4758bd5ce551e091ec714 
> src/file/lib/CMakeLists.txt fb158a5f00f07ab48de963bf72924c83fc1c6673 
> src/file/lib/extractorclient.h 1e2ae09c7bd0068ca185b814c5c4996c0b563856 
> src/file/lib/extractorclient.cpp 221b90abe1899c27c03cf77e279fb71afce80b4d 
> src/file/lib/filemapping.cpp 5769fe23576c341917e8f37f7dd2743dd446a6e1 
> src/file/main.cpp 41855215c7b550c403d5ef523d29f9b73db58cb0 
> src/file/metadatamover.cpp 5117687c1247275235c3830b9d2234bfff353748 
> src/file/util.h a4cc2423fd46cda7146b02c572cc3f4ccd069e3e 
> src/file/util.cpp 2f7ac6de0dddc7d71c2b36bb4fe88677bfe52336 
> src/file/autotest/fileindexingjob/fileindexingjobtest.h PRE-CREATION 
> src/file/autotest/fileindexingjob/fileindexingjobtest.cpp PRE-CREATION 
> src/file/autotest/scheduler/schedulertest.cpp \
> b91a481138382b609068b4737c654ac5182c37ec  src/file/basicindexingjob.h \
> 3d2fb339e38be862332ddca45b4926f8000fa0c1  src/file/basicindexingjob.cpp \
> 8d70fa2b1ca737eaa8d2e717a0eb2dbf430fac84  src/file/basicindexingqueue.cpp \
> 8484c9b6868890cae6c2e19b29ea92d500aac03e  src/file/commitqueue.cpp \
> b26f73cfdbc390cceef01aa6abd2b5af6f9689b2  src/file/database.h \
> 6548de7dbce003112dad8c58c39c624dd806a261  src/file/database.cpp \
> 5c9ccdc442d230bdc5dc9edbe5bc595537d08a8f  src/file/extractor/CMakeLists.txt \
> 7cf3478f547f7fbcdf8908a3471b50fab09b5220  src/file/extractor/app.h \
>                 8059a16e8ec2e2a84d1a3b7eb1dd8cfc2a3136de 
> .gitignore 1377554ebea6f98a2c748183bc5a96852af12ac2 
> docs/file_extractor.md 3d830e2313bd35994f9fd1d6b52df422d0556fd8 
> src/file/CMakeLists.txt 1a9e6bf8f2c70b096fbb2c6c9eb78feec9d06622 
> src/file/autotest/CMakeLists.txt ccb8575fa8c07f2b136d259ed63d43a09ad75a12 
> src/file/autotest/basicindexingjobtest.h PRE-CREATION 
> src/file/autotest/basicindexingjobtest.cpp PRE-CREATION 
> src/file/autotest/fileindexingjob/CMakeLists.txt PRE-CREATION 
> src/file/autotest/fileindexingjob/extractor.cpp PRE-CREATION 
> 
> Diff: https://git.reviewboard.kde.org/r/120438/diff/
> 
> 
> Testing
> -------
> 
> Indexed files, ran testsuite with 0 failures.
> 
> 
> Thanks,
> 
> Aaron J. Seigo
> 
> 


--===============1829551522856473575==
MIME-Version: 1.0
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: 7bit




<html>
 <body>
  <div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
   <table bgcolor="#f9f3c9" width="100%" cellpadding="12" style="border: 1px #c9c399 \
solid; border-radius: 6px; -moz-border-radius: 6px; -webkit-border-radius: 6px;">  \
<tr>  <td>
      This is an automatically generated e-mail. To reply, visit:
      <a href="https://git.reviewboard.kde.org/r/120438/">https://git.reviewboard.kde.org/r/120438/</a>
  </td>
    </tr>
   </table>
   <br />





<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <p style="margin-top: 0;">On September 30th, 2014, 4:25 p.m. UTC, <b>Vishesh \
Handa</b> wrote:</p>  <blockquote style="margin-left: 1em; border-left: 2px solid \
#d0d0d0; padding-left: 10px;">  <pre style="white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;"><p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">Hey Aaron. Thanks for putting this up for review. \
Could you please split this patch into multiple reviews? I see a mix of this patch, \
xapian-&gt;sqlite changes, new API, coding styles changes, and some improvements in \
the FileMapping class.</p> <p style="padding: 0;text-rendering: inherit;margin: \
0;line-height: inherit;white-space: inherit;">The indexing progress seems to be now \
tracked in sqlite instead of Xapian. You mentioned it is more performant. Do you know \
why that is? Also, I'd love to know how these benchmarks were measured. I don't see \
any tests for benchmarks. Perhaps you forgot to add them to this patch?</p> <p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">And finally, from what I understand this removed the need for the whole \
binary search when a file fails to index. Could you please add a test in order to \
make sure the extractor crashing is handled properly? Some tests for the file \
indexing queue would also be needed.</p> <p style="padding: 0;text-rendering: \
inherit;margin: 0;line-height: inherit;white-space: inherit;">How do you handle when \
a file takes too long to index? The FileIndexingJob had a timer for 5 \
minutes.</p></pre>  </blockquote>




 <p>On October 13th, 2014, 8:27 p.m. UTC, <b>Aaron J. Seigo</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">Sorry \
I haven't responded sooner but I was away for a week and have been in a job \
transition since then, and that's kept me occupied the last while. I'll hopefully \
have some time again to hack on personal / fun things opening up by week's end.</p> \
<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">"Could you please split this patch into multiple \
reviews? I see a mix of this patch, xapian-&gt;sqlite changes, new API, coding styles \
changes, and some improvements in the FileMapping class."</p> <p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">The \
code style fixes are limited two small places. The only one that is entirely separate \
from functional changes is in indexingqueue.cpp. If you really want those bits \
reverted, I can do so, but I'm not going to submit a separate review for them on \
their own. Realistically, it's 6 curly braces (by my count) ... </p> <p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">The rest of the changes belong together. The xapian-&gt;sqlite changes are \
necessary for the cross-process queue syncronization, the new API is the \
cross-process communication and the other changes all support those changes. Example: \
the sqlite db schema versioning is needed to perform the upgrade to the schema; \
without that ther would be code there to check if there are specific columns or not, \
and that would then get replaced by a more sane versioning (such as what is in this \
patch). There's really not much to break up further.</p> <p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">"The \
indexing progress seems to be now tracked in sqlite instead of Xapian."</p> <p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">Correct. The booleans stored in the xapian db as Z1 and Z2 were pretty \
obscure (what does 'Z1' mean?) and the sqlite db already has a listing of the files \
being indexed so is a natural place to track progress asyncronously between processes \
without requiring the xapian db to commit continuously.</p> <p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">"You \
mentioned it is more performant. Do you know why that is?"</p> <p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">Yes \
(see above)</p> <p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">"Also, I'd love to know how these benchmarks were \
measured."</p> <p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">The schedulertest. Handy as both an integration test \
and a small benchmark. Extending it to create more than 20 files per run is pretty \
simple.</p> <p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">"Could you please add a test in order to make sure the \
extractor crashing is handled properly?"</p> <p style="padding: 0;text-rendering: \
inherit;margin: 0;line-height: inherit;white-space: inherit;">Sure. (Figuring out how \
to simulate it crashing, since it really shouldn't ever do that, will be fun and \
interesting ... but I'm sure it can be done)</p> <p style="padding: 0;text-rendering: \
inherit;margin: 0;line-height: inherit;white-space: inherit;">"Some tests for the \
file indexing queue would also be needed."</p> <p style="padding: 0;text-rendering: \
inherit;margin: 0;line-height: inherit;white-space: inherit;">What needs testing \
beyond what schedulertest already tests?</p> <p style="padding: 0;text-rendering: \
inherit;margin: 0;line-height: inherit;white-space: inherit;">"How do you handle when \
a file takes too long to index?"</p> <p style="padding: 0;text-rendering: \
inherit;margin: 0;line-height: inherit;white-space: inherit;">It doesn't currently; \
I'll add that.</p></pre>  </blockquote>








</blockquote>

<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">Of \
course, creating a new review for the coding style changes would be too much. But it \
should go in a separate commit. I would not like your entire branch to be merged, \
rather one commit at a time.</p> <p style="padding: 0;text-rendering: inherit;margin: \
0;line-height: inherit;white-space: inherit;">Could you please explain \
"xapian-&gt;sqlite changes are necessary for cross-process queue synchronization". \
I'm afraid I really don't follow. Also, I don't follow why they are not functionally \
seperate. </p> <p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">Regarding the booleans. Z was a random letter chosen \
to represent the indexing level. One could have chosen a longer string, but that \
would occupy more space. Is there a problem with having to commit changes in Xapian? \
We are indexing the file and making changes over there, is not more sensible to just \
club them together? Unless, there are provable speed and io improvments by storing \
them in sqlite.</p> <p style="padding: 0;text-rendering: inherit;margin: \
0;line-height: inherit;white-space: inherit;">Regarding the Scheduler Test: Each \
change should ideally be tested independently so that we know what parts are actually \
improving the system and by how much. Also, the scheduler test isn't ideal since it \
contains artificial delays in the queues. We have them because indexing everything \
too fast is a problem.</p> <p style="padding: 0;text-rendering: inherit;margin: \
0;line-height: inherit;white-space: inherit;">Regarding Test Crashing - The previous \
FileIndexingQueue test already had code for that.</p> <p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">Test \
coverage. At minimum, the following tests would be required - 1. Extractor Client - \
Handling crashing and timeouts. Maybe tests for the FileIQ would also be required \
since it is now updating the 2. Benchmarks for the FileIndexingQueue (both io and \
cpu) 3. Benchmarks for the Scheduler (both io and cpu)
4. The BasicIndexingJob test should still be there - I understand that you've moved \
some part of the code to the BasicIQ, please add relevant tests.</p> <p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">I would be very interested in seeing both the IO and CPU benchmarks. More \
in the IO ones since the speed benefits while nice to have, do not amount to much if \
we have artifical delays.</p></pre> <br />







<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <p style="margin-top: 0;">On September 30th, 2014, 4:25 p.m. UTC, <b>Vishesh \
Handa</b> wrote:</p>  <blockquote style="margin-left: 1em; border-left: 2px solid \
#d0d0d0; padding-left: 10px;">  


<table width="100%" border="0" bgcolor="white" style="border: 1px solid #C0C0C0; \
border-collapse: collapse; margin: 2px padding: 2px;">  <thead>
  <tr>
   <th colspan="4" bgcolor="#F0F0F0" style="border-bottom: 1px solid #C0C0C0; \
font-size: 9pt; padding: 4px 8px; text-align: left;">  <a \
href="https://git.reviewboard.kde.org/r/120438/diff/1/?file=315778#file315778line1" \
style="color: black; font-weight: bold; text-decoration: underline;">.gitignore</a>  \
<span style="font-weight: normal;">

     (Diff revision 1)

    </span>
   </th>
  </tr>
 </thead>



 
 

 <tbody>

  <tr>
    <th bgcolor="#b1ebb0" style="border-right: 1px solid #C0C0C0;" \
align="right"><font size="2"></font></th>  <td bgcolor="#c5ffc4" width="50%"><pre \
style="font-size: 8pt; line-height: 140%; margin: 0; "></pre></td>  <th \
bgcolor="#b1ebb0" style="border-left: 1px solid #C0C0C0; border-right: 1px solid \
#C0C0C0;" align="right"><font size="2">1</font></th>  <td bgcolor="#c5ffc4" \
width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; \
">*.swp</pre></td>  </tr>

 </tbody>

</table>

  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">Is \
this required?</p></pre>  </blockquote>



 <p>On October 13th, 2014, 8:27 p.m. UTC, <b>Aaron J. Seigo</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">no; \
doesn't have to be there. happy to leave it out.</p></pre>  </blockquote>







</blockquote>
<pre style="margin-left: 1em; white-space: pre-wrap; white-space: -moz-pre-wrap; \
white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">Please do. These kind of things belong in a global gitignore \
file.</p></pre> <br />

<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <p style="margin-top: 0;">On September 30th, 2014, 4:25 p.m. UTC, <b>Vishesh \
Handa</b> wrote:</p>  <blockquote style="margin-left: 1em; border-left: 2px solid \
#d0d0d0; padding-left: 10px;">  


<table width="100%" border="0" bgcolor="white" style="border: 1px solid #C0C0C0; \
border-collapse: collapse; margin: 2px padding: 2px;">  <thead>
  <tr>
   <th colspan="4" bgcolor="#F0F0F0" style="border-bottom: 1px solid #C0C0C0; \
font-size: 9pt; padding: 4px 8px; text-align: left;">  <a \
href="https://git.reviewboard.kde.org/r/120438/diff/1/?file=315808#file315808line30" \
style="color: black; font-weight: bold; text-decoration: \
underline;">src/file/lib/extractorclient.h</a>  <span style="font-weight: normal;">

     (Diff revision 1)

    </span>
   </th>
  </tr>
 </thead>



 
 

 <tbody>

  <tr>
    <th bgcolor="#b1ebb0" style="border-right: 1px solid #C0C0C0;" \
align="right"><font size="2"></font></th>  <td bgcolor="#c5ffc4" width="50%"><pre \
style="font-size: 8pt; line-height: 140%; margin: 0; "></pre></td>  <th \
bgcolor="#b1ebb0" style="border-left: 1px solid #C0C0C0; border-right: 1px solid \
#C0C0C0;" align="right"><font size="2">30</font></th>  <td bgcolor="#c5ffc4" \
width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; "><span \
class="n">class</span> <span class="n">BALOO_FILE_EXPORT</span> <span \
class="n">ExtractorClient</span> <span class="o">:</span> <span \
class="n">public</span> <span class="n">QObject</span></pre></td>  </tr>

 </tbody>

</table>

  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">This \
add new public API.</p></pre>  </blockquote>



 <p>On October 13th, 2014, 8:27 p.m. UTC, <b>Aaron J. Seigo</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">Which \
makes sense if it is used by the Baloo widgets repository, unless they can share code \
privately. Otherwise, yes, it's fine to make this private if nothing else is going to \
use it.</p></pre>  </blockquote>







</blockquote>
<pre style="margin-left: 1em; white-space: pre-wrap; white-space: -moz-pre-wrap; \
white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">Baloo Widgets no longer uses the extractor. This can be made \
private.</p></pre> <br />




<p>- Vishesh</p>


<br />
<p>On September 30th, 2014, 12:47 p.m. UTC, Aaron J. Seigo wrote:</p>









<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="12" style="border: \
1px #888a85 solid; border-radius: 6px; -moz-border-radius: 6px; \
-webkit-border-radius: 6px;">  <tr>
  <td>

<div>Review request for Baloo.</div>
<div>By Aaron J. Seigo.</div>


<p style="color: grey;"><i>Updated Sept. 30, 2014, 12:47 p.m.</i></p>









<div style="margin-top: 1.5em;">
 <b style="color: #575012; font-size: 10pt;">Repository: </b>
baloo
</div>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
 <table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;"><p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">Current Baloo's file indexer takes batches of files by \
its command line and processes them async. There are several iterations over and \
creation of various collection classes. This patch set does away with all of \
this.</p> <p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">Instead, the worker process now is directed over stdin \
and responds over stdout (old school UNIX!). This is done using a very simple \
protocol which is fully documented in docs/ and is encapsulated in a convenience \
class so as to make future maintenance relatively painless in the advent that the \
protocol is adjusted or an entirely new system is put inplace. IOW, the next me \
should have an easier time improving on what is there. This allows the worker process \
to be long(er)-lived and moves more of the relevant bookkeeping to the same place \
(the worker process). The resulting code is about the same size but cleaner and much \
more performant.</p> <p style="padding: 0;text-rendering: inherit;margin: \
0;line-height: inherit;white-space: inherit;">The tracking of file progress is also \
moved to the (less expensive, easier to update incrementally) sqlite db where file \
data is already kept. It also versions the sqlite schema so that runtime upgrades can \
be made (as required here) both now and in the future without user intervention.</p> \
<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">Finally, a tiny (93 LOC, including header) executable \
that wraps the protocol encapsulation class is provided for 100% backwards \
compatibility of <em style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: normal;">all</em> command line options that currently exist.</p> \
<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">This makes it a 100% drop-in replacement.</p> <p \
style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
inherit;">Performance of the affected unit tests is ~2.5x better on my laptop: 7.6 v \
3.0 seconds.</p> <p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">Known weaknesses:</p> <ul style="padding: \
0;text-rendering: inherit;margin: 0 0 0 1em;line-height: inherit;white-space: \
normal;"> <li style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: normal;">still isn't hardened against multiple processes \
indexing simultaneously. the added sqlite columns are a step in this direction, \
howeer</li> <li style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: normal;">the DBus signal that was being emitted was removed and \
has not been replaced with anything yet. That is because the affected class is in \
baloo-widgets (which happens to be the only user of the relevant <em style="padding: \
0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: \
normal;">public</em> class in the baloo file library) and I have not yet worked on \
that, pending this patch set being approved in principle. I am actually of the \
suspicion that keeping that data synchronized at all times is not needed .. but if it \
is, there are much better ways than DBus. Note that the DBus overhead was applied to \
<em style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: normal;">every single file indexed</em> even if Dolphin (the \
only user of this feature) wasn't running or had the metadata frame open.</li> </ul>
<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">The base idea behind this patch is to make file \
indexing as blazing fast as possible, which means using as few resources, CPU &amp; \
memory, as possible. That ought to be the #1 priority for Baloo when it comes to \
indexing files, both to keep it away from user's notice on desktop class hardware as \
well as extend its relevance to smaller devices. The code in this patch is at least \
as clear (imho more so, actually) than what it replaces and was written for \
maintainability.</p></pre>  </td>
 </tr>
</table>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: \
1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;"><p style="padding: 0;text-rendering: inherit;margin: 0;line-height: \
inherit;white-space: inherit;">Indexed files, ran testsuite with 0 \
failures.</p></pre>  </td>
 </tr>
</table>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">

 <li>src/file/extractor/app.cpp <span style="color: grey">(PRE-CREATION)</span></li>

 <li>src/file/extractor/baloo_file_extractor.h <span style="color: \
grey">(a8cff51ae763c36fb2934c80c04cbf0f710a01c7)</span></li>

 <li>src/file/extractor/baloo_file_extractor.cpp <span style="color: \
grey">(b2e7f97f2f0537807c314e2aca008d67711a69ce)</span></li>

 <li>src/file/extractor/extractorworker.cpp <span style="color: \
grey">(e710908dc8b02b7edc9f40e44b9d5d5883986846)</span></li>

 <li>src/file/extractor/main.cpp <span style="color: \
grey">(606a1eb9d42b7079fee497a3fdffbe2327512c13)</span></li>

 <li>src/file/fileindexingjob.h <span style="color: grey">(PRE-CREATION)</span></li>

 <li>src/file/fileindexingjob.cpp <span style="color: \
grey">(PRE-CREATION)</span></li>

 <li>src/file/fileindexingqueue.h <span style="color: \
grey">(5b1c612feb84046acf36b3dec3bf890da2cbf650)</span></li>

 <li>src/file/fileindexingqueue.cpp <span style="color: \
grey">(c6479cd25308bee9bb6fe6e6791391e6ca83bddf)</span></li>

 <li>src/file/indexingqueue.cpp <span style="color: \
grey">(c8ae4d9d1436ebe222a4758bd5ce551e091ec714)</span></li>

 <li>src/file/lib/CMakeLists.txt <span style="color: \
grey">(fb158a5f00f07ab48de963bf72924c83fc1c6673)</span></li>

 <li>src/file/lib/extractorclient.h <span style="color: \
grey">(1e2ae09c7bd0068ca185b814c5c4996c0b563856)</span></li>

 <li>src/file/lib/extractorclient.cpp <span style="color: \
grey">(221b90abe1899c27c03cf77e279fb71afce80b4d)</span></li>

 <li>src/file/lib/filemapping.cpp <span style="color: \
grey">(5769fe23576c341917e8f37f7dd2743dd446a6e1)</span></li>

 <li>src/file/main.cpp <span style="color: \
grey">(41855215c7b550c403d5ef523d29f9b73db58cb0)</span></li>

 <li>src/file/metadatamover.cpp <span style="color: \
grey">(5117687c1247275235c3830b9d2234bfff353748)</span></li>

 <li>src/file/util.h <span style="color: \
grey">(a4cc2423fd46cda7146b02c572cc3f4ccd069e3e)</span></li>

 <li>src/file/util.cpp <span style="color: \
grey">(2f7ac6de0dddc7d71c2b36bb4fe88677bfe52336)</span></li>

 <li>src/file/autotest/fileindexingjob/fileindexingjobtest.h <span style="color: \
grey">(PRE-CREATION)</span></li>

 <li>src/file/autotest/fileindexingjob/fileindexingjobtest.cpp <span style="color: \
grey">(PRE-CREATION)</span></li>

 <li>src/file/autotest/scheduler/schedulertest.cpp <span style="color: \
grey">(b91a481138382b609068b4737c654ac5182c37ec)</span></li>

 <li>src/file/basicindexingjob.h <span style="color: \
grey">(3d2fb339e38be862332ddca45b4926f8000fa0c1)</span></li>

 <li>src/file/basicindexingjob.cpp <span style="color: \
grey">(8d70fa2b1ca737eaa8d2e717a0eb2dbf430fac84)</span></li>

 <li>src/file/basicindexingqueue.cpp <span style="color: \
grey">(8484c9b6868890cae6c2e19b29ea92d500aac03e)</span></li>

 <li>src/file/commitqueue.cpp <span style="color: \
grey">(b26f73cfdbc390cceef01aa6abd2b5af6f9689b2)</span></li>

 <li>src/file/database.h <span style="color: \
grey">(6548de7dbce003112dad8c58c39c624dd806a261)</span></li>

 <li>src/file/database.cpp <span style="color: \
grey">(5c9ccdc442d230bdc5dc9edbe5bc595537d08a8f)</span></li>

 <li>src/file/extractor/CMakeLists.txt <span style="color: \
grey">(7cf3478f547f7fbcdf8908a3471b50fab09b5220)</span></li>

 <li>src/file/extractor/app.h <span style="color: \
grey">(8059a16e8ec2e2a84d1a3b7eb1dd8cfc2a3136de)</span></li>

 <li>.gitignore <span style="color: \
grey">(1377554ebea6f98a2c748183bc5a96852af12ac2)</span></li>

 <li>docs/file_extractor.md <span style="color: \
grey">(3d830e2313bd35994f9fd1d6b52df422d0556fd8)</span></li>

 <li>src/file/CMakeLists.txt <span style="color: \
grey">(1a9e6bf8f2c70b096fbb2c6c9eb78feec9d06622)</span></li>

 <li>src/file/autotest/CMakeLists.txt <span style="color: \
grey">(ccb8575fa8c07f2b136d259ed63d43a09ad75a12)</span></li>

 <li>src/file/autotest/basicindexingjobtest.h <span style="color: \
grey">(PRE-CREATION)</span></li>

 <li>src/file/autotest/basicindexingjobtest.cpp <span style="color: \
grey">(PRE-CREATION)</span></li>

 <li>src/file/autotest/fileindexingjob/CMakeLists.txt <span style="color: \
grey">(PRE-CREATION)</span></li>

 <li>src/file/autotest/fileindexingjob/extractor.cpp <span style="color: \
grey">(PRE-CREATION)</span></li>

</ul>

<p><a href="https://git.reviewboard.kde.org/r/120438/diff/" style="margin-left: \
3em;">View Diff</a></p>






  </td>
 </tr>
</table>








  </div>
 </body>
</html>


--===============1829551522856473575==--



>> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to unsubscribe <<


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic