'Re: Review Request 117996: Add an append(QByteArray) method to the ExtractionResult'


List:       kde-devel
Subject:    Re: Review Request 117996: Add an append(QByteArray) method to the ExtractionResult
From:       "Milian Wolff" <mail () milianw ! de>
Date:       2014-05-05 14:05:31
Message-ID: 20140505140531.19540.19657 () probe ! kde ! org

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/117996/#review57314
-----------------------------------------------------------

What is the speed difference compared to the old QString API, but without the
word-count there? Afaik, the word-count is the major bottleneck and removing it
alone should greatly speed up the test.

Having a QByteArray in the API would be fine if you document that the data
_must_ be UTF8. But a meaningful performance test here must include the later
conversion to std::string for xapian, imo. I.e. what you want to test is
file -> qbytearray -> std::string vs. file -> qstring -> std::string.
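
A minimal sketch of the comparison described above, written in plain standard C++ so it runs without Qt. Everything in it is an assumption for illustration: utf8ToUtf16() and utf16ToUtf8() are hand-written stand-ins for the QString round trip (QString::fromUtf8 and the conversion back to UTF-8), viaBytes() and viaUtf16() are hypothetical names for the two paths, and the input is assumed to be well-formed UTF-8. It is not the kfilemetadata or Baloo code.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for the "bytes -> QString" step: decode UTF-8 into UTF-16 code units.
// Assumes well-formed UTF-8; performs no validation.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        if (c < 0x80)              { cp = c;        len = 1; }
        else if ((c >> 5) == 0x06) { cp = c & 0x1F; len = 2; }
        else if ((c >> 4) == 0x0E) { cp = c & 0x0F; len = 3; }
        else                       { cp = c & 0x07; len = 4; }
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp >= 0x10000) {  // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Stand-in for the "QString -> std::string" step: encode UTF-16 back to UTF-8.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const std::uint32_t low = in[++i];  // combine the surrogate pair
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Path 1: file bytes -> std::string (a single copy, no transcoding).
std::string viaBytes(const std::string& utf8) { return std::string(utf8); }

// Path 2: file bytes -> UTF-16 string -> std::string (two transcoding passes).
std::string viaUtf16(const std::string& utf8) { return utf16ToUtf8(utf8ToUtf16(utf8)); }

// Time `rounds` conversions of the same input and return elapsed milliseconds.
template <typename Convert>
double millisecondsFor(Convert convert, const std::string& utf8, int rounds) {
    volatile std::size_t sink = 0;  // keeps the compiler from dropping the work
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r)
        sink = sink + convert(utf8).size();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Calling millisecondsFor(viaBytes, data, n) and millisecondsFor(viaUtf16, data, n) on the same file contents gives the end-to-end numbers for both paths. A real measurement would use the actual Qt types and the indexing call on the xapian side instead of these stand-ins.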

- Milian Wolff

On May 5, 2014, 2:01 p.m., Vishesh Handa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/117996/
> -----------------------------------------------------------
> 
> (Updated May 5, 2014, 2:01 p.m.)
> 
> 
> Review request for Baloo and Milian Wolff.
> 
> 
> Repository: kfilemetadata
> 
> 
> Description
> -------
> 
> Add an append(QByteArray) method to the ExtractionResult
> 
> This way plugins can choose to return the data in utf8 or as a QString,
> and the clients can either just let the standard QString::fromUtf8
> function do its magic, or implement some special handling if they wish.
> 
> This speeds up the PlainTextExtractor quite a bit (from 60 msec to 8.3 msec).
> 
> Unfortunately, this meant discarding the extraction of WordCount from
> the Plain Text extractor. Though considering the speed difference, I
> think it is worth it.
> 
> 
> Diffs
> -----
> 
> autotests/indexerextractortests.cpp 6b7c605 
> autotests/simpleresult.h f3793b5 
> src/extractionresult.h 76dfe59 
> src/extractionresult.cpp 9bc7946 
> src/extractors/plaintextextractor.cpp 5a38857 
> 
> Diff: https://git.reviewboard.kde.org/r/117996/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vishesh Handa
> 
> 
