List: kde-devel
Subject: Re: Review Request 117996: Add an append(QByteArray) method to the ExtractionResult
From: "Milian Wolff" <mail () milianw ! de>
Date: 2014-05-05 16:06:11
Message-ID: 20140505160611.19540.76574 () probe ! kde ! org
> On May 5, 2014, 2:05 p.m., Milian Wolff wrote:
> > What is the speed difference compared to the old QString API, but without the
> > word-count there? Afaik, the word-count is the major bottleneck and removing it
> > alone should greatly speed up the test.
> > Having a QByteArray in the API would be fine if you document that the data _must_
> > be UTF-8. But a meaningful performance test here must include the later conversion
> > to std::string for xapian, imo. I.e. what you want to test is file -> qbytearray
> > -> std::string vs. file -> qstring -> std::string.
>
> Vishesh Handa wrote:
> Original: 60 msecs
> Without Word Count: 30 msecs
> Without Word Count + ByteArray: 8 msecs
>
> Milian Wolff wrote:
> Cool, looks promising. And how slow would your patch be right now with just
> result->append(QString::fromUtf8(arr));? Or is that then the 30 msecs? Just wondering
> what the impact of using STL instead of QIODevice/QFile is here.
>
> Vishesh Handa wrote:
> With QString::fromUtf8 - About 9msecs. So, it's all QFile.
>
> I've tried testing out actually indexing a file before and after this patch (and a
> small patch in Baloo, to directly convert it into a QString). I can barely make out
> any difference. Just about 100-200 msecs on a file which takes 11 seconds.
> It might just make sense to discard the whole append(QByteArray) and just ship the
> QFile parts.
Yes, that sounds like a good approach.
- Milian
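
For anyone who wants to repeat the "STL instead of QIODevice/QFile" measurement
discussed above, here is a minimal sketch using only the C++ standard library. It is
not the code from the patch: readFileBytes and elapsedMs are hypothetical helper names,
the file is assumed to already be UTF-8, and a single timed run like this is only a
rough indication, not a proper benchmark.

```cpp
#include <chrono>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file into a std::string as raw bytes.
// The bytes are assumed to be UTF-8; nothing is decoded or validated here.
std::string readFileBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: run fn once and return the elapsed wall time in
// milliseconds, measured with a monotonic clock.
template <typename Fn>
double elapsedMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Calling elapsedMs([&] { text = readFileBytes(path); }) gives the cost of the
file -> std::string path on its own. Comparing it against the full indexing time of the
same file shows how much of the total the read step accounts for.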
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/117996/#review57314
-----------------------------------------------------------
On May 5, 2014, 2:01 p.m., Vishesh Handa wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/117996/
> -----------------------------------------------------------
>
> (Updated May 5, 2014, 2:01 p.m.)
>
>
> Review request for Baloo and Milian Wolff.
>
>
> Repository: kfilemetadata
>
>
> Description
> -------
>
> Add an append(QByteArray) method to the ExtractionResult
>
> This way plugins can choose to return the data in utf8 or as a QString,
> and the clients can either just let the standard QString::fromUtf8
> function do its magic, or implement some special handling if they wish.
>
> This speeds up the PlainTextExtractor quite a bit (60msec vs 8.3msec)
>
> Unfortunately, this meant discarding the extraction of WordCount from
> the Plain Text extractor. Though considering the speed difference, I
> think it is worth it.
>
>
> Diffs
> -----
>
> autotests/indexerextractortests.cpp 6b7c605
> autotests/simpleresult.h f3793b5
> src/extractionresult.h 76dfe59
> src/extractionresult.cpp 9bc7946
> src/extractors/plaintextextractor.cpp 5a38857
>
> Diff: https://git.reviewboard.kde.org/r/117996/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Vishesh Handa
>
>