'Re: Vc branch ready for testing'

List:       kde-kimageshop
Subject:    Re: Vc branch ready for testing
From:       Sven Langkamp <sven.langkamp () gmail ! com>
Date:       2012-09-11 2:11:59
Message-ID: CAAmsBfkLoGFFMrA2sOtP49rqbTd4UcMdGuU_MsUM16ygoPu7tQ () mail ! gmail ! com

On Mon, Sep 10, 2012 at 4:35 PM, Boudewijn Rempt <boud@valdyas.org> wrote:

> On Monday 10 September 2012, Sven Langkamp wrote:
> >
> > Branch has been tested on half a dozen systems now. Results ranged from
> > twice as fast to a very slight improvement or no noticeable change. Not
> > sure why there is such a difference between systems. Dual-core systems
> > seem to have a bigger improvement. It might be that mask processing was
> > already quite fast on quad-core CPUs before.
>
> Weirdly, though, I did see a big change on my desktop machine.
>
> > Branch is almost feature complete, just some improvements for detecting
> > cmake files needed. Also will need some ifdefs if vc should stay an
> > optional dependency.
>
> I think it should, at least until it gets more widespread and until I've
> fixed the Windows port :-)
>
> > I did some further profiling with callgrind on some 1000px 0.04 spacing.
> > Callgrind file can be found here: http://depot.tu-dortmund.de/get/ybukq
> >
> > It shows that the composite op is now the most expensive operation in the
> > KisStrokeBenchmark, which is probably also the reason that we don't see
> > bigger improvements from the mask processing. Pentalis wants to look at
> > the composite ops and see what can be done there.
>
> It should be a prime candidate for vectorization -- but it might mean a
> big operation since I'm beginning to suspect it'd mean taking the
> alpha-channel out of band.
>
> > I'm considering parallelizing the fixedBlt with QtConcurrent, like we
> > already have for the brush mask.
>
> That should work fine as well.

I have done a quick experiment to test that in the branch
krita-multithreadedfixedbitblt-langkamp. The stroke benchmark is slightly
faster, and I measured that the time spent in fixedBitBlt went down (I
haven't done detailed testing, but it looks like a speedup of about 1.6x).
I didn't notice any improvement while painting, though. It would be
interesting to see if it gives bigger improvements on a quad-core (no extra
libs required).
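
For anyone who wants to see the general idea without checking out the branch, here is a rough sketch of splitting a blit into row bands that run in parallel. It is not the code from the branch: it uses plain std::thread instead of QtConcurrent so that it builds without Qt, it copies bytes instead of running a composite op, and Buffer, blitRows and parallelBlit are made-up names for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a fixed-size pixel buffer (not Krita's real class).
struct Buffer {
    int width;
    int height;
    std::vector<std::uint8_t> pixels; // one byte per pixel, row-major

    Buffer(int w, int h, std::uint8_t fill = 0)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, fill) {}
};

// Copy source rows [rowBegin, rowEnd) into dst, shifted by (dx, dy).
// A real blit would composite here instead of copying.
void blitRows(const Buffer &src, Buffer &dst, int dx, int dy,
              int rowBegin, int rowEnd)
{
    for (int y = rowBegin; y < rowEnd; ++y) {
        const std::uint8_t *from =
            src.pixels.data() + static_cast<std::size_t>(y) * src.width;
        std::uint8_t *to =
            dst.pixels.data() + static_cast<std::size_t>(y + dy) * dst.width + dx;
        std::copy(from, from + src.width, to);
    }
}

// Split the source rows into at most `bands` bands, one thread per band.
// Bands write to disjoint destination rows, so no locking is needed.
void parallelBlit(const Buffer &src, Buffer &dst, int dx, int dy, int bands)
{
    assert(bands >= 1);
    assert(dx >= 0 && dy >= 0);
    assert(dx + src.width <= dst.width && dy + src.height <= dst.height);

    const int rowsPerBand = (src.height + bands - 1) / bands; // round up
    std::vector<std::thread> workers;
    for (int begin = 0; begin < src.height; begin += rowsPerBand) {
        const int end = std::min(begin + rowsPerBand, src.height);
        workers.emplace_back(blitRows, std::cref(src), std::ref(dst),
                             dx, dy, begin, end);
    }
    for (std::thread &worker : workers) {
        worker.join();
    }
}
```

Calling parallelBlit(src, dst, 3, 2, 4) would copy src into dst at offset (3, 2) using up to four bands. With only two cores and a small dab, the thread start-up cost can eat a good part of the gain, which may be part of why the speedup is hard to notice while painting.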

I'm more and more wondering where all the performance goes.

_______________________________________________
kimageshop mailing list
kimageshop@kde.org
https://mail.kde.org/mailman/listinfo/kimageshop
