List: kde-kimageshop
Subject: Re: Vc branch ready for testing
From: Sven Langkamp <sven.langkamp () gmail ! com>
Date: 2012-09-11 2:11:59
Message-ID: CAAmsBfkLoGFFMrA2sOtP49rqbTd4UcMdGuU_MsUM16ygoPu7tQ () mail ! gmail ! com
On Mon, Sep 10, 2012 at 4:35 PM, Boudewijn Rempt <boud@valdyas.org> wrote:
> On Monday 10 September 2012 Sep, Sven Langkamp wrote:
> >
> > Branch has been tested on half a dozen systems now. Results ranged from
> > twice as fast to a very slight improvement or no noticeable change. Not
> > sure why there is such a difference between systems. Dual-core systems
> > seem to have a bigger improvement. Might be that mask processing was
> > already quite fast on quad-core cpus before.
>
> Weirdly, though, I did see a big change on my desktop machine.
>
> > Branch is almost feature complete, just some improvements for detecting
> > cmake files needed. Also will need some ifdefs if vc should stay an
> > optional dependency.
>
> I think it should, at least until it gets more widespread and until I've
> fixed the Windows port :-)
>
> > I did some further profiling with callgrind on some 1000px 0.04 spacing.
> > Callgrind file can be found here: http://depot.tu-dortmund.de/get/ybukq
> >
> > It shows that the composite op is now the most expensive operation in the
> > KisStrokeBenchmark. Which is probably also the reason that we don't see
> > bigger improvements from the mask processing. Pentalis wants to look at
> > the composite ops and see what can be done there.
>
> It should be a prime candidate for vectorization -- but it might mean a
> big operation since I'm beginning to suspect it'd mean taking the
> alpha-channel out of band.
>
> > I'm considering parallelizing the fixedBlt with QtConcurrent like we
> > already have for the brush mask.
>
> That should work fine as well.

I have done a quick experiment to test that in branch
krita-multithreadedfixedbitblt-langkamp. The stroke benchmark is slightly
faster, and I measured that the time spent in fixedBitBlt went down (I haven't
done detailed testing, but it looks like a speedup of about 1.6). I didn't
notice any improvement while painting, though. It would be interesting to see
whether it gives bigger improvements on a quad-core (no extra libs required).

I'm more and more wondering where all the performance goes.
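
For anyone who wants to see the general idea without checking out the branch, here is a rough sketch of splitting a blit into horizontal bands that are processed in parallel. This is not the code from the branch: it uses std::thread instead of QtConcurrent so that it builds without Qt, a plain row copy stands in for the composite op, and Image, blitBand and parallelBlit are made-up names for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical pixel buffer: tightly packed rows, 4 bytes per pixel.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> data;

    Image(int w, int h)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h * 4, 0) {}

    std::uint8_t* row(int y) {
        return data.data() + static_cast<std::size_t>(y) * width * 4;
    }
    const std::uint8_t* row(int y) const {
        return data.data() + static_cast<std::size_t>(y) * width * 4;
    }
};

// Copy source rows [y0, y1) into dst at offset (dx, dy).
// No clipping is done: the caller must make sure src fits inside dst.
// A real blit would run the composite op here instead of memcpy.
void blitBand(const Image& src, Image& dst, int dx, int dy, int y0, int y1) {
    for (int y = y0; y < y1; ++y) {
        std::memcpy(dst.row(dy + y) + static_cast<std::size_t>(dx) * 4,
                    src.row(y),
                    static_cast<std::size_t>(src.width) * 4);
    }
}

// Split the source into at most `threads` horizontal bands and copy each
// band on its own thread. Bands write to disjoint rows of dst, so no
// locking is needed.
void parallelBlit(const Image& src, Image& dst, int dx, int dy, unsigned threads) {
    if (src.height <= 0) {
        return;
    }
    const int n = static_cast<int>(
        std::max(1u, std::min(threads, static_cast<unsigned>(src.height))));
    const int rowsPerBand = (src.height + n - 1) / n;  // round up

    std::vector<std::thread> workers;
    for (int y0 = 0; y0 < src.height; y0 += rowsPerBand) {
        const int y1 = std::min(y0 + rowsPerBand, src.height);
        workers.emplace_back([&src, &dst, dx, dy, y0, y1] {
            blitBand(src, dst, dx, dy, y0, y1);
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
}
```

Calling parallelBlit(src, dst, 3, 2, 4) copies src into dst at offset (3, 2) using up to four bands. Starting a thread per band has a cost of its own, so for small dabs the overhead could easily outweigh the gain, which may be one reason a faster blit does not automatically show up while painting.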