List:       kde-kimageshop
Subject:    Re: About vectorization and planar channels in Krita
From:       Sven Langkamp <sven.langkamp () gmail ! com>
Date:       2012-09-30 22:05:42
Message-ID: CAAmsBf=RvCaqeT79NOZ8T1zAE2bYHZgH8S9jdfNW3XXW5cS=AQ () mail ! gmail ! com

On Fri, Sep 28, 2012 at 3:39 PM, Dmitry Kazakov <dimula73@gmail.com> wrote:

> Hi!
>
> After that long discussion about grayscale selections I decided to check
> whether we really need planar channels to implement vectorization in
> Krita. It turned out that we do *not*. SIMD instructions cannot work with
> bytes directly (we won't be able to multiply anything), so in both cases,
> with planar bytes or without them, we have to convert the pixel data into
> some other format: single-precision float or single-word integer. That
> means some inevitable permutations and the time spent on them. The flat
> (planar) channels would not help with that.
>

Really interesting solution. My idea was to shuffle the alpha out of the
loaded pixel (that would require fewer conversions, but more other
instructions), but this looks better. Unfortunately I don't have a CPU with
AVX, so I can't test it. It would be interesting to see how this performs
with SSE and integers instead of floats.
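
To make the float-versus-integer question a bit more concrete, here is a
minimal C++ sketch. It is not Krita's code and not the benchmark from the
quoted mail. It assumes 8-bit RGBA with a non-premultiplied source and an
opaque destination, so "over" reduces to src * a + dst * (1 - a). The names
blendFloat, blendInt, compositeOverBatch and kBatch are made up for
illustration, and plain loops stand in for the lanes of a SIMD register;
real AVX or SSE code would use intrinsics instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative names only; this is not Krita's COMPOSITE_OVER code.
// Assumption: 8-bit RGBA, non-premultiplied source, opaque destination.

constexpr std::size_t kBatch = 8;  // a 256-bit register holds 8 floats

// Float path: widen byte channels to float, blend, round back to a byte.
inline std::uint8_t blendFloat(std::uint8_t src, std::uint8_t dst, float alpha)
{
    const float r = src * alpha + dst * (1.0f - alpha);
    return static_cast<std::uint8_t>(r + 0.5f);
}

// Integer path: widen to 32 bits and divide by 255 with rounding,
// using the common (t + (t >> 8)) >> 8 trick.
inline std::uint8_t blendInt(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha)
{
    const std::uint32_t s = src, d = dst, a = alpha;
    const std::uint32_t t = s * a + d * (255u - a) + 128u;
    return static_cast<std::uint8_t>((t + (t >> 8)) >> 8);
}

// Composite numPixels interleaved RGBA pixels, kBatch pixels per step.
// The destination alpha byte is left untouched (opaque assumption).
void compositeOverBatch(const std::uint8_t *src, std::uint8_t *dst,
                        std::size_t numPixels)
{
    assert(numPixels % kBatch == 0);
    for (std::size_t base = 0; base < numPixels; base += kBatch) {
        // Pull the alpha byte of each pixel into its own lane.
        float alpha[kBatch];
        for (std::size_t lane = 0; lane < kBatch; ++lane)
            alpha[lane] = src[(base + lane) * 4 + 3] / 255.0f;

        // Blend one colour channel at a time across all lanes.
        for (std::size_t ch = 0; ch < 3; ++ch) {
            for (std::size_t lane = 0; lane < kBatch; ++lane) {
                const std::size_t i = (base + lane) * 4 + ch;
                dst[i] = blendFloat(src[i], dst[i], alpha[lane]);
            }
        }
    }
}
```

In this sketch the two paths can differ by at most one level per channel,
since both round the same exact value. Which one is faster with real SIMD
instructions is exactly the open question, and the sketch says nothing
about that.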



> What we really need to do is use the advantages of the RGBA pixel layout
> (better data locality and good alignment) and optimize our code. As a
> proof of concept, I've written a small benchmark that compares our
> standard integer COMPOSITE_OVER algorithm against a SIMD (AVX)
> implementation of it. The streamed implementation was 3.3 times faster
> than the algorithm we use right now. What's more, this sketch was written
> in just a day, so there is plenty of room for optimization (it can be
> modified to process 16 pixels at a time instead of 8, for example).
>
> The actual results for compositing 32 MPixels:
>
> TestAvxCompositeOverTest::testPerPixelComposition(): 370 msecs
> TestAvxCompositeOverTest::testAVXComposition():      147 msecs
> TestAvxCompositeOverTest::testAVXCompositionx2():    113 msecs
>
> What I want to say with this mail:
> 1) There is no need to port the whole of Krita to some other channel
> layout. Even the current layout gives us lots of possibilities to optimize
> our code.
>

Maybe it would be a good idea to set aside some time for this on the action
plan.
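
For planning purposes, the quoted timings can be turned into speedup and
throughput figures. The small C++ helpers below are only a sketch of that
arithmetic; the names speedup and megapixelsPerSecond are made up here, and
the inputs are the 370, 147 and 113 msecs for 32 MPixels quoted above.

```cpp
#include <cassert>

// Hypothetical helpers, only for reading the quoted benchmark numbers.

// How many times faster the optimized run is than the baseline.
inline double speedup(double baselineMsecs, double optimizedMsecs)
{
    assert(optimizedMsecs > 0.0);
    return baselineMsecs / optimizedMsecs;
}

// Throughput in MPixels per second for a run of the given duration.
inline double megapixelsPerSecond(double megapixels, double msecs)
{
    assert(msecs > 0.0);
    return megapixels / (msecs / 1000.0);
}
```

With the quoted numbers, speedup(370, 147) is about 2.5 and
speedup(370, 113) is about 3.3, so the "3.3 times" figure appears to match
the testAVXCompositionx2 run. In throughput terms that is roughly 86
MPixels per second for the per-pixel code against roughly 283 for the x2
variant.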


> 2) We still need to decide what to do with grayscale selections.
>

My favorite is still the composite op solution.




_______________________________________________
kimageshop mailing list
kimageshop@kde.org
https://mail.kde.org/mailman/listinfo/kimageshop

