List: kde-kimageshop
Subject: Re: About vectorization and planar channels in Krita
From: Sven Langkamp <sven.langkamp () gmail ! com>
Date: 2012-09-30 22:05:42
Message-ID: CAAmsBf=RvCaqeT79NOZ8T1zAE2bYHZgH8S9jdfNW3XXW5cS=AQ () mail ! gmail ! com
On Fri, Sep 28, 2012 at 3:39 PM, Dmitry Kazakov <dimula73@gmail.com> wrote:
> Hi!
>
> After that long discussion about grayscale selections I decided to check
> whether we really need planar channels to implement vectorization in
> Krita. It turned out that we do *not*. SIMD instructions cannot work
> with bytes directly (we won't be able to multiply anything), so in both
> cases, with planar bytes or without, we will have to convert the pixel
> data into some other format (single precision float or single word
> integer), doing some inevitable permutations and wasting time on them.
> Planar channels will give us no help with that.
>
Really interesting solution. My idea was to shuffle the alpha out of the
loaded pixel (that would require fewer conversions, but more other
instructions), but this looks better. Unfortunately I don't have a CPU with
AVX, so I can't test it. It would be interesting to see how this performs
with SSE and integers instead of floats.
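For reference, here is a minimal scalar sketch of the per-pixel integer
"over" blend under discussion. It is not Krita's actual COMPOSITE_OVER code:
the function names are hypothetical, and it assumes 8-bit interleaved RGBA,
non-premultiplied colour and a fully opaque destination. What it illustrates
is the widening step: each byte is promoted to a 32-bit integer before the
multiply, which is the same conversion a SIMD version has to perform on whole
registers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not Krita API: blend one 8-bit channel.
// The bytes are widened to 32-bit integers before multiplying, because the
// product of two bytes (up to 255 * 255) does not fit in 8 bits.
inline std::uint8_t blendChannel(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha)
{
    const std::uint32_t s = src, d = dst, a = alpha;
    // Rounded (s * a + d * (255 - a)) / 255; the result never exceeds 255.
    return static_cast<std::uint8_t>((s * a + d * (255u - a) + 127u) / 255u);
}

// Hypothetical sketch of "over" for interleaved RGBA8 pixels.
// Assumes non-premultiplied colour and a fully opaque destination, so the
// destination alpha byte is left untouched.
inline void compositeOverRGBA8(const std::uint8_t *src, std::uint8_t *dst, std::size_t pixelCount)
{
    for (std::size_t i = 0; i < pixelCount; ++i) {
        const std::uint8_t alpha = src[4 * i + 3];
        for (std::size_t c = 0; c < 3; ++c) {
            dst[4 * i + c] = blendChannel(src[4 * i + c], dst[4 * i + c], alpha);
        }
    }
}
```

Calling compositeOverRGBA8(src, dst, n) blends n source pixels onto dst in
place. An SSE integer variant would do the same arithmetic after unpacking
the bytes into 16-bit lanes.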
>
> What we really need to do is just to take advantage of the RGBA pixel
> layout (better data locality and good alignment) and optimize our code. As
> a proof of concept, I've written a small benchmark that compares our
> standard integer COMPOSITE_OVER algorithm against its SIMD (AVX)
> implementation. The streamed implementation was 3.3 times faster
> than the algorithm we use right now. More than that, this sketch was
> written in just a day, so it has lots of room for optimization (it
> can be modified to process 10.6 pixels at a time instead of 8, for example).
>
> The actual results of compositing 32 MPixels:
>
> TestAvxCompositeOverTest::testPerPixelComposition(): 370 msecs
> TestAvxCompositeOverTest::testAVXComposition(): 147 msecs
> TestAvxCompositeOverTest::testAVXCompositionx2(): 113 msecs
>
> What I want to say with this mail:
> 1) There is no need to port the whole of Krita to some other channel
> layout. Even the current layout gives us lots of room to optimize our
> code.
>
Maybe it would be a good idea to set aside some time for this in the action plan.
>
> 2) We still need to decide what to do with grayscale selections.
>
My favorite is still the composite op solution.
_______________________________________________
kimageshop mailing list
kimageshop@kde.org
https://mail.kde.org/mailman/listinfo/kimageshop