On Fri, Sep 28, 2012 at 3:39 PM, Dmitry Kazakov wrote:

> Hi!
>
> After that long discussion about grayscale selections I decided to check
> whether we really need planar channels for implementing the vectorization
> in Krita. And it turned out, that we need *not* do it. The SIMD
> instructions cannot work with bytes directly (we won't be able to multiply
> anything), so in both of the cases, when we use planar bytes and not, we
> will have to convert the pixel data into some other format: single
> precision float or single word integer, doing some inevitable permutations
> and wasting time on them. The flat channels will give us no help with it.
>

Really interesting solution. My idea was to shuffle the alpha from the
loaded pixel (that would require fewer conversions, but more other
instructions), but this looks better. Unfortunately I don't have a CPU
with AVX, so I can't test it. It would be interesting to see how this
performs with SSE and integers instead of floats.

> What we really need to do is just to use the advantages of RGBA pixel
> layout (better data locality and good alignment) and optimize our code. As
> a proof of concept, I've written a small benchmark, that compares our
> standard integer COMPOSITE_OVER algorithm against its SIMD (avx)
> implementation. The streamed implementation showed a 3.3 times better speed
> than the algorithm we use right now. More than that, this sketch was
> written in just a day so it has lots of possibilities for optimization (it
> can be modified to process 16 pixels at a time instead of 8, for example).
>
> The actual results of composing of 32 MPixels:
>
> TestAvxCompositeOverTest::testPerPixelComposition(): 370 msecs
> TestAvxCompositeOverTest::testAVXComposition():      147 msecs
> TestAvxCompositeOverTest::testAVXCompositionx2():    113 msecs
>
> What I want to tell with this mail:
> 1) There is no need to port the whole Krita to use some other channel
> layouts. Even current layout gives us lots of possibilities to optimize our
> code.
>

Maybe it would be a good idea to set aside some time for this in the
action plan.

> 2) We still need to decide what to do with grayscale selections.
>

My favorite is still the composite op solution.
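
To make the byte-to-float conversion discussed above concrete, here is a minimal portable sketch. It is not Dmitry's benchmark and not Krita code: the function names are hypothetical, the destination is assumed fully opaque, and plain loops stand in for the AVX intrinsics. The second function works on blocks of 8 pixels, the number of floats that fit in one 256-bit register, and converts each byte to float before multiplying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative sketch only: these names are hypothetical, not Krita's API.
// Assumes interleaved 8-bit RGBA and a fully opaque destination.

// Per-pixel integer blend, one channel at a time.
void compositeOverScalar(const std::uint8_t* src, std::uint8_t* dst,
                         std::size_t nPixels)
{
    for (std::size_t p = 0; p < nPixels; ++p) {
        const unsigned a = src[p * 4 + 3];
        for (std::size_t c = 0; c < 3; ++c) {
            const unsigned s = src[p * 4 + c];
            const unsigned d = dst[p * 4 + c];
            // Rounded (s*a + d*(255-a)) / 255, all in integers.
            dst[p * 4 + c] =
                std::uint8_t((s * a + d * (255u - a) + 127u) / 255u);
        }
        // Destination alpha stays at 255 under the opaque assumption.
    }
}

// One 256-bit register holds 8 single-precision floats.
constexpr std::size_t kLanes = 8;

// Same blend, organised the way a float SIMD version would be:
// 8 pixels per step, bytes widened to float before the arithmetic.
void compositeOverBlocked(const std::uint8_t* src, std::uint8_t* dst,
                          std::size_t nPixels)
{
    assert(nPixels % kLanes == 0); // sketch handles whole blocks only
    for (std::size_t p = 0; p < nPixels; p += kLanes) {
        // Widen the 8 source alphas once and reuse them per channel.
        float alpha[kLanes];
        for (std::size_t i = 0; i < kLanes; ++i)
            alpha[i] = float(src[(p + i) * 4 + 3]) / 255.0f;

        for (std::size_t c = 0; c < 3; ++c) {
            for (std::size_t i = 0; i < kLanes; ++i) {
                const std::size_t idx = (p + i) * 4 + c;
                const float s = float(src[idx]);
                const float d = float(dst[idx]);
                // dst + (src - dst) * alpha, rounded to nearest.
                dst[idx] = std::uint8_t(d + (s - d) * alpha[i] + 0.5f);
            }
        }
    }
}
```

Both functions take interleaved RGBA buffers and blend src over dst in place. In a real AVX or SSE version the inner loops over `i` would become vector loads, converts and multiplies, which is where the permutations mentioned above come in.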


_______________________________________________
kimageshop mailing list
kimageshop@kde.org
https://mail.kde.org/mailman/listinfo/kimageshop