
List:       kde-kimageshop
Subject:    Evaluation of DPC++ compiler for Krita to support GPU computations
From:       Dmitry Kazakov <dimula73 () gmail ! com>
Date:       2023-01-04 13:55:42
Message-ID: CAEkBSfU9JYOMR=hbaGuzGYop92Bgo-byK47edFaJbfvW_9Ov8g () mail ! gmail ! com

Hi, all!

I spent the last weeks before the New Year trying to build Intel's DPC++
compiler. As far as I understand, this beast is something like our 'xsimd'
library, but for offloading work to GPUs. Here I would like to share what I
learned about it :)

tl;dr: my opinion about DPC++ is very positive, though we would have to
invest a lot of time into it; on the positive side, we would be able to
share some code with our XSIMD implementation.

Here is what I learned in the process:

1) Intel DPC++ is a flavour of normal C++ that allows automatic
offloading of code to the GPU. Basically, you write normal C++ code,
then pass it to a special 'Queue' class as a lambda or a function pointer,
and the rest is done by the compiler automatically. The compiler can either
compile it into an intermediate representation (SPIR-V), which is later
compiled into a GPU binary on the user's PC by the GPU driver, or precompile
it directly into your target GPUs' binary code. This approach looks very
nice, because we can reuse our existing composition/brush code written in
C++ inside these GPU routines, which will reduce the maintenance burden a lot.

2) There is also a library called oneAPI, built on top of the DPC++
compiler. We could use it to optimize Gaussian Blur and other filters,
but I don't think we can use it for brushes and composition.

3) Since DPC++ is an extension of C++, we have to use a custom compiler
for it. Basically, we would have to switch to an unstable branch of Clang
spiced with Intel's patches. It sounds a little bit scary :)

4) As far as I can tell, DPC++ is supported only on Linux and Windows. I'm
not sure we can use it on Android or macOS.

5) Not only will we have to switch to an unstable branch of Clang, but we
will also have to build the compiler ourselves (at least on Windows). The
official builds support only MSVC, while we need a MinGW environment.
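For reference, building the compiler from Intel's LLVM fork looks roughly
like this (an outline only; the branch name and buildbot scripts come from
the github.com/intel/llvm repository and may change, and the MinGW case
needs extra work on top of it):

```shell
# Rough sketch of building DPC++ from source on Linux.
# The 'sycl' branch carries Intel's patches on top of upstream Clang.
git clone https://github.com/intel/llvm -b sycl
cd llvm
python ./buildbot/configure.py   # generates the CMake build tree
python ./buildbot/compile.py     # builds clang++ with SYCL support
```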

6) I have managed to compile and run this compiler with MinGW, but the
process is extremely manual and flaky right now. More work will have to be
done there. Most probably, we will actually have to cross-compile from
Linux :)

7) The whole idea of DPC++ is really good. We write code in C++, and the
compiler automatically builds it for all the available GPU architectures
(with a limited C runtime). It means that we can simply reuse our brush and
composition code (including the XSIMD one) inside these DPC++ blocks
without duplicating it. When I tested CUDA back in 201x, my main concern
was that we would have to write a second copy of all our rendering code to
use it. DPC++ largely solves this issue.

-- 
Dmitry Kazakov



