
List:       cairo
Subject:    Re: [cairo] gallium surface still maintained ?
From:       Petr Kobalíček <kobalicek.petr () gmail ! com>
Date:       2016-08-05 20:40:22
Message-ID: CAB2Z3OcJBjGkve9RXu_bOdurkAk-60-tc+JoX+pbgT+4CDXXGQ () mail ! gmail ! com

An ARM port is planned - I already have a basic overview of the ARM32 and
ARM64 instruction sets (and their differences), and I have also started
some work on ArmAssembler (asmjit). But it will not happen before the X86
version is production ready.

Just to give you some overview - Blend2D's X86 backend is currently around
8000 lines of C++ code. It generates optimized pipelines for all supported
combinations (fetch-op, blend-op, rasterizer-op), in several variants
selected at runtime (1, 4, 8, or 16 pixels per loop iteration). I expect it
to grow in the future (especially if I target AVX512, which has some
innovative concepts). I would expect an initial ARM port to be around the
same size. I don't think that's bad, considering how much
architecture-specific code there is in pixman, for example.


On Fri, Aug 5, 2016 at 6:51 PM, Guillermo Rodriguez <
guillerodriguez.dev@gmail.com> wrote:

> Hello,
>
> I definitely share your view.
>
> Blend2D looks very interesting. I hope there will be an ARM port in the
> future; from what I have seen, the JIT engine currently targets x86
> architectures only.
>
> Best regards,
>
> Guillermo
>
> 2016-08-05 16:49 GMT+02:00 Petr Kobalíček <kobalicek.petr@gmail.com>:
>
>> I'm reading the discussion and I would like to contribute.
>>
>> I'm the author of Blend2D (http://blend2d.com) and I'm just finalizing an
>> evaluation version. From my own experience, I think that CPU rendering
>> is feasible and can be really fast. The problem is that libraries are not
>> optimized to use the CPU well.
>>
>> If you check out the pipelines of open-source 2D libraries, you will
>> see basically the same thing - in many cases pixels are just copied from
>> one place to another many times before they are written to the
>> destination buffer. Another thing is the dispatching mechanism - in many
>> cases these libraries call tens of functions (sometimes even allocating
>> dynamic memory) before pixels start changing - and this happens every
>> time you call some drawing function that is not "fillRect".
>>
>> I think the most critical cases are UI and vector-art rendering, because
>> these generally perform many drawing calls that render tiny things.
>>
>> I have my own benchmarking suite that compares performance of Blend2D,
>> Cairo, and Qt. I can announce on this list when I release the beta
>> version so Cairo devs and users can see the real difference between
>> "optimized for CPU" and "supports CPU".
>>
>> Cheers,
>> Petr
>>
>>
>>
>> On Fri, Aug 5, 2016 at 11:38 AM, Guillermo Rodriguez <
>> guillerodriguez.dev@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> 2016-08-05 11:00 GMT+02:00 Enrico Weigelt, metux IT consult <
>>> enrico.weigelt@gr13.net>:
>>>
>>>> On 05.08.2016 10:35, Enrico Weigelt, metux IT consult wrote:
>>>>
>>>> <snip>
>>>>
>>>
>>>> Oh, could you check whether your driver sets DRM_CAP_DUMB_PREFER_SHADOW.
>>>>
>>>> https://lists.freedesktop.org/archives/dri-devel/2016-August/114970.html
>>>>
>>>> On my box (w/ an i915) the driver sets this flag, so I'll have to assume
>>>> that writing individual bytes is going to be slow, and a shadow buffer
>>>> should be used, which is then copied over in bursts.
>>>>
>>>
>>> The driver I am using does not set that flag.
>>>
>>> Anyway, my application does all compositing on a back (shadow) buffer,
>>> and then blits dirty regions to the DRM buffer.
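
The shadow-buffer approach described above can be sketched roughly as
follows. This is only an illustration in Python, not code from the
application: the function name blit_rect, the (x, y, width, height)
rectangle layout, and the use of plain bytearrays in place of a mapped DRM
buffer are all assumptions. Each row of the dirty rectangle is copied as one
contiguous span, which is the "burst" idea mentioned earlier in the thread.

```python
def blit_rect(shadow, front, stride, bytes_per_pixel, rect):
    """Copy one dirty rectangle from the shadow buffer to the front buffer.

    shadow, front   -- bytearrays of equal size (stand-ins for real buffers)
    stride          -- bytes per scanline
    bytes_per_pixel -- e.g. 4 for a 32-bit pixel format
    rect            -- (x, y, width, height), a hypothetical layout
    """
    x, y, width, height = rect
    for row in range(y, y + height):
        # One contiguous slice per scanline instead of per-pixel writes.
        start = row * stride + x * bytes_per_pixel
        end = start + width * bytes_per_pixel
        front[start:end] = shadow[start:end]
```

Only the bytes inside the rectangle are written to the front buffer;
everything outside it is left untouched.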
>>>
>>>
>>>>
>>>> Now the interesting question: how to achieve that?
>>>> Is there some easy way to trace which pixels/regions in an image surface
>>>> have been touched?
>>>>
>>>
>>> I keep track of dirty regions which are merged together using a naive
>>> algorithm: The resulting dirty region is just a rectangle that encloses
>>> all dirty rectangles. That is then blitted to the screen in each update
>>> cycle.
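
A minimal sketch of the naive merge described above, in Python for brevity.
The (x, y, width, height) tuple layout and the name bounding_rect are
assumptions made for illustration; the actual application code is not shown
in this thread.

```python
def bounding_rect(rects):
    """Return the smallest rectangle enclosing every rectangle in rects.

    Each rectangle is an (x, y, width, height) tuple (assumed layout).
    rects must contain at least one rectangle.
    """
    left = min(x for x, y, w, h in rects)
    top = min(y for x, y, w, h in rects)
    right = max(x + w for x, y, w, h in rects)
    bottom = max(y + h for x, y, w, h in rects)
    return (left, top, right - left, bottom - top)
```

Calling bounding_rect on the list of rectangles touched during one update
cycle gives the single region to blit.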
>>>
>>> This merging algorithm is obviously inefficient in some cases. For
>>> example, let's say you have two small dirty regions in opposite corners
>>> of the screen; the merged dirty region will be large, and blitting that
>>> will be less efficient than blitting the two original dirty regions. You
>>> could optimize this by applying some heuristics in order to decide when
>>> to merge and when not to merge, but I haven't found the need to do that
>>> yet.
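
One possible heuristic of the kind mentioned above, sketched in Python. It
is not something proposed in this thread: the idea of comparing the area of
the enclosing rectangle with the combined area of the two originals, the
max_waste threshold of 1.5, and all names here are assumptions chosen only
to make the trade-off concrete.

```python
def area(rect):
    """Area of an (x, y, width, height) rectangle (assumed layout)."""
    _, _, w, h = rect
    return w * h


def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    left, top = min(ax, bx), min(ay, by)
    right = max(ax + aw, bx + bw)
    bottom = max(ay + ah, by + bh)
    return (left, top, right - left, bottom - top)


def merge_if_cheap(a, b, max_waste=1.5):
    """Merge a and b only if the enclosing rectangle is not much larger
    than the two rectangles taken separately; otherwise keep both.

    max_waste is a hypothetical tuning knob, not a measured value.
    """
    merged = union(a, b)
    if area(merged) <= max_waste * (area(a) + area(b)):
        return [merged]
    return [a, b]
```

With this rule, two adjacent rectangles collapse into one blit, while two
small rectangles in opposite corners of the screen stay separate. A real
threshold would have to be tuned against the per-blit overhead of the
target hardware.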
>>>
>>> Guillermo
>>>
>>>
>>> --
>>> cairo mailing list
>>> cairo@cairographics.org
>>> https://lists.cairographics.org/mailman/listinfo/cairo
>>>
>>
>>
>
