[prev in list] [next in list] [prev in thread] [next in thread]
List: kwin
Subject: Re: Review Request: Remove saturation from fragment shader in GLES
From: Thomas_Lübking <thomas.luebking () web ! de>
Date: 2011-12-18 17:39:49
Message-ID: 20111218173949.8128.34414 () vidsolbach ! de
[Download RAW message or body]
[Attachment #2 (multipart/alternative)]
> On Nov. 28, 2011, 3:12 p.m., Thomas Lübking wrote:
> > tex.rgb = tex.rgb * vec3( saturation ) + vec3( (1.0 - saturation)*max(tex.r, \
> > max(tex.g,tex.b)) );
> > picks the value instead of the luminance, but will likely be faster on esp. those \
> > chips (which i suspect to do too many things in SW :)
> > as a sidenote:
> > i think the current implementation is "wrong", at least not luminance. should in \
> > case be:
> > tex.rgb = tex.rgb * vec3( saturation ) + vec3( ( 1.0 - saturation )*dot( vec3( \
> > 0.30, 0.59, 0.11 ), tex.rgb ));
> > right now the luminance is darkened by the subpixel brightness - no idea whether \
> > that's intended
> > 'nother sidenote:
> > if we don't revert b646e7eb9a67971d7abb8056c06016de920419f7 i'll likely somewhen \
> > do by accident. please accept my apologies in before ;-)
> >
> > last sidenote:
> > maybe we should try whether external resources shadow compiled in ones to allow \
> > ppl. to easily replace things like the shaders w/o having to recompile kwin? (and \
> > in doubt ship "optimized" variants)
>
> Martin Gräßlin wrote:
> I think I'll give the one liner a try. It would remove the branch though it means \
> quite some calculative overhead when we don't need the saturation. Maybe we could \
> keep the branch for GPUs with flow control and remove it for the old ones:
> #ifdef FAST_GPU
> if (saturation != 1.0)
> #endif
> {
> tex.rgb = tex.rgb * vec3( saturation ) + vec3( (1.0 - saturation)*max(tex.r, \
> max(tex.g,tex.b)) ); }
>
> second sidenote sounds like a good idea. Being able to change the shader without \
> recompiling is a huge plus on the ExoPC :-)
>
> Thomas Lübking wrote:
> I didn't mean to remove the branch.
> If the compilation doesn't however fail (but the branch is just unconditionally \
> entered) not even the preproc condition would be required.
> I'm now about to test whether ext. binary .rcc's shadow compiled in resources.
>
> Please also notice that the two oneliners are different - the second one is \
> (correct) luminance calculation. If there's not some reason to darken & raise \
> contrast with desaturation, the current implementation is "wrong" (in quoted 'cause \
> there's no "right" desaturation either)
> Fredrik Höglund wrote:
> > tex.rgb = tex.rgb * vec3( saturation ) + vec3( (1.0 - saturation)*max(tex.r, \
> > max(tex.g,tex.b)) );
> > picks the value instead of the luminance, but will likely be faster on esp. those \
> > chips (which i suspect to do too many things in SW :)
>
> I may be wrong, but it looks to me like you've replaced a single DP3 instruction \
> with two MAX instructions. The i915 has dedicated HW for fragment shaders, but not \
> for vertex shaders.
> > as a sidenote:
> > i think the current implementation is "wrong", at least not luminance. should in \
> > case be:
> > tex.rgb = tex.rgb * vec3( saturation ) + vec3( ( 1.0 - saturation )*dot( vec3( \
> > 0.30, 0.59, 0.11 ), tex.rgb ));
>
> Yes, the current implementation is definitely wrong. You should use the built-in \
> mix() function though.
> > last sidenote:
> > maybe we should try whether external resources shadow compiled in ones to allow \
> > ppl. to easily replace things like the shaders w/o having to recompile kwin? (and \
> > in doubt ship "optimized" variants)
>
> No, kwin should generate the source code for the shaders at runtime and optimize \
> them for the current rendering operation.
>
> Thomas Lübking wrote:
> > The i915 has dedicated HW for fragment shaders, but not for vertex shaders.
> Indeed, i messed that up - the i8xx was the "sick" generation that did DX9 in SW - \
> sorry.
> > You should use the built-in mix() function though.
> Interesting.
> This might be a dumb question, but do you happen to know why mix is and fma is NOT \
> supported (on that GPU, not even with three floats) while the "version support" \
> matrices in the khronos glgs man look exactly the same?
> However, tried
> if (saturation != 1.0) {
> tex.rgb = mix(vec3(dot( vec3( 0.30, 0.59, 0.11 ), tex.rgb )), tex.rgb, \
> vec3(saturation)); }
> and -branch support or not- the speed feels ok to me (up to the shaderless glx \
> variant) For further inspections i need to move to the vsync branch.
>
> Fredrik Höglund wrote:
> > > You should use the built-in mix() function though.
> > Interesting.
> > This might be a dumb question, but do you happen to know why mix is and fma is \
> > NOT supported (on that GPU, not even with three floats) while the "version \
> > support" matrices in the khronos glgs man look exactly the same?
>
> The version support matrix is wrong; fma() is only available in GLSL 4.0 and later. \
> But it doesn't matter because the compiler will optimize multiply + add anyway. \
> fma() only makes sense in an assignment to a variable declared with the 'precise' \
> qualifier, since it requires that the operation is performed exactly as written.
> The mix() function doesn't have a corresponding native instruction in most GPU's. \
> NV30 seems to be an exception. It's still a good idea to use it though because it \
> gives the compiler more information and thus more optimization opportunities. It \
> also makes the code more readable.
> By the way, there's no need to cast saturation to a vec3 in the mix() call.
>
> Thomas Lübking wrote:
> @Fredrik: many thanks for the explanations (yes, late - but better than never ;-)
>
> @Martin: BUMP - the calculation in the shader, GLES or not, is still not "correct" \
> =P
> Martin Gräßlin wrote:
> do you have a better calculation with correct mix function? If yes just ship it.
>
> Thomas Lübking wrote:
> yes, this is luminance:
> if (saturation != 1.0) {
> tex.rgb = mix(vec3(dot( vec3( 0.30, 0.59, 0.11 ), tex.rgb )), tex.rgb, \
> vec3(saturation)); }
>
> gonna ship that.
... minus the vec3 cast - fredrik is (of course) right on this
- Thomas
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/103227/#review8566
-----------------------------------------------------------
On Nov. 24, 2011, 8:29 a.m., Martin Gräßlin wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/103227/
> -----------------------------------------------------------
>
> (Updated Nov. 24, 2011, 8:29 a.m.)
>
>
> Review request for kwin, Marco Martin and Fredrik Höglund.
>
>
> Description
> -------
>
> ifdef the saturation branch in the fragment shader for gles. Reason: hardware \
> probably does not support branches and at least the exopc gains a performance \
> improvement on gles when the branch is dropped.
> As it means a regression compared to 4.7 for hardware with branch support in \
> fragment shader on gles I'm only targeting master.
> KWin on GLX is unchanged.
>
>
> Diffs
> -----
>
> kwin/libkwineffects/kwingltexture.cpp 4155fb1
> kwin/scene-fragment.glsl 22463c1
>
> Diff: http://git.reviewboard.kde.org/r/103227/diff/diff
>
>
> Testing
> -------
>
>
> Thanks,
>
> Martin Gräßlin
>
>
[Attachment #5 (text/html)]
<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 \
solid;"> <tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="http://git.reviewboard.kde.org/r/103227/">http://git.reviewboard.kde.org/r/103227/</a>
</td>
</tr>
</table>
<br />
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <p style="margin-top: 0;">On November 28th, 2011, 3:12 p.m., <b>Thomas \
Lübking</b> wrote:</p> <blockquote style="margin-left: 1em; border-left: 2px solid \
#d0d0d0; padding-left: 10px;"> <pre style="white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">tex.rgb = tex.rgb * vec3( saturation ) + vec3( (1.0 - \
saturation)*max(tex.r, max(tex.g,tex.b)) );
picks the value instead of the luminance, but will likely be faster on esp. those \
chips (which i suspect to do too many things in SW :)
as a sidenote:
i think the current implementation is "wrong", at least not luminance. \
should in case be:
tex.rgb = tex.rgb * vec3( saturation ) + vec3( ( 1.0 - saturation )*dot( vec3( \
0.30, 0.59, 0.11 ), tex.rgb ));
right now the luminance is darkened by the subpixel brightness - no idea whether \
that's intended
'nother sidenote:
if we don't revert b646e7eb9a67971d7abb8056c06016de920419f7 i'll likely \
somewhen do by accident. please accept my apologies in before ;-)
last sidenote:
maybe we should try whether external resources shadow compiled in ones to allow ppl. \
to easily replace things like the shaders w/o having to recompile kwin? (and in doubt \
ship "optimized" variants)</pre> </blockquote>
<p>On November 28th, 2011, 7:30 p.m., <b>Martin Gräßlin</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I think I'll give \
the one liner a try. It would remove the branch though it means quite some \
calculative overhead when we don't need the saturation. Maybe we could keep the \
branch for GPUs with flow control and remove it for the old ones:
#ifdef FAST_GPU
if (saturation != 1.0)
#endif
{
tex.rgb = tex.rgb * vec3( saturation ) + vec3( (1.0 - saturation)*max(tex.r, \
max(tex.g,tex.b)) ); }
second sidenote sounds like a good idea. Being able to change the shader without \
recompiling is a huge plus on the ExoPC :-) </pre>
</blockquote>
<p>On November 28th, 2011, 7:51 p.m., <b>Thomas Lübking</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I didn't mean to \
remove the branch. If the compilation doesn't however fail (but the branch is \
just unconditionally entered) not even the preproc condition would be required.
I'm now about to test whether ext. binary .rcc's shadow compiled in \
resources.
Please also notice that the two oneliners are different - the second one is (correct) \
luminance calculation. If there's not some reason to darken & raise contrast \
with desaturation, the current implementation is "wrong" (in quoted \
'cause there's no "right" desaturation either)</pre> </blockquote>
<p>On November 28th, 2011, 9:06 p.m., <b>Fredrik Höglund</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">> tex.rgb = tex.rgb * \
vec3( saturation ) + vec3( (1.0 - saturation)*max(tex.r, max(tex.g,tex.b)) ); >
> picks the value instead of the luminance, but will likely be faster on esp. \
those chips (which i suspect to do too many things in SW :)
I may be wrong, but it looks to me like you've replaced a single DP3 instruction \
with two MAX instructions. The i915 has dedicated HW for fragment shaders, but not \
for vertex shaders.
> as a sidenote:
> i think the current implementation is "wrong", at least not luminance. \
should in case be: >
> tex.rgb = tex.rgb * vec3( saturation ) + vec3( ( 1.0 - saturation )*dot( vec3( \
0.30, 0.59, 0.11 ), tex.rgb ));
Yes, the current implementation is definitely wrong. You should use the built-in \
mix() function though.
> last sidenote:
> maybe we should try whether external resources shadow compiled in ones to allow \
ppl. to easily replace things like the shaders w/o having to recompile kwin? (and in \
doubt ship "optimized" variants)
No, kwin should generate the source code for the shaders at runtime and optimize them \
for the current rendering operation. </pre>
</blockquote>
<p>On November 28th, 2011, 10:40 p.m., <b>Thomas Lübking</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">> The i915 has \
dedicated HW for fragment shaders, but not for vertex shaders. Indeed, i messed that \
up - the i8xx was the "sick" generation that did DX9 in SW - sorry.
> You should use the built-in mix() function though.
Interesting.
This might be a dumb question, but do you happen to know why mix is and fma is NOT \
supported (on that GPU, not even with three floats) while the "version \
support" matrices in the khronos glgs man look exactly the same?
However, tried
if (saturation != 1.0) {
tex.rgb = mix(vec3(dot( vec3( 0.30, 0.59, 0.11 ), tex.rgb )), tex.rgb, \
vec3(saturation)); }
and -branch support or not- the speed feels ok to me (up to the shaderless glx \
variant) For further inspections i need to move to the vsync branch.</pre>
</blockquote>
<p>On November 29th, 2011, 5:35 p.m., <b>Fredrik Höglund</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">>> You should use \
the built-in mix() function though. > Interesting.
> This might be a dumb question, but do you happen to know why mix is and fma is \
NOT supported (on that GPU, not even with three floats) while the "version \
support" matrices in the khronos glgs man look exactly the same?
The version support matrix is wrong; fma() is only available in GLSL 4.0 and later. \
But it doesn't matter because the compiler will optimize multiply + add anyway. \
fma() only makes sense in an assignment to a variable declared with the \
'precise' qualifier, since it requires that the operation is performed \
exactly as written.
The mix() function doesn't have a corresponding native instruction in most \
GPU's. NV30 seems to be an exception. It's still a good idea to use it though \
because it gives the compiler more information and thus more optimization \
opportunities. It also makes the code more readable.
By the way, there's no need to cast saturation to a vec3 in the mix() call.</pre>
</blockquote>
<p>On December 9th, 2011, 8:20 p.m., <b>Thomas Lübking</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">@Fredrik: many thanks \
for the explanations (yes, late - but better than never ;-)
@Martin: BUMP - the calculation in the shader, GLES or not, is still not \
"correct" =P</pre> </blockquote>
<p>On December 18th, 2011, 5:10 p.m., <b>Martin Gräßlin</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">do you have a better \
calculation with correct mix function? If yes just ship it.</pre> </blockquote>
<p>On December 18th, 2011, 5:38 p.m., <b>Thomas Lübking</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;"> <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">yes, this is luminance: \
if (saturation != 1.0) { tex.rgb = mix(vec3(dot( vec3( 0.30, 0.59, 0.11 ), tex.rgb \
)), tex.rgb, vec3(saturation)); }
gonna ship that.</pre>
</blockquote>
</blockquote>
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">... minus the vec3 cast \
- fredrik is (of course) right on this</pre> <br />
<p>- Thomas</p>
<br />
<p>On November 24th, 2011, 8:29 a.m., Martin Gräßlin wrote:</p>
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('http://git.reviewboard.kde.org/media/rb/images/review_request_box_top_bg.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black \
solid;"> <tr>
<td>
<div>Review request for kwin, Marco Martin and Fredrik Höglund.</div>
<div>By Martin Gräßlin.</div>
<p style="color: grey;"><i>Updated Nov. 24, 2011, 8:29 a.m.</i></p>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0"> <tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">ifdef the saturation branch in the fragment shader for gles. Reason: \
hardware probably does not support branches and at least the exopc gains a \
performance improvement on gles when the branch is dropped.
As it means a regression compared to 4.7 for hardware with branch support in fragment \
shader on gles I'm only targeting master.
KWin on GLX is unchanged.</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>kwin/libkwineffects/kwingltexture.cpp <span style="color: \
grey">(4155fb1)</span></li>
<li>kwin/scene-fragment.glsl <span style="color: grey">(22463c1)</span></li>
</ul>
<p><a href="http://git.reviewboard.kde.org/r/103227/diff/" style="margin-left: \
3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>
_______________________________________________
kwin mailing list
kwin@kde.org
https://mail.kde.org/mailman/listinfo/kwin
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic