'Re: Review Request: Remove saturation from fragment shader in GLES'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kwin
Subject:    Re: Review Request: Remove saturation from fragment shader in GLES
From:       Thomas_Lübking <thomas.luebking () web ! de>
Date:       2011-12-18 17:39:49
Message-ID: 20111218173949.8128.34414 () vidsolbach ! de
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]

> On Nov. 28, 2011, 3:12 p.m., Thomas Lübking wrote:
> > tex.rgb = tex.rgb * vec3( saturation ) + vec3( (1.0 - saturation)*max(tex.r, \
> > max(tex.g,tex.b)) ); 
> > picks the value instead of the luminance, but will likely be faster on esp. those \
> > chips (which i suspect to do too many things in SW :) 
> > as a sidenote:
> > i think the current implementation is "wrong", at least not luminance. should in \
> > case be: 
> > tex.rgb = tex.rgb * vec3( saturation ) + vec3( ( 1.0 - saturation )*dot( vec3( \
> > 0.30, 0.59, 0.11 ), tex.rgb )); 
> > right now the luminance is darkened by the subpixel brightness - no idea whether \
> > that's intended 
> > 'nother sidenote:
> > if we don't revert b646e7eb9a67971d7abb8056c06016de920419f7 i'll likely somewhen \
> > do by accident. please accept my apologies in before ;-)
> > 
> > last sidenote:
> > maybe we should try whether external resources shadow compiled in ones to allow \
> > ppl. to easily replace things like the shaders w/o having to recompile kwin? (and \
> > in doubt ship "optimized" variants)
> 
> Martin Gräßlin wrote:
> I think I'll give the one liner a try. It would remove the branch though it means \
> quite some calculative overhead when we don't need the saturation. Maybe we could \
> keep the branch for GPUs with flow control and remove it for the old ones: 
> #ifdef FAST_GPU
> if (saturation != 1.0) 
> #endif
> {
> tex.rgb = tex.rgb * vec3( saturation ) + vec3( (1.0 - saturation)*max(tex.r, \
> max(tex.g,tex.b)) ); }
> 
> second sidenote sounds like a good idea. Being able to change the shader without \
> recompiling is a huge plus on the ExoPC :-) 
> 
> Thomas Lübking wrote:
> I didn't mean to remove the branch.
> If the compilation doesn't however fail (but the branch is just unconditionally \
> entered) not even the preproc condition would be required. 
> I'm now about to test whether ext. binary .rcc's shadow compiled in resources.
> 
> Please also notice that the two oneliners are different - the second one is \
> (correct) luminance calculation. If there's not some reason to darken & raise \
> contrast with desaturation, the current implementation is "wrong" (in quoted 'cause \
> there's no "right" desaturation either) 
> Fredrik Höglund wrote:
> > tex.rgb = tex.rgb * vec3( saturation ) + vec3( (1.0 - saturation)*max(tex.r, \
> > max(tex.g,tex.b)) ); 
> > picks the value instead of the luminance, but will likely be faster on esp. those \
> > chips (which i suspect to do too many things in SW :)
> 
> I may be wrong, but it looks to me like you've replaced a single DP3 instruction \
> with two MAX instructions. The i915 has dedicated HW for fragment shaders, but not \
> for vertex shaders. 
> > as a sidenote:
> > i think the current implementation is "wrong", at least not luminance. should in \
> > case be: 
> > tex.rgb = tex.rgb * vec3( saturation ) + vec3( ( 1.0 - saturation )*dot( vec3( \
> > 0.30, 0.59, 0.11 ), tex.rgb ));
> 
> Yes, the current implementation is definitely wrong. You should use the built-in \
> mix() function though. 
> > last sidenote:
> > maybe we should try whether external resources shadow compiled in ones to allow \
> > ppl. to easily replace things like the shaders w/o having to recompile kwin? (and \
> > in doubt ship "optimized" variants)
> 
> No, kwin should generate the source code for the shaders at runtime and optimize \
> them for the current rendering operation. 
> 
> Thomas Lübking wrote:
> > The i915 has dedicated HW for fragment shaders, but not for vertex shaders.
> Indeed, i messed that up - the i8xx was the "sick" generation that did DX9 in SW - \
> sorry. 
> > You should use the built-in mix() function though.
> Interesting.
> This might be a dumb question, but do you happen to know why mix is and fma is NOT \
> supported (on that GPU, not even with three floats)  while the "version support" \
> matrices in the khronos glgs man look exactly the same? 
> However, tried
> if (saturation != 1.0) {
> tex.rgb = mix(vec3(dot( vec3( 0.30, 0.59, 0.11 ), tex.rgb )), tex.rgb, \
> vec3(saturation)); }
> and -branch support or not- the speed feels ok to me (up to the shaderless glx \
> variant) For further inspections i need to move to the vsync branch.
> 
> Fredrik Höglund wrote:
> > > You should use the built-in mix() function though.
> > Interesting.
> > This might be a dumb question, but do you happen to know why mix is and fma is \
> > NOT supported (on that GPU, not even with three floats) while the "version \
> > support" matrices in the khronos glgs man look exactly the same?
> 
> The version support matrix is wrong; fma() is only available in GLSL 4.0 and later. \
> But it doesn't matter because the compiler will optimize multiply + add anyway. \
> fma() only makes sense in an assignment to a variable declared with the 'precise' \
> qualifier, since it requires that the operation is performed exactly as written. 
> The mix() function doesn't have a corresponding native instruction in most GPU's. \
> NV30 seems to be an exception. It's still a good idea to use it though because it \
> gives the compiler more information and thus more optimization opportunities. It \
> also makes the code more readable. 
> By the way, there's no need to cast saturation to a vec3 in the mix() call.
> 
> Thomas Lübking wrote:
> @Fredrik: many thanks for the explanations (yes, late - but better than never ;-)
> 
> @Martin: BUMP - the calculation in the shader, GLES or not, is still not "correct" \
> =P 
> Martin Gräßlin wrote:
> do you have a better calculation with correct mix function? If yes just ship it.
> 
> Thomas Lübking wrote:
> yes, this is luminance:
> if (saturation != 1.0) {
> tex.rgb = mix(vec3(dot( vec3( 0.30, 0.59, 0.11 ), tex.rgb )), tex.rgb, \
> vec3(saturation)); }
> 
> gonna ship that.

... minus the vec3 cast - fredrik is (of course) right on this

- Thomas

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/103227/#review8566
-----------------------------------------------------------

On Nov. 24, 2011, 8:29 a.m., Martin Gräßlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/103227/
> -----------------------------------------------------------
> 
> (Updated Nov. 24, 2011, 8:29 a.m.)
> 
> 
> Review request for kwin, Marco Martin and Fredrik Höglund.
> 
> 
> Description
> -------
> 
> ifdef the saturation branch in the fragment shader for gles. Reason: hardware \
> probably does not support branches and at least the exopc gains a performance \
> improvement on gles when the branch is dropped. 
> As it means a regression compared to 4.7 for hardware with branch support in \
> fragment shader on gles I'm only targeting master. 
> KWin on GLX is unchanged.
> 
> 
> Diffs
> -----
> 
> kwin/libkwineffects/kwingltexture.cpp 4155fb1 
> kwin/scene-fragment.glsl 22463c1 
> 
> Diff: http://git.reviewboard.kde.org/r/103227/diff/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Martin Gräßlin
> 
> 

[Attachment #5 (text/html)]

<html>
 <body>
  <div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
   <table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 \
solid;">  <tr>
     <td>
      This is an automatically generated e-mail. To reply, visit:
      <a href="http://git.reviewboard.kde.org/r/103227/">http://git.reviewboard.kde.org/r/103227/</a>
  </td>
    </tr>
   </table>
   <br />

<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <p style="margin-top: 0;">On November 28th, 2011, 3:12 p.m., <b>Thomas \
Lübking</b> wrote:</p>  <blockquote style="margin-left: 1em; border-left: 2px solid \
#d0d0d0; padding-left: 10px;">  <pre style="white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">tex.rgb = tex.rgb * vec3( saturation ) + vec3( (1.0 - \
saturation)*max(tex.r, max(tex.g,tex.b)) );

picks the value instead of the luminance, but will likely be faster on esp. those \
chips (which i suspect to do too many things in SW :)

as a sidenote:
i think the current implementation is &quot;wrong&quot;, at least not luminance. \
should in case be:

   tex.rgb = tex.rgb * vec3( saturation ) + vec3( ( 1.0 - saturation )*dot( vec3( \
0.30, 0.59, 0.11 ), tex.rgb ));

right now the luminance is darkened by the subpixel brightness - no idea whether \
that&#39;s intended

&#39;nother sidenote:
if we don&#39;t revert b646e7eb9a67971d7abb8056c06016de920419f7 i&#39;ll likely \
somewhen do by accident. please accept my apologies in before ;-)

last sidenote:
maybe we should try whether external resources shadow compiled in ones to allow ppl. \
to easily replace things like the shaders w/o having to recompile kwin? (and in doubt \
ship &quot;optimized&quot; variants)</pre>  </blockquote>

 <p>On November 28th, 2011, 7:30 p.m., <b>Martin Gräßlin</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I think I&#39;ll give \
the one liner a try. It would remove the branch though it means quite some \
calculative overhead when we don&#39;t need the saturation. Maybe we could keep the \
branch for GPUs with flow control and remove it for the old ones:

#ifdef FAST_GPU
if (saturation != 1.0) 
#endif
{
    tex.rgb = tex.rgb * vec3( saturation ) + vec3( (1.0 - saturation)*max(tex.r, \
max(tex.g,tex.b)) ); }

second sidenote sounds like a good idea. Being able to change the shader without \
recompiling is a huge plus on the ExoPC :-) </pre>
 </blockquote>

 <p>On November 28th, 2011, 7:51 p.m., <b>Thomas Lübking</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I didn&#39;t mean to \
remove the branch. If the compilation doesn&#39;t however fail (but the branch is \
just unconditionally entered) not even the preproc condition would be required.

I&#39;m now about to test whether ext. binary .rcc&#39;s shadow compiled in \
resources.

Please also notice that the two oneliners are different - the second one is (correct) \
luminance calculation. If there&#39;s not some reason to darken &amp; raise contrast \
with desaturation, the current implementation is &quot;wrong&quot; (in quoted \
&#39;cause there&#39;s no &quot;right&quot; desaturation either)</pre>  </blockquote>

 <p>On November 28th, 2011, 9:06 p.m., <b>Fredrik Höglund</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">&gt; tex.rgb = tex.rgb * \
vec3( saturation ) + vec3( (1.0 - saturation)*max(tex.r, max(tex.g,tex.b)) ); &gt;
&gt; picks the value instead of the luminance, but will likely be faster on esp. \
those chips (which i suspect to do too many things in SW :)

I may be wrong, but it looks to me like you&#39;ve replaced a single DP3 instruction \
with two MAX instructions. The i915 has dedicated HW for fragment shaders, but not \
for vertex shaders.

&gt; as a sidenote:
&gt; i think the current implementation is &quot;wrong&quot;, at least not luminance. \
should in case be: &gt;
&gt;   tex.rgb = tex.rgb * vec3( saturation ) + vec3( ( 1.0 - saturation )*dot( vec3( \
0.30, 0.59, 0.11 ), tex.rgb ));

Yes, the current implementation is definitely wrong. You should use the built-in \
mix() function though.

&gt; last sidenote:
&gt; maybe we should try whether external resources shadow compiled in ones to allow \
ppl. to easily replace things like the shaders w/o having to recompile kwin? (and in \
doubt ship &quot;optimized&quot; variants)

No, kwin should generate the source code for the shaders at runtime and optimize them \
for the current rendering operation. </pre>
 </blockquote>

 <p>On November 28th, 2011, 10:40 p.m., <b>Thomas Lübking</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">&gt; The i915 has \
dedicated HW for fragment shaders, but not for vertex shaders. Indeed, i messed that \
up - the i8xx was the &quot;sick&quot; generation that did DX9 in SW - sorry.

&gt; You should use the built-in mix() function though.
Interesting.
This might be a dumb question, but do you happen to know why mix is and fma is NOT \
supported (on that GPU, not even with three floats)  while the &quot;version \
support&quot; matrices in the khronos glgs man look exactly the same?

However, tried
if (saturation != 1.0) {
   tex.rgb = mix(vec3(dot( vec3( 0.30, 0.59, 0.11 ), tex.rgb )), tex.rgb, \
vec3(saturation)); }
and -branch support or not- the speed feels ok to me (up to the shaderless glx \
variant) For further inspections i need to move to the vsync branch.</pre>
 </blockquote>

 <p>On November 29th, 2011, 5:35 p.m., <b>Fredrik Höglund</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">&gt;&gt; You should use \
the built-in mix() function though. &gt; Interesting.
&gt; This might be a dumb question, but do you happen to know why mix is and fma is \
NOT supported (on that GPU, not even with three floats) while the &quot;version \
support&quot; matrices in the khronos glgs man look exactly the same?

The version support matrix is wrong; fma() is only available in GLSL 4.0 and later. \
But it doesn&#39;t matter because the compiler will optimize multiply + add anyway. \
fma() only makes sense in an assignment to a variable declared with the \
&#39;precise&#39; qualifier, since it requires that the operation is performed \
exactly as written.

The mix() function doesn&#39;t have a corresponding native instruction in most \
GPU&#39;s. NV30 seems to be an exception. It&#39;s still a good idea to use it though \
because it gives the compiler more information and thus more optimization \
opportunities. It also makes the code more readable.

By the way, there&#39;s no need to cast saturation to a vec3 in the mix() call.</pre>
 </blockquote>

 <p>On December 9th, 2011, 8:20 p.m., <b>Thomas Lübking</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">@Fredrik: many thanks \
for the explanations (yes, late - but better than never ;-)

@Martin: BUMP - the calculation in the shader, GLES or not, is still not \
&quot;correct&quot; =P</pre>  </blockquote>

 <p>On December 18th, 2011, 5:10 p.m., <b>Martin Gräßlin</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">do you have a better \
calculation with correct mix function? If yes just ship it.</pre>  </blockquote>

 <p>On December 18th, 2011, 5:38 p.m., <b>Thomas Lübking</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">yes, this is luminance: \
if (saturation != 1.0) {  tex.rgb = mix(vec3(dot( vec3( 0.30, 0.59, 0.11 ), tex.rgb \
)), tex.rgb, vec3(saturation)); }

gonna ship that.</pre>
 </blockquote>

</blockquote>

<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">... minus the vec3 cast \
- fredrik is (of course) right on this</pre> <br />

<p>- Thomas</p>

<br />
<p>On November 24th, 2011, 8:29 a.m., Martin Gräßlin wrote:</p>

<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('http://git.reviewboard.kde.org/media/rb/images/review_request_box_top_bg.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black \
solid;">  <tr>
  <td>

<div>Review request for kwin, Marco Martin and Fredrik Höglund.</div>
<div>By Martin Gräßlin.</div>

<p style="color: grey;"><i>Updated Nov. 24, 2011, 8:29 a.m.</i></p>

<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
 <table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">ifdef the saturation branch in the fragment shader for gles. Reason: \
hardware probably does not support branches and at least the exopc gains a \
performance improvement on gles when the branch is dropped.

As it means a regression compared to 4.7 for hardware with branch support in fragment \
shader on gles I&#39;m only targeting master.

KWin on GLX is unchanged.</pre>
  </td>
 </tr>
</table>

<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">

 <li>kwin/libkwineffects/kwingltexture.cpp <span style="color: \
grey">(4155fb1)</span></li>

 <li>kwin/scene-fragment.glsl <span style="color: grey">(22463c1)</span></li>

</ul>

<p><a href="http://git.reviewboard.kde.org/r/103227/diff/" style="margin-left: \
3em;">View Diff</a></p>

  </td>
 </tr>
</table>

  </div>
 </body>
</html>

_______________________________________________
kwin mailing list
kwin@kde.org
https://mail.kde.org/mailman/listinfo/kwin

[prev in list] [next in list] [prev in thread] [next in thread]