[prev in list] [next in list] [prev in thread] [next in thread] 

List:       mesa3d-dev
Subject:    Re: [Mesa-dev] [PATCH 8/9] i965/nir: Run the ffma peephole after the rest of the optimizations
From:       Matt Turner <mattst88 () gmail ! com>
Date:       2015-03-31 17:47:49
Message-ID: CAEdQ38EE7G0iFWgaTA88G76MyC7tw2A+_RXpJLQKgCTTRT2RDQ () mail ! gmail ! com
[Download RAW message or body]

On Mon, Mar 23, 2015 at 8:40 PM, Jason Ekstrand <jason@jlekstrand.net> wrote:
> On Mon, Mar 23, 2015 at 8:34 PM, Matt Turner <mattst88@gmail.com> wrote:
> > On Mon, Mar 23, 2015 at 8:13 PM, Jason Ekstrand <jason@jlekstrand.net> wrote:
> > > The idea here is that fusing multiply-add combinations too early can reduce
> > > our ability to perform CSE and value-numbering.  Instead, we split ffma
> > > opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
> > > Unless an algebraic pass does something silly where it inserts something
> > > between the multiply and the add, splitting and re-fusing should never
> > > cause a problem.  We run the late algebraic optimizations after this so
> > > that things like compare-with-zero don't hurt our ability to fuse things.
> > > 
> > > shader-db results for fragment shaders on Haswell:
> > > total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
> > > instructions in affected programs:     989359 -> 978057 (-1.14%)
> > > helped:                                5308
> > > HURT:                                  97
> > > GAINED:                                78
> > > LOST:                                  5
> > > 
> > > This does, unfortunately, cause some substantial hurt to a shader in Kerbal
> > > Space Program.  However, the damage is caused by changing a single
> > > instruction from a ffma to an add.  This, in turn, *decreases* register
> > > pressure in one part of the program causing it to fail to register allocate
> > > and spill.  Given the overwhelmingly positive results in other shaders and
> > > the fact that the NIR for the Kerbal shaders is actually better, this
> > > should be considered a positive.
> > > ---
> > > src/mesa/drivers/dri/i965/brw_context.c  |  1 +
> > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 10 ++++++++--
> > > 2 files changed, 9 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_context.c \
> > > b/src/mesa/drivers/dri/i965/brw_context.c index ed6fdff..7cab1c9 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_context.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_context.c
> > > @@ -558,6 +558,7 @@ brw_initialize_context_constants(struct brw_context *brw)
> > > 
> > > static const nir_shader_compiler_options gen6_nir_options = {
> > > .native_integers = true,
> > > +      .lower_ffma = true,
> > 
> > I see what's going on, but this probably deserves a comment.
> 
> Sure.  How about:
> 
> > In order to help allow for better CSE at the NIR level we tell NIR to split all \
> > ffma instructions during opt_algebraic and we then re-combine them as a later \
> > step.

Sounds good.

> 
> > > };
> > > 
> > > /* We want the GLSL compiler to emit code that uses condition codes */
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp \
> > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index cdf9b01..b0b86e3 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > @@ -51,8 +51,6 @@ nir_optimize(nir_shader *nir)
> > > nir_validate_shader(nir);
> > > progress |= nir_opt_algebraic(nir);
> > > nir_validate_shader(nir);
> > > -      progress |= nir_opt_peephole_ffma(nir);
> > > -      nir_validate_shader(nir);
> > > progress |= nir_opt_constant_folding(nir);
> > > nir_validate_shader(nir);
> > > progress |= nir_opt_remove_phis(nir);
> > > @@ -131,6 +129,12 @@ fs_visitor::emit_nir_code()
> > > 
> > > nir_optimize(nir);
> > > 
> > > +   if (brw->gen >= 6) {
> > > +      /* Try and fuse multiply-adds */
> > > +      nir_opt_peephole_ffma(nir);
> > > +      nir_validate_shader(nir);
> > > +   }
> > > +
> > > nir_opt_algebraic_late(nir);
> > > nir_validate_shader(nir);
> > > 
> > > @@ -141,6 +145,8 @@ fs_visitor::emit_nir_code()
> > > nir_validate_shader(nir);
> > > nir_copy_prop(nir);
> > > nir_validate_shader(nir);
> > > +   nir_opt_dce(nir);
> > > +   nir_validate_shader(nir);
> > 
> > Just to confirm: this did help something?
> 
> We should always run dead code after copy prop and we weren't.
> However, that should probably be its own patch.  I'll split it out if
> you'd like.

Seems like the right thing to do.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic