
List:       mesa3d-dev
Subject:    Re: [Mesa-dev] [PATCH 4/9] nir: Move the compare-with-zero optimizations to the late section
From:       Matt Turner <mattst88 () gmail ! com>
Date:       2015-03-31 18:04:26
Message-ID: CAEdQ38FQKUEZQWufqqPR5HDAQwuY8r9rb5Y_Am-ASKopcb1oTA () mail ! gmail ! com

On Mon, Mar 23, 2015 at 8:43 PM, Jason Ekstrand <jason@jlekstrand.net> wrote:
> On Mon, Mar 23, 2015 at 8:34 PM, Matt Turner <mattst88@gmail.com> wrote:
>> On Mon, Mar 23, 2015 at 8:13 PM, Jason Ekstrand <jason@jlekstrand.net> wrote:
>>> total instructions in shared programs: 4422307 -> 4422363 (0.00%)
>>> instructions in affected programs:     4230 -> 4286 (1.32%)
>>> helped:                                0
>>> HURT:                                  12
>>>
>>> While this does hurt some things, the losses are minor and it prevents the
>>> compare-with-zero optimization from fighting with ffma which is much more
>>> important.
>>
>> Is it actually "fighting" (i.e., undoing the other pass' work) or just
>> preventing some ffmas from being generated?
>>
>> If we did have something that would be recognized by both these and
>> the ffma pattern, it'd look like
>>
>> fge(fadd(a, fmul(b, c)), 0.0)
>>
>> which we could turn into
>>
>> fge(ffma(b, c, a), 0.0) if ffma runs first; or
>> fge(a, fneg(fmul(b, c))) otherwise
>>
>> I guess the first one is better for i965, since we can do that in one
>> instruction. In fact, maybe we don't want to do these optimizations at
>> all? I'm kind of surprised that it hurts.
>
> Right.  In one sense it doesn't help anything because we can do a
> compare with zero for free in i965.  However, losing it does hurt
> quite a bit in the case where the optimization allows us to remove the
> add instruction.  The problem is when the add is part of a potential
> ffma, in which case pulling things into the comparison keeps the more
> optimized ffma peephole from actually converting to an ffma.  In this
> case we keep both the add and the multiply even though we could have
> done it with an ffma and a compare with zero.
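The instruction-count summary quoted at the top reports the same 56-instruction increase twice, measured against two different baselines. A quick Python check, using only the figures from that summary, shows why one rounds to 0.00% and the other to 1.32%. Because the two deltas are equal and nothing was helped, all of the extra instructions fall in the 12 hurt programs.

```python
# Figures copied from the instruction-count summary quoted above.
total_before, total_after = 4422307, 4422363
affected_before, affected_after = 4230, 4286
hurt = 12


def pct_change(before: int, after: int) -> float:
    """Relative change in percent, relative to the `before` count."""
    return 100.0 * (after - before) / before


total_delta = total_after - total_before
affected_delta = affected_after - affected_before

print(f"total:    +{total_delta} ({pct_change(total_before, total_after):.2f}%)")
print(f"affected: +{affected_delta} ({pct_change(affected_before, affected_after):.2f}%)")
# Helped is 0, so every extra instruction lands in one of the hurt programs.
print(f"average per hurt program: {affected_delta / hurt:.1f}")
```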

So to confirm, in the case of

> (('flt', ('fadd', a, b), 0.0), ('flt', a, ('fneg', b))),

you want to keep the a+b around so that if a or b is a multiplication,
the ffma peephole can recognize it?
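To make the ordering problem concrete, here is a toy Python sketch. It is not Mesa code: the tuple IR, the two rule functions, and the op-counting cost model are all hypothetical. It runs an ffma-fusion rule and the compare-with-zero rule from the quoted pattern in both orders. Following NIR, ffma(x, y, z) means x * y + z. The sum a + b*c is used twice, once by the comparison and once by some other consumer, so the add cannot simply disappear.

```python
from typing import Callable, Union

# Hypothetical toy IR: a leaf name (str), a float constant, or a tuple
# (opcode, *operands). Opcode names mimic NIR; ffma(x, y, z) = x * y + z.
Expr = Union[str, float, tuple]
Rule = Callable[[tuple], Expr]


def fuse_ffma(node: tuple) -> Expr:
    """fadd(x, fmul(y, z)) or fadd(fmul(y, z), x) -> ffma(y, z, x).
    Naive: it does not check whether the fmul has other uses."""
    if node[0] == "fadd":
        x, y = node[1], node[2]
        if isinstance(y, tuple) and y[0] == "fmul":
            return ("ffma", y[1], y[2], x)
        if isinstance(x, tuple) and x[0] == "fmul":
            return ("ffma", x[1], x[2], y)
    return node


def cmp_with_zero(node: tuple) -> Expr:
    """flt/fge(fadd(x, y), 0.0) -> flt/fge(x, fneg(y))."""
    if node[0] in ("flt", "fge") and node[2] == 0.0:
        lhs = node[1]
        if isinstance(lhs, tuple) and lhs[0] == "fadd":
            return (node[0], lhs[1], ("fneg", lhs[2]))
    return node


def run_pass(expr: Expr, rule: Rule) -> Expr:
    """Apply `rule` once to every node, children before parents."""
    if not isinstance(expr, tuple):
        return expr
    rebuilt = (expr[0],) + tuple(run_pass(e, rule) for e in expr[1:])
    return rule(rebuilt)


def optimize(roots: list, rules: list) -> list:
    """Run each rule as a separate pass, in the given order."""
    for rule in rules:
        roots = [run_pass(r, rule) for r in roots]
    return roots


def count_ops(roots: list) -> int:
    """Count distinct ops across all roots (identical subtrees are shared,
    as after CSE). fneg counts as free, assuming it folds into a source
    modifier on the backend."""
    seen = set()

    def walk(e: Expr) -> None:
        if isinstance(e, tuple) and e not in seen:
            seen.add(e)
            for child in e[1:]:
                walk(child)

    for r in roots:
        walk(r)
    return sum(1 for e in seen if e[0] != "fneg")


# s = a + b * c is compared against zero *and* used again elsewhere.
s = ("fadd", "a", ("fmul", "b", "c"))
program = [("fge", s, 0.0), s]

ffma_first = optimize(program, [fuse_ffma, cmp_with_zero])
cmp_first = optimize(program, [cmp_with_zero, fuse_ffma])

print(ffma_first, count_ops(ffma_first))
print(cmp_first, count_ops(cmp_first))
```

With ffma fusion first, both uses share one ffma and the comparison becomes a compare with zero, for 2 ops. With the compare rewrite first, the comparison needs its own fmul, which leaves 3 ops. A fusion pass that refuses to fuse an fmul with non-fadd uses would instead keep the fmul and the fadd, which is still 3 ops. If the comparison is the only use of s, both orders come out at 2 ops in this model. That is why keeping the compare-with-zero rewrite, but running it after fusion, loses little.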

If that's the case,

Reviewed-by: Matt Turner <mattst88@gmail.com>
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
