List: mesa3d-dev
Subject: Re: [Mesa-dev] [PATCH 4/9] nir: Move the compare-with-zero optimizations to the late section
From: Matt Turner <mattst88 () gmail ! com>
Date: 2015-03-31 18:04:26
Message-ID: CAEdQ38FQKUEZQWufqqPR5HDAQwuY8r9rb5Y_Am-ASKopcb1oTA () mail ! gmail ! com
On Mon, Mar 23, 2015 at 8:43 PM, Jason Ekstrand <jason@jlekstrand.net> wrote:
> On Mon, Mar 23, 2015 at 8:34 PM, Matt Turner <mattst88@gmail.com> wrote:
>> On Mon, Mar 23, 2015 at 8:13 PM, Jason Ekstrand <jason@jlekstrand.net> wrote:
>>> total instructions in shared programs: 4422307 -> 4422363 (0.00%)
>>> instructions in affected programs: 4230 -> 4286 (1.32%)
>>> helped: 0
>>> HURT: 12
>>>
>>> While this does hurt some things, the losses are minor and it prevents the
>>> compare-with-zero optimization from fighting with ffma which is much more
>>> important.
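The two percentages in the summary above are relative changes over different baselines. A quick check in Python shows why 56 extra instructions round to 0.00% overall but 1.32% within the affected programs. The helper name `pct_change` is only for illustration.

```python
# Recompute the relative changes from the instruction-count summary above.
def pct_change(before: int, after: int) -> float:
    """Relative change in percent."""
    return (after - before) / before * 100.0


total = pct_change(4422307, 4422363)   # all shared programs
affected = pct_change(4230, 4286)      # only the 12 programs that changed

print(f"total:    {total:.2f}%")      # total:    0.00%
print(f"affected: {affected:.2f}%")   # affected: 1.32%
```

Both lines describe the same 56 added instructions; only the denominator differs.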
>>
>> Is it actually "fighting" (i.e., undoing the other pass' work) or just
>> preventing some ffmas from being generated?
>>
>> If we did have something that would be recognized by both these and
>> the ffma pattern, it'd look like
>>
>> fge(fadd(a, fmul(b, c)), 0.0)
>>
>> which we could turn into
>>
>> fge(ffma(a, b, c), 0.0) if ffma runs first; or
>> fge(a, fneg(fmul(b, c))) otherwise
>>
>> I guess the first one is better for i965, since we can do that in one
>> instruction. In fact, maybe we don't want to do these optimizations at
>> all? I'm kind of surprised that it hurts.
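To make the ordering question concrete, here is a minimal Python sketch of a toy rewriter (not NIR's actual machinery; `fuse_ffma`, `cmp_with_zero`, `rewrite` and `count_ops` are hypothetical helpers) that applies the two rules from the discussion, in either order, to fge(fadd(a, fmul(b, c)), 0.0). It assumes the convention ffma(x, y, z) = x*y + z.

```python
from typing import Callable, Union

# Expressions are nested tuples ('op', *srcs); leaves are names (str) or
# float constants. ffma(x, y, z) means x*y + z in this sketch.
Expr = Union[str, float, tuple]


def fuse_ffma(e: Expr) -> Expr:
    """fadd(x, fmul(y, z)) -> ffma(y, z, x)"""
    if isinstance(e, tuple) and e[0] == 'fadd':
        x, m = e[1], e[2]
        if isinstance(m, tuple) and m[0] == 'fmul':
            return ('ffma', m[1], m[2], x)
    return e


def cmp_with_zero(e: Expr) -> Expr:
    """fge(fadd(x, y), 0.0) -> fge(x, fneg(y))"""
    if isinstance(e, tuple) and e[0] == 'fge' and e[2] == 0.0:
        s = e[1]
        if isinstance(s, tuple) and s[0] == 'fadd':
            return ('fge', s[1], ('fneg', s[2]))
    return e


def rewrite(e: Expr, rule: Callable[[Expr], Expr]) -> Expr:
    """Apply `rule` bottom-up: rewrite the sources first, then the node."""
    if isinstance(e, tuple):
        e = (e[0],) + tuple(rewrite(s, rule) for s in e[1:])
        return rule(e)
    return e


def count_ops(e: Expr) -> int:
    """Count NIR-level ALU ops (every tuple node, fneg included)."""
    if isinstance(e, tuple):
        return 1 + sum(count_ops(s) for s in e[1:])
    return 0


expr = ('fge', ('fadd', 'a', ('fmul', 'b', 'c')), 0.0)

ffma_first = rewrite(rewrite(expr, fuse_ffma), cmp_with_zero)
cmp_first = rewrite(rewrite(expr, cmp_with_zero), fuse_ffma)

print(ffma_first, count_ops(ffma_first))  # ('fge', ('ffma', 'b', 'c', 'a'), 0.0) 2
print(cmp_first, count_ops(cmp_first))    # ('fge', 'a', ('fneg', ('fmul', 'b', 'c'))) 3
```

Whichever rule runs first consumes the fadd that the other rule needs to match. The op count is NIR-level only: on real hardware an fneg may fold into a source modifier and, as noted in the reply below, the compare with zero can be free, so it is not a cycle estimate.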
>
> Right. In one sense it doesn't help anything because we can do a
> compare with zero for free in i965. However, losing it does hurt
> quite a bit in the case where the optimization allows us to remove the
> add instruction. The problem is when the add is part of a potential
> ffma in which case pulling things into the comparison keeps the more
> optimized ffma peephole from actually converting to an ffma. In this
> case we keep both the add and the multiply even though we could have
> done it with an ffma and a compare with zero.

So to confirm, in the case of

    (('flt', ('fadd', a, b), 0.0), ('flt', a, ('fneg', b))),

you want to keep the a+b around so that if a or b is a multiplication,
the ffma peephole can recognize it?

If that's the case,

Reviewed-by: Matt Turner <mattst88@gmail.com>
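A toy model of the "late section" idea, as a minimal Python sketch. It borrows the (search, replace) tuple shape of the pattern quoted above (quoted strings stand in for the bare a and b), but the matcher, the rule lists `early_rules` and `late_rules`, and modeling the ffma peephole as a single rule are hypothetical simplifications, not Mesa code. The early rules run to a fixed point before the late ones, so an add that feeds a multiply is fused first, while a plain add still gets the compare-with-zero rewrite.

```python
from typing import Dict, List, Tuple, Union

Expr = Union[str, float, tuple]
Rule = Tuple[Expr, Expr]
VARS = {'a', 'b', 'c'}  # pattern variables; other strings are leaf values


def match(pat: Expr, e: Expr, env: Dict[str, Expr]) -> bool:
    """Structural match with variable binding (no commutative matching)."""
    if isinstance(pat, str) and pat in VARS:
        if pat in env:
            return env[pat] == e  # a repeated variable must bind the same value
        env[pat] = e
        return True
    if isinstance(pat, tuple):
        return (isinstance(e, tuple) and len(e) == len(pat) and e[0] == pat[0]
                and all(match(p, s, env) for p, s in zip(pat[1:], e[1:])))
    return pat == e  # constants such as 0.0


def subst(rep: Expr, env: Dict[str, Expr]) -> Expr:
    """Build the replacement, filling in the bound variables."""
    if isinstance(rep, str) and rep in VARS:
        return env[rep]
    if isinstance(rep, tuple):
        return (rep[0],) + tuple(subst(s, env) for s in rep[1:])
    return rep


def sweep(e: Expr, rules: List[Rule]) -> Expr:
    """One bottom-up pass: rewrite sources, then try each rule on the node."""
    if isinstance(e, tuple):
        e = (e[0],) + tuple(sweep(s, rules) for s in e[1:])
        for search, replace in rules:
            env: Dict[str, Expr] = {}
            if match(search, e, env):
                return subst(replace, env)
    return e


def run_to_fixpoint(e: Expr, rules: List[Rule]) -> Expr:
    while True:
        new = sweep(e, rules)
        if new == e:
            return e
        e = new


# Stand-in for the ffma peephole (ffma(x, y, z) = x*y + z in this sketch).
early_rules: List[Rule] = [
    (('fadd', 'a', ('fmul', 'b', 'c')), ('ffma', 'b', 'c', 'a')),
]
# The compare-with-zero rule from the reply, applied only in the late phase.
late_rules: List[Rule] = [
    (('flt', ('fadd', 'a', 'b'), 0.0), ('flt', 'a', ('fneg', 'b'))),
]


def optimize(e: Expr) -> Expr:
    e = run_to_fixpoint(e, early_rules)
    return run_to_fixpoint(e, late_rules)


print(optimize(('flt', ('fadd', 'x', ('fmul', 'y', 'z')), 0.0)))
# ('flt', ('ffma', 'y', 'z', 'x'), 0.0)
print(optimize(('flt', ('fadd', 'x', 'y'), 0.0)))
# ('flt', 'x', ('fneg', 'y'))
```

Putting the rule in a later phase, rather than deleting it, keeps the win for plain adds while letting the fusion see the fadd first.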
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev