'[RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       gcc-patches
Subject:    [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
From:       Jeff Law <law () redhat ! com>
Date:       2015-01-31 4:11:55
Message-ID: 54CC560B.8080200 () redhat ! com
[Download RAW message or body]


So I've gone round and round with BZ39726.

The basic idea here is we have an operation on widened operands that 
feeds a narrow comparison.  ie, the two inputs are QImode, extended to 
SImode, subtracted, then we want to conditionally branch on the result 
of sign bit of QImode.

If we could just do the subtraction in the native mode of the operands 
(QImode) and test that result, we eliminate two extensions and the test 
itself can be eliminated as we'll get the QImode sign bit from the 
QImode subtraction.


I first started poking at combine to see if I could exploit the newer 4 
insn combination capability to see enough INSNs to prove we could narrow 
the operations *and* arrange for the resulting insns to be well suited 
for redundant cmp/tst removal.

This was going to require some extensions to the simplification code, 
the code to determine good split points, the when to use 4 insn 
combination heuristics and possibly other stuff.  I was able to see all 
the insns I needed but the RTL I was going to have to detect and rewrite 
was getting pretty fugly.

I changed course and started looking at twiddling the m68k backend.  I 
built a proof of concept pair of splitters that would rewrite the key 4 
insns and it clearly improved the code.  But the result was not well 
suited for removing the redundant tst.  To also arrange for result to be 
suitable for redundant tst/cmp removal, the two proof of concept 
splitters would have to merely be bridge patterns to an even larger 
splitter which would give us access to the conditional branch so that we 
could rewrite the compare and conditional branch too in such a way to 
make it obvious the result of the subtraction was sufficient to remove 
the tst/cmp instruction.

I then wondered if I could use the new match.pd code to describe what I 
wanted at a higher level, possibly getting to simpler code while still 
working on generic/gimple.  Richi speculated that the new match.pd 
approach could be used for type demotions -- I was skeptical, but what 
the hell.


+/* If we are testing a single bit resulting from a binary
+   operation in precision P1 where the operands were widened
+   precision P2 and the tested bit is the sign bit for
+   precision P2.  Rewrite so the binary operation is in
+   precision P2.  */
+(for inner_op (minus plus mult bit_and bit_ior bit_xor)
+  (simplify
+    (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
+    (if (GENERIC
+        && exact_log2 (TREE_INT_CST_LOW (@3)) + 1 == TYPE_PRECISION 
(TREE_TYPE (@0))
+         && TREE_TYPE (@0) == TREE_TYPE (@1)
+        && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
+     /* This is a bit hackish, but I didn't see a good way to change
+        the type of the result.  Note that changing the result like
+        this implies that we are not operating on gimple.  */
+     (with {type = TREE_TYPE (@0); }
+      (convert (bit_and (inner_op @0 @1) (convert @3)))))))



The exact_log2 (blah) + 1 == TYPE_PRECISION (blah2) stuff ensures that 
we're looking at just the sign bit in the same mode/precision as the two 
operands of the binary operation.

We're changing the type of the entire expression to match the native 
type of the operands.  So we have to reassign "type" to the new type we 
want to use.  This also implies we're working on generic rather than 
gimple.  It's a bit of a hack, but I couldn't convince the code to use a 
different type for the final result.

I guess it could be made to work on gimple as well by expanding it to 
include the conditional branch this pattern feeds -- that way we could 
avoid the type mismatch.

With that match to match.pd, I get the desired code on the m68k for the 
test in the testcase as well as several of my own.  I've also verified 
this makes _no_ difference to the resulting runtime libraries for x86 
and x86_64.

I'll spin a m68k bootstrap over the weekend, but wanted to get some 
feedback on the match.pd pattern and perhaps a cleaner way to deal with 
changing the resultant type.
Thoughts, comments?
[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic