'branch optimizations'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       gcc
Subject:    branch optimizations
From:       Mike Stump <mikestump () comcast ! net>
Date:       2011-04-27 17:14:25
Message-ID: 80433F2F-02E5-4912-B0F0-8F47D76C2513 () comcast ! net
[Download RAW message or body]

So, I have a machine that has many styles of branches, among them, a normal one, and \
a short version.  The short version is cheaper (sometimes).  The regular one is 1 \
(predicted), 7 mis-predicted.  The cost of mis-prediction can be substantially higher \
depending upon what is in the cache.  The short branch version only applies to \
forward 1-n instructions, and it's cost can be more expensive than a regular branch \
but is usually faster.  Another benefit of the short version is, it doesn't pollute \
the branch cache, thus, improving the prediction rates of other branches.  To obtain \
the costs of each style, I need to run some math taking into consideration the \
expected mis-prediction rates, and the counts (predicted or gcov) and the distance \
skipped over.  Also, I determine if the short branch style applies, if it doesn't, I \
just say the regular form is cheaper.

We can see evidence of the lack of completeness in the current code:

  if ((flag_reorder_blocks || flag_reorder_blocks_and_partition)
      /* Don't reorder blocks when optimizing for size because extra jump insns may   \
                
         be created; also barrier may create extra padding.                           \
                
                                                                                      \
                
         More correctly we should have a block reordering mode that tried to          \
                
         minimize the combined size of all the jumps.  This would more or less        \
                
         automatically remove extra jumps, but would also try to use more short       \
  jumps instead of long jumps.  */
      && optimize_function_for_speed_p (cfun))
    {
      reorder_basic_blocks ();

Essentially, I think the optimization could help other ports, if they have a well \
balanced implementation or if they are limited by the branch prediction or if the \
branch cost is non-zero.  Chips that have lots of resources for prediction and/or can \
follow multiple paths aren't likely to see any benefit.

So, here is the patch in progress, should be close, but outstanding questions would \
be, are there any other ports that benefit from it?  If not, I'd probably say it's \
not suitable for the tree in this form.  Are there other ways, better ways of doing \
this?


If you like the concrete, consider this:

	jump equal, L5  predicted equal
	nop
L5:
	nop2
	ret

versus:

	jump not equal, L6  predicted equal
L7:
	nop2
	ret

L6:	nop
	b L7

So, if branches were 10 cycles, conditional branch 12 cycles, T is the number of \
times executed when the condition is true and F, the number of times false, the \
branch costs would be (T+F)*12 in the first case and (T+F)*12 + F*10 in the second \
case.  The later _always_ more expensive, if F is non-zero.  Further, this is true \
for all such non-zero costs, as long as there is a single F, ignoring cache issues.  \
Arguably, we could examine the cost of a branch and always perform this optimization \
if the cost is high enough, even without a short branch form.  That isn't included in \
the patch.

A better patch, would be to have the backend figure out mis-prediction rates, do the \
math and decide based upon all costs alone given all the costs of all the branches.

Thoughts?


["jump.diffs.txt" (jump.diffs.txt)]

Index: bb-reorder.c
===================================================================
--- bb-reorder.c	(revision 1310)
+++ bb-reorder.c	(revision 1311)
@@ -192,6 +192,48 @@ static void fix_edges_for_rarely_execute
 static void fix_crossing_conditional_branches (void);
 static void fix_crossing_unconditional_branches (void);
 
+/* Return true iff we should prefer the less frequent code
+   of basic block BB to be placed immediately after A.  This
+   is useful when the target has a lower cost branch instruction
+   that can avoid the misprediction costs of a normal branch.  */
+static bool
+short_branch_cheaper (const_basic_block bb ATTRIBUTE_UNUSED)
+{
+#ifdef TARGET_SHORT_BRANCH_CHEAPER
+  basic_block pbb, nbb;
+  edge e;
+  edge_iterator ei;
+  bool found = false;
+  
+  /* When the have:
+     A
+     |\
+     | C
+     |/
+     B
+     we check the port to see if TARGET_SHORT_BRANCH_CHEAPER (A) is
+     true.  It can use the frequencies, the misprediction costs and
+     branch distance to decide.  We only handle the simple case of a
+     single precessor to C (bb), a single successor to C (bb).  */
+  if (!single_pred_p (bb))
+    return false;
+  pbb = single_pred (bb);
+  if (!single_succ_p (bb))
+    return false;
+  nbb = single_succ (bb);
+  FOR_EACH_EDGE (e, ei, nbb->preds)
+    if (e->src == pbb)
+      found = true;
+  if (!found)
+    return false;
+
+  if (TARGET_SHORT_BRANCH_CHEAPER (BB_END (pbb)))
+    return true;
+#endif
+
+  return false;
+}
+
 /* Check to see if bb should be pushed into the next round of trace
    collections or not.  Reasons for pushing the block forward are 1).
    If the block is cold, we are doing partitioning, and there will be
@@ -214,7 +256,8 @@ push_to_next_round_p (const_basic_block
 			  || probably_never_executed_bb_p (bb));
 
   if (there_exists_another_round
-      && block_not_hot_enough)
+      && block_not_hot_enough
+      && !short_branch_cheaper (bb))
     return true;
   else
     return false;
@@ -699,7 +742,8 @@ find_traces_1_round (int branch_th, int
 			    & EDGE_CAN_FALLTHRU)
 			&& !(single_succ_edge (e->dest)->flags & EDGE_COMPLEX)
 			&& single_succ (e->dest) == best_edge->dest
-			&& 2 * e->dest->frequency >= EDGE_FREQUENCY (best_edge))
+			&& (2 * e->dest->frequency >= EDGE_FREQUENCY (best_edge)
+			    || short_branch_cheaper (e->dest)))
 		      {
 			best_edge = e;
 			if (dump_file)


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic