[prev in list] [next in list] [prev in thread] [next in thread] 

List:       gcc-patches
Subject:    Re: [PATCH] Fix PR/8268: implement compile time array subscript checking
From:       Dirk Mueller <dmuell () gmx ! net>
Date:       2006-12-21 21:54:03
Message-ID: 200612212254.03585.dmuell () gmx ! net
[Download RAW message or body]

On Tuesday, 12. December 2006 13:51, Richard Guenther wrote:

> > Ok? Do we need the extra -Warray-bounds?
> Yes, we need the extra -Warray-bounds for languages that don't include it
> in -Wall and to turn the warning off.

I don't really think it is a good idea to turn off the warning ;) The main 
reason I'm asking is because we'd need an extra flag for "maybe it overflows 
the array here" kind of warnings, as in my case I'd imagine that I want to 
Werror-out on the "always overflows" and only warn on the "maybe overflows" 
case. 

> > NULL_TREE) +      /* Accesses after the end of arrays of size 0 (gcc
> > +         extension) and 1 are likely intentional ("struct
> > +         hack").  */
> > +      || compare_tree_int (ub, 1) <= 0)
> > +    return;
> The "struct hack" needs to include all arrays that are the last member
> in a structure.
> See tree-dfa.c:get_ref_base_and_extent - you can feed it the array_ref
> and check if
> *pmax_size is -1 (unbound or unknown).  This function should also deal with
> flexible arrays (i.e. report them as unknown extent).

Hmm, it does that when it can determine that the array subscript is accessing 
the array beyond a memory location that is inside the struct the array ref 
was declared. It doesn't make sense to disable an array overflow warning for 
the case that it overflows an array more than the size of the struct, does 
it? There might be many legal cases suppressed this way. Do you know any real 
life code that would trigger false positives here? Or do you only want the 
struct hack to be enabled if the array is at the end of a structure?

> > +  ru = TREE_CODE (us) == INTEGER_CST ? tree_int_cst_compare (ub, us) :
> > -2; +  rl = TREE_CODE (ls) == INTEGER_CST ? tree_int_cst_compare (ls, lb)
> > : -2;
> It looks like you can avoid adding 1 to ub by adjusting the checks
> below to ru <= 0.
> > +  else if (ru == -1 || (!ignore_off_by_one && ru == 0))
> > +    warning (OPT_Warray_bounds, "%Harray subscript is above array

No, because of the case quoted above. If the ARRAY_REF is inside an ADDR_EXPR, 
taking the address of the first element after the end of the array is 
welldefined. 

if you're saying that int_const_binop is computationally more expensive than 
two tree_int_cst_compare calls.. then I see your point. Ok, I've done some 
profiling and the change below seems to be in your spirit, and is about 
factor 2.5 faster. The total runtime overhead for insn-recog.c, which has more 
than 30000 array subscripts, is now about 0,1%. Much of the speedup however 
comes from compare_tree_int() which I had to change to be cheaper. 

Bootstrapped, regtested against trunk on i686-suse-linux. 

Thanks,
Dirk

2006-12-21  Dirk Mueller  <dmueller@suse.de>
	    Richard Guenther <rguenther@suse.de>

	PR diagnostic/8268
	* doc/invoke.texi (Warray-bounds): Document -Warray-bounds.
	* common.opt (Warray-bounds): Add new warning option.
	* c-opts.c (c_common_handle_option): Define -Warray-bounds
	if -Wall is given.
	* tree-vrp.c (vrp_finalize): Call check_array_refs if -Warray-bounds
	is enabled.
	* tree.c (simple_cst_equal): Manually inline tree_int_cst_sgn.
	(check_array_refs, check_array_bounds, check_array_ref): New.

	* testsuite/gcc.dg/Warray-bounds.c: New testcase.


--- doc/invoke.texi	(revision 119972)
+++ doc/invoke.texi	(working copy)
@@ -245,7 +245,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wreturn-type  -Wsequence-point  -Wshadow @gol
 -Wsign-compare  -Wstack-protector @gol
 -Wstrict-aliasing -Wstrict-aliasing=2 @gol
--Wstring-literal-comparison @gol
+-Wstring-literal-comparison -Warray-bounds @gol
 -Wswitch  -Wswitch-default  -Wswitch-enum @gol
 -Wsystem-headers  -Wtrigraphs  -Wundef  -Wuninitialized @gol
 -Wunknown-pragmas  -Wno-pragmas -Wunreachable-code @gol
@@ -2831,6 +2831,12 @@ compiler is using for optimization.  Thi
 @option{-Wstrict-aliasing}, but it will also give a warning for some 
ambiguous
 cases that are safe.
 
+@item -Warray-bounds
+@opindex Warray-bounds
+This option is only active when @option{-ftree-vrp} is active
+(default for -O2 and above). It warns about subscripts to arrays
+that are always out of bounds.
+
 @item -Wall
 @opindex Wall
 All of the above @samp{-W} options combined.  This enables all the
--- common.opt	(revision 119972)
+++ common.opt	(working copy)
@@ -61,6 +61,10 @@ Walways-true
 Common Var(warn_always_true)
 Warn about comparisons that always evaluate to true
 
+Warray-bounds
+Common Var(warn_array_bounds)
+Warn if an array is accessed out of bounds
+
 Wattributes
 Common Var(warn_attributes) Init(1)
 Warn about inappropriate attribute usage
--- tree.c	(revision 119972)
+++ tree.c	(working copy)
@@ -5003,7 +5003,7 @@ simple_cst_equal (tree t1, tree t2)
 int
 compare_tree_int (tree t, unsigned HOST_WIDE_INT u)
 {
-  if (tree_int_cst_sgn (t) < 0)
+  if (!TYPE_UNSIGNED (TREE_TYPE (t)) && TREE_INT_CST_HIGH (t) < 0)
     return -1;
   else if (TREE_INT_CST_HIGH (t) != 0)
     return 1;
--- tree-vrp.c	(revision 119972)
+++ tree-vrp.c	(working copy)
@@ -32,6 +32,7 @@ Boston, MA 02110-1301, USA.  */
 #include "tree-dump.h"
 #include "timevar.h"
 #include "diagnostic.h"
+#include "toplev.h"
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-propagate.h"
@@ -3420,6 +3420,142 @@ insert_range_assertions (void)
   BITMAP_FREE (need_assert_for);
 }
 
+/* Checks one ARRAY_REF in REF, located at LOCUS. Ignores flexible arrays
+   and "struct" hacks. If VRP can determine that the
+   array subscript is a contant, check if it is outside valid
+   range. If the array subscript is a RANGE, warn if it is
+   non-overlapping with valid range.
+   IGNORE_OFF_BY_ONE is true if the ARRAY_REF is inside a ADDR_EXPR.  */
+
+static void
+check_array_ref (tree ref, location_t* locus, bool ignore_off_by_one)
+{
+  value_range_t* vr = NULL;
+  tree low_sub, up_sub;
+  tree low_bound, up_bound = array_ref_up_bound(ref);
+
+  low_sub = up_sub = TREE_OPERAND (ref, 1);
+
+  if (!up_bound || !locus || TREE_NO_WARNING (ref)
+      || TREE_CODE (up_bound) != INTEGER_CST
+      /* Can not check flexible arrays.  */
+      || (TYPE_SIZE (TREE_TYPE (ref)) == NULL_TREE
+          && TYPE_DOMAIN (TREE_TYPE (ref)) != NULL_TREE
+          && TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (ref))) == NULL_TREE)
+      /* Accesses after the end of arrays of size 0 (gcc
+         extension) and 1 are likely intentional ("struct
+         hack").  */
+      || compare_tree_int (up_bound, 1) <= 0)
+    return;
+
+  low_bound = array_ref_low_bound(ref);
+
+  if (TREE_CODE (low_sub) == SSA_NAME)
+    {
+      vr = get_value_range (low_sub);
+      if (vr->type == VR_RANGE || vr->type == VR_ANTI_RANGE)
+        {
+          low_sub = vr->type == VR_RANGE ? vr->max : vr->min;
+          up_sub = vr->type == VR_RANGE ? vr->min : vr->max;
+        }
+    }
+
+  TREE_NO_WARNING (ref) = 1;
+
+  if (vr && vr->type == VR_ANTI_RANGE)
+    {
+      if (TREE_CODE (up_sub) == INTEGER_CST
+          && tree_int_cst_lt (up_bound, up_sub)
+          && TREE_CODE (low_sub) == INTEGER_CST
+          && tree_int_cst_lt (low_sub, low_bound))
+        warning (OPT_Warray_bounds,
+                 "%Harray subscript is outside array bounds", locus);
+      else
+        TREE_NO_WARNING (ref) = 0;
+    }
+  else if (TREE_CODE (up_sub) == INTEGER_CST
+           && tree_int_cst_lt (up_bound, up_sub)
+           && !tree_int_cst_equal (up_bound, up_sub)
+           && (!ignore_off_by_one
+               || !tree_int_cst_equal (int_const_binop (PLUS_EXPR,
+                                                        up_bound,
+                                                        integer_one_node,
+                                                        0),
+                                       up_sub)))
+    warning (OPT_Warray_bounds, "%Harray subscript is above array bounds",
+             locus);
+  else if (TREE_CODE (low_sub) == INTEGER_CST
+           && tree_int_cst_lt (low_sub, low_bound))
+    warning (OPT_Warray_bounds, "%Harray subscript is below array bounds",
+             locus);
+  else
+    TREE_NO_WARNING (ref) = 0;
+}
+
+static bool array_bounds_already_done;
+
+/* walk_tree() callback that checks if *TP is
+   an ARRAY_REF inside an ADDR_EXPR (in which an array
+   subscript one outside the valid range is allowed). Call
+   check_array_ref for each ARRAY_REF found. The location is 
+   passed in DATA.  */
+
+static tree
+check_array_bounds (tree *tp, int *walk_subtree, void *data)
+{
+ tree t = *tp;
+
+ *walk_subtree = TRUE;
+
+  if (TREE_CODE (t) == ARRAY_REF)
+    check_array_ref (t, (location_t*) data, false /*ignore_off_by_one*/);
+  else if (TREE_CODE (t) == ADDR_EXPR)
+    {
+       t = TREE_OPERAND (t, 0);
+       while (!array_bounds_already_done && handled_component_p (t))
+         {
+           if (TREE_CODE (t) == ARRAY_REF)
+             check_array_ref (t, (location_t*) data, 
true /*ignore_off_by_one*/);
+           t = TREE_OPERAND (t, 0);
+         }
+       *walk_subtree = FALSE;
+    }
+
+  return NULL_TREE;
+}
+
+/* Walk over all statements of all reachable BBs and call check_array_bounds
+   on them.  */
+
+static void
+check_all_array_refs (void)
+{
+  basic_block bb;
+  block_stmt_iterator si;
+
+  FOR_EACH_BB (bb)
+    {
+      /* Skip bb's that are clearly unreachable.  */
+      if (single_pred_p (bb))
+      {
+	basic_block pred_bb = EDGE_PRED (bb, 0)->src;
+	tree ls = NULL_TREE;
+
+	if (!bsi_end_p (bsi_last (pred_bb)))
+	  ls = bsi_stmt (bsi_last (pred_bb));
+
+	if (ls && TREE_CODE (ls) == COND_EXPR
+	    && ((COND_EXPR_COND (ls) == boolean_false_node
+		 && (EDGE_PRED (bb, 0)->flags & EDGE_TRUE_VALUE))
+		|| (COND_EXPR_COND (ls) == boolean_true_node
+		    && (EDGE_PRED (bb, 0)->flags & EDGE_FALSE_VALUE))))
+	  continue;
+      }
+      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
+	walk_tree (bsi_stmt_ptr (si), check_array_bounds,
+		   EXPR_LOCUS (*bsi_stmt_ptr (si)), NULL);
+    }
+}
 
 /* Convert range assertion expressions into the implied copies and
    copy propagate away the copies.  Doing the trivial copy propagation
@@ -4733,6 +4869,19 @@ vrp_finalize (void)
 
   substitute_and_fold (single_val_range, true);
 
+  if (warn_array_bounds)
+      check_all_array_refs();
+
+  /* workaround for PR/26726. The problem here is that -fivopts
+     sometimes shifts offsets so that arrays are accessed apparently
+     out of bounds, but actually are not. therefore we do not warn
+     about ARRAY_REF's inside ADDR_EXPR's anymore after the first run
+     (which is before ivopts).
+   */
+
+  array_bounds_already_done = true;
+
+
   /* We must identify jump threading opportunities before we release
      the datastructures built by VRP.  */
   identify_jump_threads ();
Index: testsuite/gcc.dg/Warray-bounds.c
===================================================================
--- testsuite/gcc.dg/Warray-bounds.c	(revision 0)
+++ testsuite/gcc.dg/Warray-bounds.c	(revision 0)
@@ -0,0 +1,92 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Warray-bounds" } */
+
+int a[10];
+
+static inline int n(void) {
+    __SIZE_TYPE__ strlen(const char *s);
+    return strlen("12345");
+}
+
+void g(int *p);
+void h(int p);
+
+int* f(void) {
+    int b[10];
+    int i;
+    struct {
+       int c[10];
+    } c;
+
+    a[-1] = 0;             /* { dg-warning "array subscript" } */
+    a[ 0] = 0;
+    a[ 1] = 0;
+
+
+    a[ 9] = 0;
+    a[10] = 0;             /* { dg-warning "array subscript" } */
+    a[11] = 0;             /* { dg-warning "array subscript" } */
+    a[2 * n() - 11] = 0;    /* { dg-warning "array subscript" } */
+    a[2 * n() - 10] = 0;
+    a[2 * n() -  1] = 0;
+    a[2 * n() -  0] = 0;    /* { dg-warning "array subscript" } */
+
+    b[-1] = 0;             /* { dg-warning "array subscript" } */
+    b[ 0] = 0;
+    b[ 1] = 0;
+    b[ 9] = 0;
+    b[10] = 0;             /* { dg-warning "array subscript" } */
+    b[11] = 0;             /* { dg-warning "array subscript" } */
+    b[2 * n() - 11] = 0;    /* { dg-warning "array subscript" } */
+    b[2 * n() - 10] = 0;
+    b[2 * n() -  1] = 0;
+    b[2 * n() -  0] = 0;    /* { dg-warning "array subscript" } */
+
+    c.c[-1] = 0;           /* { dg-warning "array subscript" } */
+    c.c[ 0] = 0;
+    c.c[ 1] = 0;
+    c.c[ 9] = 0;
+    c.c[10] = 0;           /* { dg-warning "array subscript" } */
+    c.c[11] = 0;           /* { dg-warning "array subscript" } */
+    c.c[2 * n() - 11] = 0;  /* { dg-warning "array subscript" } */
+    c.c[2 * n() - 10] = 0;
+    c.c[2 * n() -  1] = 0;
+    c.c[2 * n() -  0] = 0;  /* { dg-warning "array subscript" } */
+
+    g(&a[8]);
+    g(&a[9]);
+    g(&a[10]);
+    g(&a[11]);             /* { dg-warning "array subscript" } */
+
+    g(&b[10]);
+    g(&c.c[10]);
+    g(&a[11]);             /* { dg-warning "array subscript" } */
+    g(&b[11]);             /* { dg-warning "array subscript" } */
+    g(&c.c[11]);           /* { dg-warning "array subscript" } */
+
+    g(&a[0]);
+    g(&b[0]);
+    g(&c.c[0]);
+
+    g(&a[-1]);             /* { dg-warning "array subscript" } */
+    g(&b[-1]);             /* { dg-warning "array subscript" } */ 
+    h(sizeof a[-1]);
+    h(sizeof a[10]);
+    h(sizeof b[-1]);
+    h(sizeof b[10]);
+    h(sizeof c.c[-1]);
+    h(sizeof c.c[10]);
+
+    if (10 < 10)
+       a[10] = 0;
+    if (10 < 10)
+       b[10] = 0;
+    if (-1 >= 0)
+       c.c[-1] = 0;
+
+    for (i = 20; i < 30; ++i)
+             a[i] = 1;       /* { dg-warning "array subscript" } */
+
+    return a;
+}
+

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic