[prev in list] [next in list] [prev in thread] [next in thread] 

List:       gcc
Subject:    Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing
From:       Richard Biener <richard.guenther () gmail ! com>
Date:       2016-10-19 10:07:13
Message-ID: CAFiYyc1O9maYsTWHz0Rn2p2UEgFiiTKn-kCuMWafovwrgt9VyA () mail ! gmail ! com
[Download RAW message or body]

On Tue, Oct 18, 2016 at 9:52 PM, Thomas Schwinge
<thomas@codesourcery.com> wrote:
> Hi!
> 
> On Mon, 17 Oct 2016 15:38:50 +0200, I wrote:
> > On Mon, 17 Oct 2016 14:08:44 +0200, Richard Biener <richard.guenther@gmail.com> \
> > wrote:
> > > On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge
> > > <thomas@codesourcery.com> wrote:
> > > > On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener \
> > > > <richard.guenther@gmail.com> wrote:
> > > > > On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
> > > > > <thomas@codesourcery.com> wrote:
> > > > > > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener \
> > > > > > <richard.guenther@gmail.com> wrote:
> > > > > > > On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell <nathan@acm.org> wrote:
> > > > > > > > On 10/14/16 05:28, Richard Biener wrote:
> > > > > > > > 
> > > > > > > > > The BB_VISITED flag has indetermined state at the beginning of a \
> > > > > > > > > pass. You have to ensure it is cleared yourself.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > In that case the openacc (&nvptx?) passes should be modified to clear \
> > > > > > > > the flags at their start, rather than at their end.
> 
> This already is a "conceptual acknowledgement" of my patch, so...
> 
> > > > > > OK to commit the following?  Is such a test case appropriate (which would
> > > > > > have caught this issue right away), in particular the dg-final
> > > > > > scan-tree-dump line?
> > > > > 
> > > > > Ugh.  Not worse to what we do in various dwarf scanning I guess.
> > > > 
> > > > ;-|
> > > > 
> > > > > Doesn't failure lead to a miscompile eventually?  So you could formulate
> > > > > this as a dg-do run test with a check for the desired outcome?
> > > > 
> > > > No, unfortunately.  In this case the error is "benign" such that the
> > > > OpenACC loop processing machinery will decide to not parallelize loops
> > > > that ought to be parallelized.
> > > 
> > > So you can scan for "loop parallelized" instead?
> > 
> > The dump would still contain the outer loop's "Loop 0(0)" marker, so I'd
> > have to scan for "Head"/"Tail"/"UNIQUE" or similar instead -- but that
> > seems likewise fragile (for false negatives), and less useful than
> > scanning for the complete pattern.
> > 
> > > I fear your pattern
> > > is quite fragile
> > > to maintain over time.
> > 
> > Agreed -- but then, that's intentional: my idea for this new test case
> > has been to have it actually verify the expected OpenACC loop processing,
> > so it's clear that this pattern will need to be adjusted if changing the
> > OpenACC loop processing.
> > 
> > > > This won't generally cause any problem
> > > > (apart from performance regression, obviously); it just caused problems
> > > > in a few libgomp test cases that actually at run time test for
> > > > parallelized execution -- which will/did trigger only with nvptx
> > > > offloading enabled, which not too many people are testing.  The test case
> > > > I propose below will trigger also for non-offloading configurations.
> > 
> > On IRC, Segher suggested to 'use {} instead of "" to avoid [all those
> > backslashes]' -- thanks, done.
> 
> If you don't like the test case as-is (do we need multi-line tree dump
> scanning, just like we recently got for compiler diagnostics?), can I at
> least commit the OpenACC loops processing fix?  Here is the latest
> version, simplified after your r241296 IRA vs. BB_VISITED fixes:

Sure, I considered that approved already (it's even obvious).

Richard.

> commit 766cf9959b15a17e17e89a50e905b4c546893823
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Mon Oct 17 15:33:09 2016 +0200
> 
> Clear basic block flags before using BB_VISITED for OpenACC loops processing
> 
> gcc/
> * omp-low.c (oacc_loop_discovery): Call clear_bb_flags before, and
> don't clear BB_VISITED after processing.
> 
> gcc/testsuite/
> * gcc.dg/goacc/loop-processing-1.c: New file.
> ---
> gcc/omp-low.c                                  |  8 +++-----
> gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++++++++++++++++++
> 2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git gcc/omp-low.c gcc/omp-low.c
> index 77f89d5..3ef796f 100644
> --- gcc/omp-low.c
> +++ gcc/omp-low.c
> @@ -19236,7 +19236,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
> static oacc_loop *
> oacc_loop_discovery ()
> {
> -  basic_block bb;
> +  /* Clear basic block flags, in particular BB_VISITED which we're going to use
> +     in the following.  */
> +  clear_bb_flags ();
> 
> oacc_loop *top = new_oacc_loop_outer (current_function_decl);
> oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
> @@ -19245,10 +19247,6 @@ oacc_loop_discovery ()
> that diagnostics come out in an unsurprising order.  */
> top = oacc_loop_sibling_nreverse (top);
> 
> -  /* Reset the visited flags.  */
> -  FOR_ALL_BB_FN (bb, cfun)
> -    bb->flags &= ~BB_VISITED;
> -
> return top;
> }
> 
> diff --git gcc/testsuite/gcc.dg/goacc/loop-processing-1.c \
> gcc/testsuite/gcc.dg/goacc/loop-processing-1.c new file mode 100644
> index 0000000..619576a
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
> @@ -0,0 +1,18 @@
> +/* Make sure that OpenACC loop processing happens.  */
> +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
> +
> +extern int place ();
> +
> +void vector_1 (int *ary, int size)
> +{
> +#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) \
> firstprivate (size) +  {
> +#pragma acc loop gang
> +    for (int jx = 0; jx < 1; jx++)
> +#pragma acc loop auto
> +      for (int ix = 0; ix < size; ix++)
> +       ary[ix] = place ();
> +  }
> +}
> +
> +/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop \
> 14\(1\).*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, \
> 20\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, \
> 20\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, \
> 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, \
> 1\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop \
> 6\(4\).*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, \
> 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, \
> 6\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, \
> 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, \
> 1\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);} \
> "oaccdevlow" } } */ 
> 
> Grüße
> Thomas


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic