[prev in list] [next in list] [prev in thread] [next in thread]
List: gcc
Subject: Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing
From: Richard Biener <richard.guenther () gmail ! com>
Date: 2016-10-19 10:07:13
Message-ID: CAFiYyc1O9maYsTWHz0Rn2p2UEgFiiTKn-kCuMWafovwrgt9VyA () mail ! gmail ! com
[Download RAW message or body]
On Tue, Oct 18, 2016 at 9:52 PM, Thomas Schwinge
<thomas@codesourcery.com> wrote:
> Hi!
>
> On Mon, 17 Oct 2016 15:38:50 +0200, I wrote:
> > On Mon, 17 Oct 2016 14:08:44 +0200, Richard Biener <richard.guenther@gmail.com> \
> > wrote:
> > > On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge
> > > <thomas@codesourcery.com> wrote:
> > > > On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener \
> > > > <richard.guenther@gmail.com> wrote:
> > > > > On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
> > > > > <thomas@codesourcery.com> wrote:
> > > > > > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener \
> > > > > > <richard.guenther@gmail.com> wrote:
> > > > > > > On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell <nathan@acm.org> wrote:
> > > > > > > > On 10/14/16 05:28, Richard Biener wrote:
> > > > > > > >
> > > > > > > > > The BB_VISITED flag has indetermined state at the beginning of a \
> > > > > > > > > pass. You have to ensure it is cleared yourself.
> > > > > > > >
> > > > > > > >
> > > > > > > > In that case the openacc (&nvptx?) passes should be modified to clear \
> > > > > > > > the flags at their start, rather than at their end.
>
> This already is a "conceptual acknowledgement" of my patch, so...
>
> > > > > > OK to commit the following? Is such a test case appropriate (which would
> > > > > > have caught this issue right away), in particular the dg-final
> > > > > > scan-tree-dump line?
> > > > >
> > > > > Ugh. Not worse to what we do in various dwarf scanning I guess.
> > > >
> > > > ;-|
> > > >
> > > > > Doesn't failure lead to a miscompile eventually? So you could formulate
> > > > > this as a dg-do run test with a check for the desired outcome?
> > > >
> > > > No, unfortunately. In this case the error is "benign" such that the
> > > > OpenACC loop processing machinery will decide to not parallelize loops
> > > > that ought to be parallelized.
> > >
> > > So you can scan for "loop parallelized" instead?
> >
> > The dump would still contain the outer loop's "Loop 0(0)" marker, so I'd
> > have to scan for "Head"/"Tail"/"UNIQUE" or similar instead -- but that
> > seems likewise fragile (for false negatives), and less useful than
> > scanning for the complete pattern.
> >
> > > I fear your pattern
> > > is quite fragile
> > > to maintain over time.
> >
> > Agreed -- but then, that's intentional: my idea for this new test case
> > has been to have it actually verify the expected OpenACC loop processing,
> > so it's clear that this pattern will need to be adjusted if changing the
> > OpenACC loop processing.
> >
> > > > This won't generally cause any problem
> > > > (apart from performance regression, obviously); it just caused problems
> > > > in a few libgomp test cases that actually at run time test for
> > > > parallelized execution -- which will/did trigger only with nvptx
> > > > offloading enabled, which not too many people are testing. The test case
> > > > I propose below will trigger also for non-offloading configurations.
> >
> > On IRC, Segher suggested to 'use {} instead of "" to avoid [all those
> > backslashes]' -- thanks, done.
>
> If you don't like the test case as-is (do we need multi-line tree dump
> scanning, just like we recently got for compiler diagnostics?), can I at
> least commit the OpenACC loops processing fix? Here is the latest
> version, simplified after your r241296 IRA vs. BB_VISITED fixes:
Sure, I considered that approved already (it's even obvious).
Richard.
> commit 766cf9959b15a17e17e89a50e905b4c546893823
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date: Mon Oct 17 15:33:09 2016 +0200
>
> Clear basic block flags before using BB_VISITED for OpenACC loops processing
>
> gcc/
> * omp-low.c (oacc_loop_discovery): Call clear_bb_flags before, and
> don't clear BB_VISITED after processing.
>
> gcc/testsuite/
> * gcc.dg/goacc/loop-processing-1.c: New file.
> ---
> gcc/omp-low.c | 8 +++-----
> gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++++++++++++++++++
> 2 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git gcc/omp-low.c gcc/omp-low.c
> index 77f89d5..3ef796f 100644
> --- gcc/omp-low.c
> +++ gcc/omp-low.c
> @@ -19236,7 +19236,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
> static oacc_loop *
> oacc_loop_discovery ()
> {
> - basic_block bb;
> + /* Clear basic block flags, in particular BB_VISITED which we're going to use
> + in the following. */
> + clear_bb_flags ();
>
> oacc_loop *top = new_oacc_loop_outer (current_function_decl);
> oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
> @@ -19245,10 +19247,6 @@ oacc_loop_discovery ()
> that diagnostics come out in an unsurprising order. */
> top = oacc_loop_sibling_nreverse (top);
>
> - /* Reset the visited flags. */
> - FOR_ALL_BB_FN (bb, cfun)
> - bb->flags &= ~BB_VISITED;
> -
> return top;
> }
>
> diff --git gcc/testsuite/gcc.dg/goacc/loop-processing-1.c \
> gcc/testsuite/gcc.dg/goacc/loop-processing-1.c new file mode 100644
> index 0000000..619576a
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
> @@ -0,0 +1,18 @@
> +/* Make sure that OpenACC loop processing happens. */
> +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
> +
> +extern int place ();
> +
> +void vector_1 (int *ary, int size)
> +{
> +#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) \
> firstprivate (size) + {
> +#pragma acc loop gang
> + for (int jx = 0; jx < 1; jx++)
> +#pragma acc loop auto
> + for (int ix = 0; ix < size; ix++)
> + ary[ix] = place ();
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop \
> 14\(1\).*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, \
> 20\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, \
> 20\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, \
> 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, \
> 1\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop \
> 6\(4\).*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, \
> 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, \
> 6\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, \
> 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, \
> 1\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);} \
> "oaccdevlow" } } */
>
> Grüße
> Thomas
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic