
List:       gcc-bugs
Subject:    [Bug tree-optimization/113104] Suboptimal loop-based slp node splicing across iterations
From:       "rsandifo at gcc dot gnu.org via Gcc-bugs" <gcc-bugs () gcc ! gnu ! org>
Date:       2023-12-30 12:35:49
Message-ID: bug-113104-4-Ky6KpQU5nY () http ! gcc ! gnu ! org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113104

Richard Sandiford <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2023-12-30
     Ever confirmed|0                           |1
                 CC|                            |rsandifo at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org      |rsandifo at gcc dot gnu.org

--- Comment #4 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
FWIW, we do get the desired code with -march=armv8-a+sve (even though the test
doesn't use SVE).  This is because of:

  /* Consider enabling VECT_COMPARE_COSTS for SVE, both so that we
     can compare SVE against Advanced SIMD and so that we can compare
     multiple SVE vectorization approaches against each other.  There's
     not really any point doing this for Advanced SIMD only, since the
     first mode that works should always be the best.  */
  if (TARGET_SVE && aarch64_sve_compare_costs)
    flags |= VECT_COMPARE_COSTS;

The testcase in this PR is a counterexample to the claim in the final sentence.
I think the comment might predate significant support for mixed-sized Advanced
SIMD vectorisation.

If we enable SVE (or comment out the "if" line, so that the flag is set
unconditionally), the costs are 13 units per vector iteration for 128-bit
vectors and 4 units per vector iteration for 64-bit vectors (so 8 units per
128 bits on a like-for-like basis, since two 64-bit iterations cover the same
amount of data).  The 64-bit version is therefore seen as significantly cheaper
and is chosen ahead of the 128-bit version.
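
To make the arithmetic concrete, here is a minimal sketch of the
like-for-like comparison.  It is only an illustration: the function names are
hypothetical and this is not the vectoriser's actual cost model, which takes
more into account than a simple scaling by vector width.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a GCC function): scale the cost of one vector
   iteration on VECTOR_BITS-wide vectors to the cost of processing
   COMMON_BITS bits of data.  Assumes COMMON_BITS is a nonzero multiple
   of VECTOR_BITS.  */
static unsigned int
cost_per_common_width (unsigned int cost_per_iter, unsigned int vector_bits,
                       unsigned int common_bits)
{
  assert (vector_bits != 0 && common_bits % vector_bits == 0);
  return cost_per_iter * (common_bits / vector_bits);
}

/* Hypothetical helper: return true if the narrower mode is cheaper than
   the wider mode once both are scaled to the wider vector width.  */
static bool
narrow_mode_is_cheaper (unsigned int wide_cost, unsigned int wide_bits,
                        unsigned int narrow_cost, unsigned int narrow_bits)
{
  return (cost_per_common_width (narrow_cost, narrow_bits, wide_bits)
          < cost_per_common_width (wide_cost, wide_bits, wide_bits));
}
```

With the numbers above, cost_per_common_width (4, 64, 128) gives 8, which is
less than the 13 units for one 128-bit iteration, so
narrow_mode_is_cheaper (13, 128, 4, 64) is true.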

I think this PR is enough proof that we should enable VECT_COMPARE_COSTS even
without SVE.  Assigning to myself for that.