
List:       llvm-bugs
Subject:    [llvm-bugs] [Bug 61047] [SLP][AArch64] Over-eager SLP vectorisation
From:       LLVM Bugs via llvm-bugs <llvm-bugs () lists ! llvm ! org>
Date:       2023-02-28 12:34:18
Message-ID: 20230228123418.3656683b54da58b0 () email ! llvm ! org

[Attachment #2 (text/html)]

<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/61047>61047</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [SLP][AArch64] Over-eager SLP vectorisation
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          sjoerdmeijer
      </td>
    </tr>
</table>

<pre>
    I am opening this issue to discuss possible approaches, as the problem seems to \
have been identified already: my motivating case is very similar to the \
pre-committed test case in 3c5e24a51ce072fd0083396dcf0ea107b1858d11.

Taking the very first test case, SLP vectorisation is indeed not profitable, which \
you could probably guess by just eyeballing the codegen: 

https://godbolt.org/z/E64TMKsx9

But there are actually quite a few things going on here; I don&apos;t think it is \
only a question of whether FMAs are generated. To illustrate this, here is the \
timeline view from MCA:

```
Timeline view:
                    0123456789          01
Index     0123456789          0123456789
[0,0]     DeeER.    .    .    .    .    ..   mov     x8, x1
[0,1]     DeeeeeeER .    .    .    .    ..   ldr     s0, [x0]
[0,2]     D==eeeeeeeeER  .    .    .    ..   ld1r    { v1.2s }, [x8], #4
[0,3]     D==========eeeeeeER .    .    ..   ldr     d2, [x8]
[0,4]     D================eeeER   .    ..   fmul    v1.2s, v1.2s, v2.2s
[0,5]     D================eeeER   .    ..   fmul    v0.2s, v2.2s, v0.s[0]
[0,6]     D===================eeER .    ..   rev64   v1.2s, v1.2s
[0,7]     D=====================eeER    ..   fsub    v3.2s, v0.2s, v1.2s
[0,8]     .D====================eeER    ..   fadd    v0.2s, v0.2s, v1.2s
[0,9]     .D======================eeER  ..   mov     v3.s[1], v0.s[1]
[0,10]    .D========================eeER..   str     d3, [x0]
[0,11]    .D========================eeeeER   st1     { v2.s }[1], [x1]
```

Versus the scalar (non-vectorised) codegen:

```
Timeline view:
                    01234567
Index     0123456789
[0,0]     DeeeeeeER .    . .   ldp     s0, s2, [x1]
[0,1]     DeeeeeeER .    . .   ldr     s1, [x1, #8]
[0,2]     DeeeeeeER .    . .   ldr     s4, [x0]
[0,3]     D======eeeER   . .   fmul    s3, s1, s0
[0,4]     D======eeeER   . .   fmul    s0, s0, s2
[0,5]     D=========eeeeER .   fnmsub  s2, s2, s4, s3
[0,6]     D=========eeeeER .   fmadd   s0, s1, s4, s0
[0,7]     D=============eeER   stp     s2, s0, [x0]
[0,8]     .D============eeER   str     s1, [x1]
```

So to me this looks like a combination of problems:
- We emit higher-latency instructions, e.g. `LD1R` and `ST1`.
- We emit more instructions, e.g. `REV64` and `MOV` for the shuffle and extract.

More instructions don&apos;t necessarily mean slower code, but in this case they \
lengthen the dependency chain on the critical path, and there is not enough \
parallelism for the vectorisation to be profitable.
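
As a rough sanity check of the critical-path argument, the two timelines can be reduced to latency-weighted dependency graphs and compared by their longest chain. The sketch below is only that: the latencies are the number of `e` cycles in each timeline row, and the dependency edges are an assumption based on reading the register operands, not something llvm-mca reports directly. Resource conflicts and dispatch delays are ignored.

```python
# Each entry: name -> (latency in cycles, list of producers it waits for).
# Latencies: count of 'e' cycles per row in the timelines above.
# Edges: ASSUMED from the register operands of each instruction.
VECTOR = {
    "mov x8": (2, []),
    "ldr s0": (6, []),
    "ld1r v1": (8, ["mov x8"]),
    "ldr d2": (6, ["ld1r v1"]),  # waits for the post-incremented x8
    "fmul v1": (3, ["ld1r v1", "ldr d2"]),
    "fmul v0": (3, ["ldr d2", "ldr s0"]),
    "rev64 v1": (2, ["fmul v1"]),
    "fsub v3": (2, ["fmul v0", "rev64 v1"]),
    "fadd v0": (2, ["fmul v0", "rev64 v1"]),
    "mov v3.s[1]": (2, ["fsub v3", "fadd v0"]),
    "str d3": (2, ["mov v3.s[1]"]),
    "st1 v2": (4, ["ldr d2"]),
}

SCALAR = {
    "ldp s0,s2": (6, []),
    "ldr s1": (6, []),
    "ldr s4": (6, []),
    "fmul s3": (3, ["ldr s1", "ldp s0,s2"]),
    "fmul s0": (3, ["ldp s0,s2"]),
    "fnmsub s2": (4, ["ldp s0,s2", "ldr s4", "fmul s3"]),
    "fmadd s0": (4, ["ldr s1", "ldr s4", "fmul s0"]),
    "stp s2,s0": (2, ["fnmsub s2", "fmadd s0"]),
    "str s1": (2, ["ldr s1"]),
}


def critical_path(graph):
    """Return the length in cycles of the longest dependency chain."""
    finish = {}

    def finish_time(name):
        # An instruction starts once all of its producers have finished.
        if name not in finish:
            latency, deps = graph[name]
            start = max((finish_time(d) for d in deps), default=0)
            finish[name] = start + latency
        return finish[name]

    return max(finish_time(name) for name in graph)


print("vectorised:", critical_path(VECTOR))
print("scalar:    ", critical_path(SCALAR))
```

Under these assumptions the vectorised chain comes to 27 cycles against 15 for the scalar one. The extra length comes from the serialised `mov`/`ld1r`/`ldr` address chain and from the `rev64` and lane `mov` sitting between the multiplies and the store.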

To me, it looks like a lot of the cost-modelling of insertelement, shufflevector, and \
extractelement has gone wrong here, so I am going to look into that. But if you have other \
ideas about this, @fhahn or @davemgreen, then I am open to suggestions of course.
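
To illustrate why those three costs matter: the SLP vectoriser sums a (vector cost minus scalar cost) delta over the tree, adds the cost of any shuffles, inserts, and extracts it needs, and only vectorises when the total is below minus the threshold (`-slp-threshold`, 0 by default). The sketch below mimics that comparison in Python. Every number in it is made up for illustration and none comes from the AArch64 cost model; the function and variable names are hypothetical as well.

```python
# HYPOTHETICAL per-node costs as (scalar_cost, vector_cost).
# These are illustrative values, not real TargetTransformInfo results.
NODES = {
    "fmul": (2, 1),
    "fadd/fsub": (2, 1),
    "store": (2, 1),
}

# HYPOTHETICAL shuffle/insert/extract costs under two different models.
UNDERESTIMATED = {"broadcast": 0, "reverse shuffle": 0, "lane blend": 0}
MORE_REALISTIC = {"broadcast": 1, "reverse shuffle": 1, "lane blend": 1}


def slp_tree_cost(nodes, extra_costs):
    """Sum of (vector - scalar) deltas plus shuffle/insert/extract costs."""
    delta = sum(vector - scalar for scalar, vector in nodes.values())
    return delta + sum(extra_costs.values())


def is_profitable(cost, threshold=0):
    """Vectorise only if the cost is strictly below minus the threshold."""
    return -threshold > cost


for label, extras in (("underestimated", UNDERESTIMATED),
                      ("more realistic", MORE_REALISTIC)):
    cost = slp_tree_cost(NODES, extras)
    print(label, cost, is_profitable(cost))
```

With the shuffle-related costs at zero the tree looks profitable; charging even one unit for each of them tips the total to zero and the vectoriser would leave the code scalar. That is the sort of sensitivity that makes these three costs worth checking first.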



</pre>



[Attachment #3 (text/plain)]

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

