List: llvm-bugs
Subject: [llvm-bugs] [Bug 61047] [SLP][AArch64] Over-eager SLP vectorisation
From: LLVM Bugs via llvm-bugs <llvm-bugs () lists ! llvm ! org>
Date: 2023-02-28 12:34:18
Message-ID: 20230228123418.3656683b54da58b0 () email ! llvm ! org
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/61047>61047</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[SLP][AArch64] Over-eager SLP vectorisation
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
sjoerdmeijer
</td>
</tr>
</table>
<pre>
I am opening this issue to discuss possible approaches, as the problem seems to have been identified already: my motivating case is very similar to the pre-committed test case in commit 3c5e24a51ce072fd0083396dcf0ea107b1858d11.

Taking the very first test case, SLP vectorisation is indeed not profitable, which you could probably guess just by eyeballing the codegen:
https://godbolt.org/z/E64TMKsx9

But there are actually quite a few things going on here; I don't think it is only about whether FMAs are generated or not. To illustrate this, here are the timeline views from MCA:
```
Timeline view:
                    0123456789          01
Index     0123456789          0123456789

[0,0]     DeeER.    .    .    .    .    ..   mov x8, x1
[0,1]     DeeeeeeER .    .    .    .    ..   ldr s0, [x0]
[0,2]     D==eeeeeeeeER  .    .    .    ..   ld1r { v1.2s }, [x8], #4
[0,3]     D==========eeeeeeER .    .    ..   ldr d2, [x8]
[0,4]     D================eeeER   .    ..   fmul v1.2s, v1.2s, v2.2s
[0,5]     D================eeeER   .    ..   fmul v0.2s, v2.2s, v0.s[0]
[0,6]     D===================eeER .    ..   rev64 v1.2s, v1.2s
[0,7]     D=====================eeER    ..   fsub v3.2s, v0.2s, v1.2s
[0,8]     .D====================eeER    ..   fadd v0.2s, v0.2s, v1.2s
[0,9]     .D======================eeER  ..   mov v3.s[1], v0.s[1]
[0,10]    .D========================eeER..   str d3, [x0]
[0,11]    .D========================eeeeER   st1 { v2.s }[1], [x1]
```
Versus the scalar code, without SLP vectorisation:
```
Timeline view:
                    01234567
Index     0123456789

[0,0]     DeeeeeeER .    . .   ldp s0, s2, [x1]
[0,1]     DeeeeeeER .    . .   ldr s1, [x1, #8]
[0,2]     DeeeeeeER .    . .   ldr s4, [x0]
[0,3]     D======eeeER   . .   fmul s3, s1, s0
[0,4]     D======eeeER   . .   fmul s0, s0, s2
[0,5]     D=========eeeeER .   fnmsub s2, s2, s4, s3
[0,6]     D=========eeeeER .   fmadd s0, s1, s4, s0
[0,7]     D=============eeER   stp s2, s0, [x0]
[0,8]     .D============eeER   str s1, [x1]
```
So to me this looks like a combination of problems:
- We emit higher-latency instructions, e.g. `LD1R` and `ST1`.
- We emit more instructions, e.g. `REV64` and `MOV` for the shuffle and extract.

More instructions do not necessarily mean slower code, but in this case they create dependency chains that lengthen the critical path, and there is not enough parallelism for the vectorised version to be profitable.
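
A rough way to see the critical-path effect is to compute the longest latency-weighted dependency chain of each listing. This is a minimal sketch under stated assumptions, not what MCA does: each instruction's latency is taken to be its number of `e` cycles in the timelines above, dependencies are read off the register operands, and dispatch width, execution ports and in-order retirement are ignored. The names VECTOR, SCALAR and critical_path are hypothetical.

```python
# Latency-only critical-path estimate for the two MCA listings above.
# Assumptions: latency = number of 'e' cycles in the timeline row; an
# instruction starts once all producers of its input registers finish.
# Dispatch width, ports and in-order retirement are ignored.

# (instruction, latency, indices of producing instructions)
VECTOR = [
    ("mov x8, x1", 2, []),
    ("ldr s0, [x0]", 6, []),
    ("ld1r { v1.2s }, [x8], #4", 8, [0]),
    ("ldr d2, [x8]", 6, [2]),  # assumed: waits for ld1r's x8 write-back
    ("fmul v1.2s, v1.2s, v2.2s", 3, [2, 3]),
    ("fmul v0.2s, v2.2s, v0.s[0]", 3, [1, 3]),
    ("rev64 v1.2s, v1.2s", 2, [4]),
    ("fsub v3.2s, v0.2s, v1.2s", 2, [5, 6]),
    ("fadd v0.2s, v0.2s, v1.2s", 2, [5, 6]),
    ("mov v3.s[1], v0.s[1]", 2, [7, 8]),
    ("str d3, [x0]", 2, [9]),
    ("st1 { v2.s }[1], [x1]", 4, [3]),
]

SCALAR = [
    ("ldp s0, s2, [x1]", 6, []),
    ("ldr s1, [x1, #8]", 6, []),
    ("ldr s4, [x0]", 6, []),
    ("fmul s3, s1, s0", 3, [0, 1]),
    ("fmul s0, s0, s2", 3, [0]),
    ("fnmsub s2, s2, s4, s3", 4, [0, 2, 3]),
    ("fmadd s0, s1, s4, s0", 4, [1, 2, 4]),
    ("stp s2, s0, [x0]", 2, [5, 6]),
    ("str s1, [x1]", 2, [1]),
]


def critical_path(block):
    """Length in cycles of the longest latency-weighted dependency chain."""
    finish = []
    for _text, latency, deps in block:
        # Producers always precede consumers, so program order is already
        # a valid topological order of the dependency graph.
        start = max((finish[d] for d in deps), default=0)
        finish.append(start + latency)
    return max(finish)


for name, block in (("SLP", VECTOR), ("scalar", SCALAR)):
    print(f"{name}: {len(block)} instructions, critical path {critical_path(block)} cycles")
```

Under these assumptions it reports 27 cycles over 12 instructions for the SLP version and 15 cycles over 9 instructions for the scalar one. The long vector chain runs through the LD1R, the dependent LDR, REV64 and the lane MOV before the store; resource stalls such as the late ST1 are not captured.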

To me, it looks like a lot of the cost modelling of insertelement, shufflevector, and extractelement has gone wrong here, so I am going to look into that. But if you have other ideas about this, @fhahn or @davemgreen, I am of course open to suggestions.
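
To make the cost-modelling concern concrete, here is a deliberately simplified sketch of an SLP-style profitability decision: vector cost (including shuffles and lane inserts/extracts) minus the cost of the scalar code it replaces, vectorising only when that is predicted to be a saving. It is not LLVM's implementation; TreeCost, should_vectorise and every cost number are hypothetical. It only shows how under-costing the shuffle and insert/extract overhead alone can flip the decision.

```python
# Toy model of an SLP-style profitability check; this is NOT LLVM's code,
# and every cost value used with it is a hypothetical placeholder.
from dataclasses import dataclass


@dataclass
class TreeCost:
    scalar_ops: int        # cost of the scalar instructions being replaced
    vector_ops: int        # cost of the vector arithmetic and memory ops
    shuffles: int          # cost of shufflevector (e.g. a REV64 lane swap)
    inserts_extracts: int  # cost of insertelement/extractelement (e.g. lane MOVs)

    def delta(self) -> int:
        """Vector cost minus scalar cost; negative means 'predicted cheaper'."""
        return self.vector_ops + self.shuffles + self.inserts_extracts - self.scalar_ops


def should_vectorise(cost: TreeCost, threshold: int = 0) -> bool:
    # Vectorise only if the predicted saving exceeds the threshold.
    return -cost.delta() > threshold


# Hypothetical numbers: identical arithmetic, different overhead costs.
undercosted = TreeCost(scalar_ops=9, vector_ops=6, shuffles=1, inserts_extracts=1)
higher_overhead = TreeCost(scalar_ops=9, vector_ops=6, shuffles=2, inserts_extracts=3)
print(should_vectorise(undercosted), should_vectorise(higher_overhead))  # True False
```

With the same arithmetic costs, raising only the shuffle and insert/extract terms turns a predicted saving into a predicted loss, which is the kind of mis-costing suspected here.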
</pre>
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs