[prev in list] [next in list] [prev in thread] [next in thread]
List: oss-security
Subject: [oss-security] MOV{H,L}PS instructions can fail on Genoa (Zen 4)
From: Tavis Ormandy <taviso () gmail ! com>
Date: 2023-09-21 14:21:21
Message-ID: ZQxRYY0HLhGyn4jf () thinkstation ! cmpxchg8b ! net
[Download RAW message or body]
Hey, when fuzzing Genoa (AMD Zen 4) I noticed that sometimes the
MOV{H,L}PS instructions don't seem to work? I asked AMD if they consider
this a vulnerability, and they didn't.. so I'll just document it here
for reference...
Quick background, these instructions load two 32-bit packed singles from the
source operand into the low (movlps) or high (movhps) 64-bits of a vector
register.
Consider this minimal example:
section .data
a: dq 0x1111111111111111
b: dq 0x2222222222222222
section .text
movhps xmm0, [rel a]
movlps xmm0, [rel b]
The result should be xmm0 has the value 0x11111111111111112222222222222222.
Genoa added support for AVX512, which gives you a bunch more vector
registers, so now you can do:
movhps xmm28, [rel b]
However, I've found that non-deterministically, when using any register
above xmm15, previous (pipelined?) operations on other registers fail.
Here is an example:
section .data
data: dd 0x11111111, 0x22222222, 0x33333333, 0x44444444
zero: dd 0,0,0,0
section .text
vmovdqu xmm0, [rel data]
vmovlps xmm1, xmm0, [rel zero]
vmovhps xmm17, xmm0, [rel zero]
I think the expected result would be:
xmm0 = 0x44444444333333332222222211111111
xmm1 = 0x44444444333333330000000000000000
xmm17 = 0x00000000000000002222222211111111
However, on genoa we non-deterministically get xmm1=0.
I don't know the cause or where the bug is, any feedback welcome. I've
attached a testcase (I ported it to C from a raw fuzzer generated
testcase, hopefully it compiles consistently!).
I can reproduce it with pure intrinsics too (no asm), but the output is
not consistent across gcc versions. The attached version does use some
inline asm.
I think it should produce no output at all, but on Genoa it does sometimes
produce output for me.
Compile with:
$ gcc -mavx512vl -o movhps movhps.c
Tavis.
--
_o) $ lynx lock.cmpxchg8b.com
/\\ _o) _o) $ finger taviso@sdf.org
_\_V _( ) _( ) @taviso
["movhps.c" (text/plain)]
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <stdbool.h>
#include <x86intrin.h>
#include <immintrin.h>
#include <sched.h>
#include <syscall.h>
#include <err.h>
#define __aligned __attribute__((aligned(32)))
#if !defined(__AVX512VL__)
# error You must compile this with -mavx512vl to get the needed intrinsics
#endif
static const uint64_t kData[] = { 0x4444444444444444, 0x4242424242424242 };
static const uint64_t kZero;
static void vmovhps_testcase()
{
uint64_t result[2] __aligned = {0};
register __m128i r0 asm("xmm0");
register __m128i r1 asm("xmm1");
register __m128i r17 asm("xmm17");
uint64_t count = 0;
_mm256_zeroall();
do {
count++;
// Trigger bug
asm volatile ("vmovdqu %1, %0" : "=v"(r0) : "m"(kData));
asm volatile ("vmovlps %2, %1, %0" : "=v"(r1) : "v"(r0), "m"(kZero));
asm volatile ("vmovhps %2, %1, %0" : "=v"(r17) : "v"(r0), "m"(kZero));
} while (!_mm_testz_si128(r1, r1));
_mm_storeu_si128((void *) result, r1);
fprintf(stderr, "After %llu: %016llx, %016llx\n", count, result[0], result[1]);
return;
}
int main(int argc, char **argv)
{
while (true) {
vmovhps_testcase();
}
return 0;
}
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic