List:       oss-security
Subject:    [oss-security] MOV{H,L}PS instructions can fail on Genoa (Zen 4)
From:       Tavis Ormandy <taviso () gmail ! com>
Date:       2023-09-21 14:21:21
Message-ID: ZQxRYY0HLhGyn4jf () thinkstation ! cmpxchg8b ! net

Hey, while fuzzing Genoa (AMD Zen 4) I noticed that the MOV{H,L}PS
instructions sometimes don't seem to work. I asked AMD whether they
consider this a vulnerability, and they don't, so I'll just document it
here for reference.

Quick background: these instructions load two 32-bit packed singles from the
source operand into the low (movlps) or high (movhps) 64 bits of a vector
register.

Consider this minimal example:

section .data
    a: dq 0x1111111111111111
    b: dq 0x2222222222222222

section .text
    movhps  xmm0, [rel a]
    movlps  xmm0, [rel b]


The result should be that xmm0 holds the value 0x11111111111111112222222222222222.
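
For anyone who wants to check the arithmetic, here is a small software model
of the two legacy forms. It is only a sketch: xmm_model, model_movlps,
model_movhps and model_first_example are names made up for illustration, and
the code models the documented behaviour in plain C rather than executing
the instructions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit XMM register:
 * lane[0] is bits 0..63, lane[1] is bits 64..127. */
typedef struct { uint64_t lane[2]; } xmm_model;

/* Model of legacy "movlps xmm, m64": replace the low 64 bits, keep the high. */
static void model_movlps(xmm_model *dst, const void *m64)
{
    memcpy(&dst->lane[0], m64, sizeof(uint64_t));
}

/* Model of legacy "movhps xmm, m64": replace the high 64 bits, keep the low. */
static void model_movhps(xmm_model *dst, const void *m64)
{
    memcpy(&dst->lane[1], m64, sizeof(uint64_t));
}

/* Replays the two-instruction example above on the model. */
static xmm_model model_first_example(void)
{
    const uint64_t a = 0x1111111111111111ULL;
    const uint64_t b = 0x2222222222222222ULL;
    xmm_model xmm0 = { { 0, 0 } };

    model_movhps(&xmm0, &a);   /* movhps xmm0, [rel a] */
    model_movlps(&xmm0, &b);   /* movlps xmm0, [rel b] */
    return xmm0;
}
```

Calling model_first_example() leaves lane[1] (the high half) holding a and
lane[0] (the low half) holding b, which matches the value quoted above.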

Genoa added support for AVX512, which gives you a bunch more vector
registers, so now you can do:

    movhps  xmm28, [rel b]

However, I've found that non-deterministically, when using any register
above xmm15, previous (pipelined?) operations on other registers fail.

Here is an example:

section .data
    data: dd 0x11111111, 0x22222222, 0x33333333, 0x44444444
    zero: dd 0,0,0,0

section .text
    vmovdqu  xmm0, [rel data]
    vmovlps  xmm1, xmm0, [rel zero]
    vmovhps  xmm17, xmm0, [rel zero]

I think the expected result would be:

xmm0  = 0x44444444333333332222222211111111
xmm1  = 0x44444444333333330000000000000000
xmm17 = 0x00000000000000002222222211111111
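
The expected values above can be derived with a software model of the
three-operand (VEX/EVEX) forms. This is a sketch under the assumption that
the forms behave as documented: the destination takes 64 bits from memory
and the other 64 bits from the first source register. The names xmm_model,
model_vmovlps, model_vmovhps and model_second_example are hypothetical, and
nothing here runs the real instructions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit XMM register:
 * lane[0] is bits 0..63, lane[1] is bits 64..127. */
typedef struct { uint64_t lane[2]; } xmm_model;

/* Model of "vmovlps dst, src1, m64": low half from memory, high half from src1. */
static xmm_model model_vmovlps(xmm_model src1, const void *m64)
{
    xmm_model dst = src1;
    memcpy(&dst.lane[0], m64, sizeof(uint64_t));
    return dst;
}

/* Model of "vmovhps dst, src1, m64": low half from src1, high half from memory. */
static xmm_model model_vmovhps(xmm_model src1, const void *m64)
{
    xmm_model dst = src1;
    memcpy(&dst.lane[1], m64, sizeof(uint64_t));
    return dst;
}

/* Replays the three-instruction example above on the model. */
static void model_second_example(xmm_model *xmm0, xmm_model *xmm1, xmm_model *xmm17)
{
    /* data: dd 0x11111111, 0x22222222, 0x33333333, 0x44444444 */
    const uint32_t data[4] = { 0x11111111u, 0x22222222u, 0x33333333u, 0x44444444u };
    const uint64_t zero = 0;

    /* vmovdqu xmm0, [rel data]: build the lanes by shifting so the model
     * does not depend on the host byte order. */
    xmm0->lane[0] = ((uint64_t) data[1] << 32) | data[0];
    xmm0->lane[1] = ((uint64_t) data[3] << 32) | data[2];

    *xmm1  = model_vmovlps(*xmm0, &zero);   /* vmovlps xmm1,  xmm0, [rel zero] */
    *xmm17 = model_vmovhps(*xmm0, &zero);   /* vmovhps xmm17, xmm0, [rel zero] */
}
```

In this model xmm1 always keeps the high qword of xmm0, so it can never be
all zero. That is why xmm1=0 is not a valid outcome.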

However, on Genoa we non-deterministically get xmm1=0.

I don't know the cause or where the bug is; any feedback is welcome. I've
attached a testcase (I ported it to C from a raw fuzzer-generated
testcase, so hopefully it compiles consistently!).

I can reproduce it with pure intrinsics too (no asm), but the output is
not consistent across gcc versions. The attached version does use some
inline asm.

I think it should produce no output at all, but on Genoa it does sometimes
produce output for me.

Compile with:

$ gcc -mavx512vl -o movhps movhps.c
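
The testcase uses xmm17, which needs AVX-512VL, so on a CPU without it the
program would die with an illegal instruction instead of testing anything.
A minimal sketch of a runtime check, assuming GCC or Clang on x86 (the
helper name cpu_has_avx512vl is made up here and is not part of the
attached file):

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether the running CPU advertises AVX-512VL.
 * Uses the GCC/Clang x86 builtins; on other compilers or architectures it
 * conservatively returns false. */
static bool cpu_has_avx512vl(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                              /* populate CPU feature data */
    return __builtin_cpu_supports("avx512vl") != 0;
#else
    return false;
#endif
}
```

One could call this at the top of main() and bail out with errx() if it
returns false.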

Tavis.

-- 
 _o)            $ lynx lock.cmpxchg8b.com
 /\\  _o)  _o)  $ finger taviso@sdf.org
_\_V _( ) _( )  @taviso

["movhps.c" (text/plain)]

#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <stdbool.h>
#include <x86intrin.h>
#include <immintrin.h>
#include <sched.h>
#include <syscall.h>
#include <err.h>

#define __aligned __attribute__((aligned(32)))

#if !defined(__AVX512VL__)
# error You must compile this with -mavx512vl to get the needed intrinsics
#endif

static const uint64_t kData[] = { 0x4444444444444444, 0x4242424242424242 };
static const uint64_t kZero;

static void vmovhps_testcase()
{
    uint64_t result[2] __aligned = {0};
    register __m128i r0  asm("xmm0");
    register __m128i r1  asm("xmm1");
    register __m128i r17 asm("xmm17");
    uint64_t count = 0;

    _mm256_zeroall();

    do {
        count++;

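        // Trigger bug: load kData into xmm0, then merge 64 zero bits into
        // the low half (xmm1) and the high half (xmm17). xmm1 should keep
        // the high qword of kData, so it should never be all zero.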
        asm volatile ("vmovdqu %1, %0"      : "=v"(r0)  : "m"(kData));
        asm volatile ("vmovlps %2, %1, %0"  : "=v"(r1)  : "v"(r0), "m"(kZero));
        asm volatile ("vmovhps %2, %1, %0"  : "=v"(r17) : "v"(r0), "m"(kZero));
    } while (!_mm_testz_si128(r1, r1));

    _mm_storeu_si128((void *) result, r1);

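    // uint64_t is not necessarily unsigned long long, so cast to match %ll.
    fprintf(stderr, "After %llu: %016llx, %016llx\n",
            (unsigned long long) count,
            (unsigned long long) result[0],
            (unsigned long long) result[1]);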
    return;
}

int main(int argc, char **argv)
{
    while (true) {
        vmovhps_testcase();
    }
    return 0;
}

