
List:       gcc
Subject:    Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512
From:       Kirill Yukhin <kirill.yukhin () gmail ! com>
Date:       2013-07-30 13:55:08
Message-ID: 20130730135507.GA668 () msticlxl57 ! ims ! intel ! com

On Wed, Jul 24, 2013 at 08:25:14AM -1000, Richard Henderson wrote:
> On 07/24/2013 05:23 AM, Richard Biener wrote:
> > "H.J. Lu" <hjl.tools@gmail.com> wrote:
> > 
> > > Hi,
> > > 
> > > Here is a patch to extend x86-64 psABI to support AVX-512:
> > 
> > Afaik avx 512 doubles the amount of xmm registers. Can we get them callee saved please?
> 
> Having them callee saved pre-supposes that one knows the width of the register.

The whole VEX-encoded SSE/AVX instruction set is built on zeroing the upper part of the destination register.
For reference, see the definition of VLMAX in the Spec.
For example, with AVX/AVX2 we had:
     vaddps %ymm1, %ymm2, %ymm3

Intuitively (to me, at least), code compiled this way should know nothing about anything above its 256 bits. But with AVX-512 we have (again, see the Spec, Operation section of VADDPS, VEX.256 encoded form):
    DEST[31:0] = SRC1[31:0] + SRC2[31:0]
    ...
    DEST[255:224] = SRC1[255:224] + SRC2[255:224]
    DEST[MAX_VL-1:256] = 0
So legacy code *will* change the upper 256 bits of a vector register: it zeroes them.
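To make the VLMAX point concrete, here is a toy Python model (an illustration, not the Spec's pseudocode) of the VEX.256 VADDPS operation quoted above on a machine with VLMAX = 512. A register is modeled as a list of sixteen 32-bit lanes; lanes are plain Python floats, so binary32 rounding is not reproduced. The name `vaddps_vex256` is hypothetical.

```python
# Toy model: one vector register on an AVX-512 machine (VLMAX = 512 bits),
# stored as a list of sixteen 32-bit lanes. Lanes are Python floats, so
# binary32 rounding is NOT modeled.
VLMAX = 512
LANE_BITS = 32
NLANES = VLMAX // LANE_BITS  # 16

def vaddps_vex256(src1, src2):
    """VEX.256 VADDPS: DEST[255:0] = SRC1 + SRC2 lane-wise, DEST[VLMAX-1:256] = 0."""
    low = 256 // LANE_BITS  # only 8 lanes are computed
    dest = [src1[i] + src2[i] for i in range(low)]
    return dest + [0.0] * (NLANES - low)  # upper lanes are zeroed, not preserved

zmm1 = [1.0] * NLANES
zmm2 = [2.0] * NLANES
zmm3 = [42.0] * NLANES  # a full 512-bit value some AVX-512 caller cares about

# AT&T "vaddps %ymm1, %ymm2, %ymm3" means ymm3 = ymm2 + ymm1.
zmm3 = vaddps_vex256(zmm2, zmm1)
print(zmm3[:8])  # the eight sums, 3.0 each
print(zmm3[8:])  # eight zeros: the caller's upper 256 bits are gone
```

The AVX2-era instruction never names bits above 255, yet it still destroys them.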

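This has its roots in the 64-bit GPR instructions. The following sequence behaves differently on 64-bit and 32-bit targets:
    push %eax
    ;; play with eax
    pop %eax
On a 64-bit machine, every write to %eax zeroes the upper 32 bits of %rax, so if we then try to use the old value of %rax - fail! (Schematically speaking: `push %eax' itself is not encodable in 64-bit mode, but saving and restoring only the 32-bit half has the same effect.)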
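A minimal Python sketch of the same effect on GPRs, assuming a hypothetical callee that "preserves" only the 32-bit %eax. It relies on one architectural fact: in 64-bit mode, writing a 32-bit register zero-extends the result into the full 64-bit register. The helper names are made up for illustration.

```python
MASK32 = 0xFFFF_FFFF

def write_eax(value):
    """Return the new %rax after writing `value` to %eax in 64-bit mode.

    The 32-bit write zero-extends, so the old upper half of %rax is lost.
    """
    return value & MASK32

def callee_preserving_only_eax(rax):
    """Hypothetical 32-bit-minded callee: it saves and restores %eax only."""
    saved = rax & MASK32        # "push %eax": only the low 32 bits are kept
    rax = write_eax(0x1234)     # ;; play with eax
    rax = write_eax(saved)      # "pop %eax": upper half of %rax is now zero
    return rax

rax_before = 0xDEAD_BEEF_0000_0001
rax_after = callee_preserving_only_eax(rax_before)
print(hex(rax_before), "->", hex(rax_after))  # 0xdeadbeef00000001 -> 0x1
```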

So this design philosophy rules out making the vector registers callee-saved.

BUT.

What if we make a couple of the new registers callee-saved in the *scalar* sense?
Here is what we could do:
    1. make only bits [XXX-1:0] of the vector register callee-saved;
    2. make bits [VLMAX-1:XXX] of the same register call-clobbered.

XXX is the number of callee-saved bits: 64, 80, 128 or even 512.
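The proposed split can be written down as two bit masks. This is a sketch under assumptions: registers are modeled as Python integers, VLMAX and XXX are plain parameters, and `callee_saved_mask`, `clobbered_mask` and `call_respects_split` are hypothetical names, not part of any psABI text.

```python
def callee_saved_mask(xxx):
    """Bits [XXX-1:0]: the callee must preserve them."""
    return (1 << xxx) - 1

def clobbered_mask(xxx, vlmax):
    """Bits [VLMAX-1:XXX]: the callee may destroy them."""
    return ((1 << vlmax) - 1) & ~callee_saved_mask(xxx)

def call_respects_split(before, after, xxx):
    """True if a call left the callee-saved low XXX bits unchanged."""
    keep = callee_saved_mask(xxx)
    return (before & keep) == (after & keep)

VLMAX, XXX = 512, 128
# The two masks partition the register: no overlap, full coverage.
assert (callee_saved_mask(XXX) & clobbered_mask(XXX, VLMAX)) == 0
assert (callee_saved_mask(XXX) | clobbered_mask(XXX, VLMAX)) == (1 << VLMAX) - 1

reg = (0xABCD << 300) | 0x1234              # caller data above and below bit XXX
after_call = reg & callee_saved_mask(XXX)   # callee zeroed everything above XXX
print(call_respects_split(reg, after_call, XXX))  # True
```

Losing the high bits is allowed; only a change in the low XXX bits would violate the convention.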

The advantage is that in scalar FP code we don't have to save/restore the callee-saved part:
    vaddss %xmm17, %xmm17, %xmm17
    call foo
    vaddss %xmm17, %xmm17, %xmm17
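From the caller's side, the sequence above only needs the low 32 bits of xmm17 to survive the call, which any XXX >= 32 guarantees. A toy Python model, assuming a hypothetical conforming `foo` that preserves bits [XXX-1:0] and trashes everything above them; lanes are Python floats, so binary32 rounding is ignored.

```python
# Toy model of the scalar sequence; a register is sixteen 32-bit lanes
# (VLMAX = 512). Lanes are Python floats: binary32 rounding is not modeled.
VLMAX, LANE_BITS = 512, 32
NLANES = VLMAX // LANE_BITS

def vaddss(src1, src2):
    """DEST[31:0] = SRC1[31:0] + SRC2[31:0]; DEST[127:32] = SRC1[127:32];
    DEST[VLMAX-1:128] = 0."""
    return [src1[0] + src2[0]] + src1[1:4] + [0.0] * (NLANES - 4)

def foo(reg, xxx):
    """Hypothetical callee following the proposal: it preserves the low XXX
    bits (rounded down to whole lanes here) and trashes everything above."""
    kept = xxx // LANE_BITS
    return reg[:kept] + [float("nan")] * (NLANES - kept)

xmm17 = [1.5] + [0.0] * (NLANES - 1)
xmm17 = vaddss(xmm17, xmm17)  # vaddss %xmm17, %xmm17, %xmm17   (3.0)
xmm17 = foo(xmm17, xxx=64)    # call foo: no save/restore in the caller
xmm17 = vaddss(xmm17, xmm17)  # vaddss %xmm17, %xmm17, %xmm17   (6.0)
print(xmm17[0])  # 6.0
```

The NaN garbage that lands in the upper lanes is harmless: scalar code never reads them.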

We don't care whether `foo':
    - is legacy in the AVX-512 sense: it simply doesn't see xmm17;
    - or targets a future ISA: if this code uses 1024-bit-wide registers and `foo' is AVX-512 code, `foo' will still save XXX bits, allowing us to continue the scalar calculation without any save/restore.
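The width-independence argument can be checked the same way. In this sketch (all names hypothetical; registers are Python integers), the caller runs on an imagined future ISA with 1024-bit registers and keeps a scalar in the low 32 bits of register 17. A legacy callee never names that register, so it cannot touch it; an AVX-512 callee that follows the proposal saves and restores only the low XXX bits, and its own instructions zero everything above bit 511 (the MAX_VL rule quoted earlier). Either way the caller's low XXX bits survive.

```python
FUTURE_VLMAX = 1024  # hypothetical future register width (bits)
XXX = 128            # assumed callee-saved width; the proposal leaves it open

def low(value, bits):
    """Bits [bits-1:0] of value."""
    return value & ((1 << bits) - 1)

def legacy_foo(regs):
    """Pre-AVX-512 callee: only registers 0..15 exist for it."""
    regs = dict(regs)
    for r in range(16):
        regs[r] = 0                  # may clobber anything it can name
    return regs

def avx512_foo(regs):
    """AVX-512 callee honoring the proposal, running on the 1024-bit machine."""
    regs = dict(regs)
    saved = low(regs[17], XXX)       # save only bits [XXX-1:0]
    regs[17] = 0x5A5A                # body: a 512-bit op; bits above 511 become 0
    regs[17] = saved                 # restore: the load zero-extends above XXX
    return regs

caller = {r: 0 for r in range(32)}
# Junk in the high bits, the binary32 encoding of pi in the low 32 bits.
caller[17] = low((0xFFFF << 900) | 0x40490FDB, FUTURE_VLMAX)

for foo in (legacy_foo, avx512_foo):
    after = foo(caller)
    print(foo.__name__, low(after[17], 32) == low(caller[17], 32))  # ... True
```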

--
Thanks, K

