'Re: RFR: 8302908: RISC-V: Support masked vector arithmetic instructions for Vector API [v12]'


List:       openjdk-hotspot-compiler-dev
Subject:    Re: RFR: 8302908: RISC-V: Support masked vector arithmetic instructions for Vector API [v12]
From:       Dingli Zhang <dzhang () openjdk ! org>
Date:       2023-03-30 9:16:06
Message-ID: PDcK7TYEJQQQiyi5O-swRG4c3EffJNzUy2xjRt6Tqwo=.638fe4ed-9f89-48f5-892e-89fc08e51209 () github ! com

> Hi,
> 
> We have added support for masked vector arithmetic instructions. Please take a look and review. Thanks a lot!
> This patch adds support for the masked versions of vector add/sub/mul/div. It was implemented with reference to RVV v1.0 [1].
> 
> ## Load/Store/Cmp Mask
> `VectorLoadMask`, `VectorMaskCmp` and `VectorStoreMask` implement the mask datapath. The compilation log for `jdk/incubator/vector/Byte128VectorTests.java` shows where the data is passed:
> 218     loadV V1, [R7]	# vector (rvv)
> 220     vloadmask V0, V1
> ...
> 23c     vmaskcmp_rvv_masked V0, V4, V5, V0, V1, #0
> 24c     vstoremask V1, V0
> 258     storeV [R7], V1	# vector (rvv)
> 
> 
> The corresponding generated JIT assembly:
> 
> # loadV
> 0x000000400c8ef958:   vsetvli t0,zero,e8,m1,tu,mu
> 0x000000400c8ef95c:   vle8.v  v1,(t2)
> 
> # vloadmask
> 0x000000400c8ef960:   vsetvli t0,zero,e8,m1,tu,mu
> 0x000000400c8ef964:   vmsne.vx    v0,v1,zero
> 
> # vmaskcmp_rvv_masked
> 0x000000400c8ef97c:   vsetvli   t0,zero,e8,m1,tu,mu
> 0x000000400c8ef980:   vmclr.m   v1
> 0x000000400c8ef984:   vmseq.vv  v1,v4,v5,v0.t
> 0x000000400c8ef988:   vmv1r.v   v0,v1
> 
> # vstoremask
> 0x000000400c8ef98c:   vsetvli t0,zero,e8,m1,tu,mu
> 0x000000400c8ef990:   vmv.v.x v1,zero
> 0x000000400c8ef994:   vmerge.vim  v1,v1,1,v0
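To make the mask datapath above easier to follow, here is a minimal scalar Java model of the three mask steps as they appear in the assembly. It is a sketch of the RVV semantics, not the C2 implementation. `MaskDatapathModel` and its method names are hypothetical, and a `boolean[]` stands in for the one-bit-per-lane mask held in v0. Only the `#0` (equal) comparison is modeled.

```java
import java.util.Arrays;

/**
 * Scalar model (illustrative, not HotSpot code) of the e8 mask datapath:
 * loadV -> vloadmask -> vmaskcmp_rvv_masked (#0, eq) -> vstoremask.
 * A boolean[] is a hypothetical stand-in for the one-bit-per-lane mask in v0.
 */
public final class MaskDatapathModel {

    // vloadmask: vmsne.vx v0, v1, zero -> lane i is active iff byte i != 0.
    static boolean[] loadMask(byte[] v) {
        boolean[] m = new boolean[v.length];
        for (int i = 0; i < v.length; i++) {
            m[i] = v[i] != 0;
        }
        return m;
    }

    // vmaskcmp_rvv_masked: vmclr.m clears the destination first, because with
    // "mu" the masked vmseq.vv ..., v0.t leaves inactive lanes untouched.
    // Active lanes get (a == b), inactive lanes stay 0.
    static boolean[] maskedCmpEq(byte[] a, byte[] b, boolean[] v0) {
        boolean[] dst = new boolean[a.length]; // vmclr.m: all bits cleared
        for (int i = 0; i < a.length; i++) {
            if (v0[i]) {
                dst[i] = a[i] == b[i];
            }
        }
        return dst;
    }

    // vstoremask: vmv.v.x v1, zero then vmerge.vim v1, v1, 1, v0
    // -> byte 1 where the mask bit is set, 0 elsewhere.
    static byte[] storeMask(boolean[] v0) {
        byte[] out = new byte[v0.length];
        for (int i = 0; i < v0.length; i++) {
            out[i] = (byte) (v0[i] ? 1 : 0);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] maskBytes = {1, 0, 1, 1};
        byte[] x = {5, 6, 7, 8};
        byte[] y = {5, 6, 0, 8};
        boolean[] v0 = loadMask(maskBytes);
        byte[] result = storeMask(maskedCmpEq(x, y, v0));
        System.out.println(Arrays.toString(result)); // prints [1, 0, 0, 1]
    }
}
```

Lane 1 compares equal values (6 == 6) but still yields 0 because it is masked off. That is why `vmclr.m` precedes the masked compare.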
> 
> 
> ## Masked vector arithmetic instructions (e.g. vadd)
> The `AddMaskTestMerge` test case:
> 
> import jdk.incubator.vector.IntVector;
> import jdk.incubator.vector.VectorMask;
> import jdk.incubator.vector.VectorOperators;
> import jdk.incubator.vector.VectorSpecies;
> 
> public class AddMaskTestMerge {
> 
>     static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128;
>     static final int SIZE = 1024;
>     static int[] a = new int[SIZE];
>     static int[] b = new int[SIZE];
>     static int[] r = new int[SIZE];
>     static boolean[] c = new boolean[]{true, false, true, false, true, false, true, false};
>     static {
>         for (int i = 0; i < SIZE; i++) {
>             a[i] = i;
>             b[i] = i;
>         }
>     }
> 
>     static void workload(int idx) {
>         VectorMask<Integer> vmask = VectorMask.fromArray(SPECIES, c, 0); // mask lanes always read from c[0..]
>         IntVector av = IntVector.fromArray(SPECIES, a, idx);
>         IntVector bv = IntVector.fromArray(SPECIES, b, idx);
>         av.lanewise(VectorOperators.ADD, bv, vmask).intoArray(r, idx); // unset lanes keep av's value
>     }
> 
>     public static void main(String[] args) {
>         for (int i = 0; i < 300_000; i++) {
>             for (int j = 0; j < SIZE; j += SPECIES.length()) {
>                 workload(j);
>             }
>         }
>     }
> }
> 
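The masked add in `workload` can be cross-checked against a plain scalar loop. The sketch below assumes `IntVector.SPECIES_128` has 4 int lanes and, as in the test, that the mask is always read from `c` at offset 0. Unset lanes keep the first operand's value (the Vector API's masked `lanewise` blends the result into `this`). `AddMaskReference` and `reference` are hypothetical names.

```java
import java.util.Arrays;

/**
 * Scalar reference (illustrative only) for AddMaskTestMerge.workload.
 * Assumes 4 int lanes (128-bit / 32-bit) and a mask read from c at offset 0.
 */
public final class AddMaskReference {

    // Assumes a.length is a multiple of lanes (SIZE = 1024 is, for 4 lanes).
    static int[] reference(int[] a, int[] b, boolean[] c, int lanes) {
        int[] r = new int[a.length];
        for (int idx = 0; idx < a.length; idx += lanes) {
            for (int lane = 0; lane < lanes; lane++) {
                int i = idx + lane;
                // Set lane: a + b. Unset lane: keep a[i] (merge, not zero).
                r[i] = c[lane] ? a[i] + b[i] : a[i];
            }
        }
        return r;
    }

    public static void main(String[] args) {
        final int size = 1024;
        final int lanes = 4; // 128-bit vector / 32-bit int
        int[] a = new int[size];
        int[] b = new int[size];
        for (int i = 0; i < size; i++) {
            a[i] = i;
            b[i] = i;
        }
        boolean[] c = {true, false, true, false, true, false, true, false};
        int[] r = reference(a, b, c, lanes);
        System.out.println(Arrays.toString(Arrays.copyOf(r, 8)));
        // prints [0, 1, 4, 3, 8, 5, 12, 7]
    }
}
```

Because `vmask` is reloaded from `c` at offset 0 on every call, only `c[0..3]` matters. Even indices are therefore doubled, and odd indices keep their original value.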
> 
> This test case is reduced from the existing jtreg test Int128VectorTests.java [2]. It exercises the masked version of the vector add instruction; the other instructions are handled similarly.
> Before this patch, the compilation log contained no RVV-related instructions for this case. Now the compilation log is as follows:
> 
> 0ae     B10: #	out( B25 B11 ) <- in( B9 )  Freq: 0.999991
> 0ae     loadV V1, [R31]	# vector (rvv)
> 0b6     vloadmask V0, V2
> 0be     vadd.vv V3, V1, V0	#@vaddI_masked
> 0c6     lwu  R28, [R7, #124]	# loadN, compressed ptr, #@loadN ! Field: AddMaskTestMerge.r
> 0ca     decode_heap_oop  R28, R28	#@decodeHeapOop
> 0cc     lwu  R7, [R28, #12]	# range, #@loadRange
> 0d0     NullCheck R28
> 
> 
> And the JIT code is as follows:
> 
> 
> 0x000000400c823cee:   vsetvli t0,zero,e32,m1,tu,mu
> 0x000000400c823cf2:   vle32.v v1,(t6)                     ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - jdk.incubator.vector.IntVector::intoArray@43 (line 3228)
>                                                           ; - AddMaskTestMerge::workload@46 (line 25)
> 0x000000400c823cf6:   vsetvli t0,zero,e8,m1,tu,mu
> 0x000000400c823cfa:   vmsne.vx        v0,v2,zero          ;*invokestatic load {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - jdk.incubator.vector.VectorMask::fromArray@47 (line 208)
>                                                           ; - AddMaskTestMerge::workload@7 (line 22)
> 0x000000400c823cfe:   vsetvli t0,zero,e32,m1,tu,mu
> 0x000000400c823d02:   vadd.vv v3,v3,v1,v0.t               ;*invokestatic binaryOp {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - jdk.incubator.vector.IntVector::lanewiseTemplate@192 (line 834)
>                                                           ; - jdk.incubator.vector.Int128Vector::lanewise@9 (line 291)
>                                                           ; - jdk.incubator.vector.Int128Vector::lanewise@4 (line 41)
>                                                           ; - AddMaskTestMerge::workload@39 (line 25)
> 
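The final `vadd.vv v3,v3,v1,v0.t` relies on the `tu,mu` policy set by the preceding `vsetvli`. Under `mu` (mask-undisturbed), masked-off lanes keep the old destination value. Here the destination is also the `vs2` source, so those lanes keep that operand's value, and a merge results rather than zeroing. The scalar model below assumes VLEN = 128. With `vsetvli t0,zero,...`, vl is then VLMAX = 128 / 32 * 1 = 4 for `e32,m1`. `MaskedVaddModel` and `vaddVvMasked` are hypothetical names, not simulator or HotSpot APIs.

```java
import java.util.Arrays;

/**
 * Scalar model (illustrative) of "vadd.vv vd, vs2, vs1, v0.t" under the
 * "tu,mu" policy. Assumes VLEN = 128, so vl = VLMAX = 4 for e32,m1.
 */
public final class MaskedVaddModel {

    // vd[i] = vs2[i] + vs1[i] for body lanes (i < vl) whose v0 bit is set.
    static void vaddVvMasked(int[] vd, int[] vs2, int[] vs1, boolean[] v0, int vl) {
        for (int i = 0; i < vl; i++) {
            if (v0[i]) {
                vd[i] = vs2[i] + vs1[i]; // 32-bit wrap-around, as in e32 arithmetic
            }
            // Masked-off lane: "mu" (mask-undisturbed) leaves vd[i] unchanged.
        }
        // Lanes i >= vl: "tu" (tail-undisturbed) leaves them unchanged as well.
    }

    public static void main(String[] args) {
        int[] v3 = {10, 20, 30, 40}; // destination, also the vs2 source
        int[] v1 = {1, 2, 3, 4};     // vs1 source
        boolean[] v0 = {true, false, true, false};
        vaddVvMasked(v3, v3, v1, v0, 4);
        System.out.println(Arrays.toString(v3)); // prints [11, 20, 33, 40]
    }
}
```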
> ## Mask register allocation & mask bit operation
> Since the spec [1] uses v0 as the mask register for masked instructions, vector mask logical operations such as `AndVMask`, `OrVMask` and `XorVMask` sometimes need two mask registers.
> If only v0 and v31 are defined as mask registers, C2 cannot generate the corresponding nodes correctly because of register pressure [3].
> With only v0 and v31 as mask registers, running the jtreg test Byte128VectorTests.java [4] with `-XX:+PrintAssembly` and `-XX:LogFile` does not emit the expected RVV mask instructions.
> Instead, the following compilation failure log [3] is generated:
> <intrinsic id='_VectorBinaryOp' nodes='20'/>
> <method_not_compilable_at_tier level='4'/>
> <failure reason='failed spill-split-recycle sanity check' phase='compile'/>
> <failure reason='failed spill-split-recycle sanity check'/>
> <task_done success='0' nmsize='0' count='22784' stamp='16.146'/>
> 
> 
> We therefore define both v30 and v31 as mask registers (in addition to v0), and `AndVMask` emits C2 JIT code such as:
> vloadmask V0, V1
> vloadmask V30, V2
> vmask_and V0, V30, V0
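`vmask_and` needs both mask operands live in mask registers at the same time. This is why a second mask register besides v0 is required. In RVV v1.0, mask element i is bit i of the mask register, so ANDing two masks amounts to a bitwise AND of the packed bits. The node is presumably lowered to a mask-logical instruction such as `vmand.mm`. The sketch below models this with a `long` holding up to 64 lane bits. `MaskAndModel`, `pack` and `maskAnd` are hypothetical names.

```java
/**
 * Scalar model (illustrative) of vmask_and: with one mask bit per lane,
 * lane i at bit i, AndVMask is a bitwise AND of the two mask registers.
 * A long models the mask bits of up to 64 lanes.
 */
public final class MaskAndModel {

    // Pack a boolean lane mask into bits (lane i -> bit i).
    static long pack(boolean[] lanes) {
        if (lanes.length > 64) {
            throw new IllegalArgumentException("model supports at most 64 lanes");
        }
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // vmask_and V0, V30, V0: both operands are read, the result goes to V0.
    static long maskAnd(long v30, long v0) {
        return v30 & v0;
    }

    public static void main(String[] args) {
        long v0 = pack(new boolean[] {true, true, false, false});  // 0b0011
        long v30 = pack(new boolean[] {true, false, true, false}); // 0b0101
        System.out.println(Long.toBinaryString(maskAnd(v30, v0))); // prints 1
    }
}
```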
> 
> We also modified the implementation of `spill_copy_vector_stack_to_stack` so that it no longer occupies the v0 register. In addition, we changed some nodes that use v0 internally, such as `vasr/vlsl/vlsr/vstring_x/varray_x/vclearArray_x`, so that C2 is aware that they use v0.
> Note that the current implementation of `VectorMaskCast` only covers the case where the input and output element widths are equal; other cases rely on the subsequent cast node.
> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int128VectorTests.java
> [3] https://github.com/openjdk/jdk/blob/0deb648985b018653ccdaf193dc13b3cf21c088a/src/hotspot/share/opto/chaitin.cpp#L526
> [4] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Byte128VectorTests.java
> 
> ### Testing:
> 
> qemu with UseRVV:
> - [x] Tier1 tests (release)
> - [x] Tier2 tests (release)
> - [ ] Tier3 tests (release)
> - [x] test/jdk/jdk/incubator/vector (release/fastdebug)

Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision:

  Fix comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12682/files
  - new: https://git.openjdk.org/jdk/pull/12682/files/c66fefec..8083ede3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12682&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12682&range=10-11

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/12682.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12682/head:pull/12682

PR: https://git.openjdk.org/jdk/pull/12682

