
List:       openjdk-hotspot-dev
Subject:    Vectorized array mismatch updates
From:       Paul Sandoz <paul.sandoz () oracle ! com>
Date:       2015-12-17 15:07:22
Message-ID: 247E5E23-8037-4B33-BE84-4BABAD1EAE7D () oracle ! com

Hi,

The vectorized array mismatch implementation is now fully wired up to
Arrays.equals/compare/mismatch in hs-comp, and the intrinsic kicks in on x86 for C2.
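
For readers less familiar with the API side, here is a minimal sketch of the
three entry points mentioned above. It assumes the java.util.Arrays signatures
for byte[] as they appear in JDK 9; the class name MismatchUsage is purely
illustrative.

```java
import java.util.Arrays;

class MismatchUsage {
    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        byte[] b = {1, 2, 3, 4, 5, 6, 7, 8, 10};

        // Index of the first differing element, or -1 if there is none.
        System.out.println(Arrays.mismatch(a, b));          // 8
        System.out.println(Arrays.mismatch(a, a.clone()));  // -1

        // Equality and lexicographic comparison over the same data.
        System.out.println(Arrays.equals(a, b));            // false
        System.out.println(Arrays.compare(a, b) < 0);       // true (9 < 10 at index 8)
    }
}
```

Whether a given call actually reaches the intrinsic depends on the compiler,
platform and flags discussed below; the sketch only shows the Java-level calls.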

There are a bunch of follow-up tasks that need to be done (where appropriate I will
log issues):

1) wiring up the vectorizedMismatch intrinsic stub in C1 on x86;

2) implementing the vectorizedMismatch intrinsic on other platforms, such as SPARC
and ARM (volunteers? the work is likely similar to that for compact string
equality/comparison); and

3) using the performance data to clean up edge cases, so that regressions are
reduced or eliminated.

With regard to 3), I have uploaded a JMH benchmark project and raw results for:

- two x86 platforms supporting UseAVX=1 (AVX_1) and UseAVX=2 (AVX_2) respectively
  (thus AVX_1 and AVX_2 results are not directly comparable)

- C2 (-XX:-UseVectorizedMismatchIntrinsic as "Unsafe", and
  -XX:+UseVectorizedMismatchIntrinsic as "Vectorized")

- C1 (as "Unsafe", implicitly -XX:-UseVectorizedMismatchIntrinsic since there is no
  intrinsic yet for C1)

- comparing byte[] and long[]

- small (1..16) and large (2^2..2^12) array lengths, where the contents of the two
  arrays are the same or only the last element differs (lastNEQ=false/true).
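
To make the input shapes concrete, here is a rough sketch of how such array
pairs could be built. This is not the uploaded JMH project: the class
MismatchInputs and the helper pair() are hypothetical names, and the fill value
is an arbitrary choice.

```java
import java.util.Arrays;

class MismatchInputs {
    // Hypothetical helper: returns {a, b} of the given length with identical
    // contents, except that the last element differs when lastNEQ is true.
    static byte[][] pair(int length, boolean lastNEQ) {
        byte[] a = new byte[length];
        Arrays.fill(a, (byte) 1);
        byte[] b = a.clone();
        if (lastNEQ && length > 0) {
            b[length - 1] = 2;
        }
        return new byte[][] {a, b};
    }

    public static void main(String[] args) {
        // Small lengths: 1..16.
        for (int n = 1; n <= 16; n++) {
            byte[][] p = pair(n, true);
            System.out.println(n + " -> " + Arrays.mismatch(p[0], p[1]));
        }
        // Large lengths: 2^2..2^12.
        for (int shift = 2; shift <= 12; shift++) {
            int n = 1 << shift;
            byte[][] p = pair(n, false);
            System.out.println(n + " -> " + Arrays.mismatch(p[0], p[1]));
        }
    }
}
```

Placing the difference in the last element forces a full scan of both arrays,
the same as the equal-contents case, apart from the final element.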

  http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8136924-arrays-mismatch-vectorized-unsafe/perf/
  http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8136924-arrays-mismatch-vectorized-unsafe/perf/results/AVX_1/
  http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8136924-arrays-mismatch-vectorized-unsafe/perf/results/AVX_2/


Observations so far:
(Note: for byte[], vectorizedMismatch does not kick in for array lengths < 8.)
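
The length cut-off can be illustrated with a pure-Java sketch: below 8 bytes an
ordinary loop is used, otherwise the arrays are compared 8 bytes at a time and
the differing byte is located from the XOR of the two words. This is only an
illustration of the general idea, not the JDK code or the intrinsic; the class
MismatchSketch, the THRESHOLD constant and the use of ByteBuffer are all
assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class MismatchSketch {
    // Assumed cut-off, mirroring the note above: shorter arrays use the plain loop.
    static final int THRESHOLD = 8;

    static int mismatch(byte[] a, byte[] b) {
        int length = Math.min(a.length, b.length);
        int i = 0;
        if (length >= THRESHOLD) {
            // Read 8 bytes at a time as little-endian longs.
            ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
            ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
            for (; i + 8 <= length; i += 8) {
                long x = ba.getLong(i) ^ bb.getLong(i);
                if (x != 0) {
                    // In little-endian order the lowest set bit of the XOR
                    // falls inside the first differing byte.
                    return i + (Long.numberOfTrailingZeros(x) >>> 3);
                }
            }
        }
        // Ordinary loop: short arrays, and the tail of longer ones.
        for (; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // No difference in the common prefix.
        return a.length == b.length ? -1 : length;
    }
}
```

For arrays shorter than the cut-off, the only extra cost in this sketch is the
length check and branch before the ordinary loop.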

- byte[], AVX_1, C2
  - No regressions for small arrays, good improvements for large arrays
  - For large arrays the Vectorized performance is marginally better than the Unsafe
    performance. I expect the gap to close once Roland's fix for JDK-8145322 is
    pushed (which creates more efficient address computation for unrolled Unsafe
    access loops).

- long[], AVX_1, C2
  - For small arrays there are some regressions both for Vectorized and Unsafe
  - For large arrays there are some regressions both for Vectorized and Unsafe.
    For Unsafe this is due to JDK-8145322.
    For Vectorized there is some variance that might be due to unlucky alignment of
    quadwords.
  - Further investigation is required: e.g. have a threshold for when
    vectorizedMismatch kicks in, or somehow disable Unsafe and/or Vectorized for
    UseAVX=1, if we can surface constants for vectorization/register widths etc. in
    a platform-independent manner.


- byte[], AVX_2, C2
  - For small arrays with Unsafe a small regression is observed at lengths of 11 and
    15 when the contents of the arrays are equal. This seems like a blip, but might
    be due to some odd code generation.
  - For small arrays with Vectorized there is no regression.
  - For large arrays performance is good, with Vectorized ~2x Unsafe once the length
    gets large enough (256/512 or larger). This translates into a ~10x improvement
    compared to an ordinary loop.

- long[], AVX_2, C2
  - For small arrays there are some regressions, as for AVX_1.
  - For large arrays AVX_2 starts to show a 1.5x improvement.
    Again some variance is observable, perhaps due to unlucky alignment.


- byte[], AVX_1/2, C1
  (Note only Unsafe results are available)
  - For small arrays there are small regressions for lengths < 8, probably due to
    the length check and branch to the ordinary loop. Not sure if there is much that
    can be done about it.
  - For large arrays the performance boost is good and can be much better if made
    intrinsic, e.g. ~5x to 8x.

- long[], AVX_1/2, C1
  (Note only Unsafe results are available)
  - For small and large arrays there are noticeable regressions. A C1 intrinsic
    should improve things.


Thanks,
Paul.

