List: openjdk-hotspot-dev
Subject: Vectorized array mismatch updates
From: Paul Sandoz <paul.sandoz () oracle ! com>
Date: 2015-12-17 15:07:22
Message-ID: 247E5E23-8037-4B33-BE84-4BABAD1EAE7D () oracle ! com
Hi,
The vectorized array mismatch implementation is now fully wired up to
Arrays.equals/compare/mismatch in hs-comp, and the intrinsic kicks in on x86 for C2.
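For readers unfamiliar with the entry points mentioned above, here is a minimal sketch of how they behave from the caller's side. It assumes the java.util.Arrays API in the form that shipped in JDK 9; the class name MismatchDemo is only for illustration and is not part of the patch.

```java
import java.util.Arrays;

public class MismatchDemo {
    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        byte[] b = {1, 2, 3, 4, 5, 6, 7, 8, 10};

        // Index of the first differing element, or -1 if the arrays match.
        System.out.println(Arrays.mismatch(a, b));

        // equals and compare can both be expressed in terms of a mismatch search.
        System.out.println(Arrays.equals(a, b));
        System.out.println(Arrays.compare(a, b) < 0);

        // Identical contents: no mismatch is reported.
        System.out.println(Arrays.mismatch(a, a.clone()));
    }
}
```

Whether a given call is actually served by the intrinsic depends on the compiler tier, platform and flags discussed in the rest of this message.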
There are a number of follow-up tasks that need to be done (where appropriate I will
log issues):
1) wiring up the vectorizedMismatch intrinsic stub in C1 on x86;
2) implementing the vectorizedMismatch intrinsic on other platforms, such as SPARC and
ARM (volunteers? the work is likely similar to that for compact string
equality/comparison); and
3) using the performance data to clean up edge cases, so that regressions are reduced or eliminated.
With regard to 3) I have uploaded a JMH benchmark project and raw results for:
- two x86 platforms supporting UseAVX=1 (AVX_1) and UseAVX=2 (AVX_2) respectively
(thus AVX_1 and AVX_2 results are not directly comparable)
- C2 (-XX:-UseVectorizedMismatchIntrinsic as "Unsafe", and
-XX:+UseVectorizedMismatchIntrinsic as "Vectorized")
- C1 (as "Unsafe", implicitly -XX:-UseVectorizedMismatchIntrinsic since there is no
intrinsic yet for C1)
- comparing byte[] and long[]
- small (1..16) and large (2^2..2^12) array lengths where the contents of the two arrays are
the same, or only the last element differs (lastNEQ=false/true).
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8136924-arrays-mismatch-vectorized-unsafe/perf/
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8136924-arrays-mismatch-vectorized-unsafe/perf/results/AVX_1/
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8136924-arrays-mismatch-vectorized-unsafe/perf/results/AVX_2/
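To make the input shapes described above concrete, here is a rough sketch of how such array pairs could be generated. This is not the uploaded JMH project: the class MismatchInputs and its methods are hypothetical, and it assumes the large lengths are the powers of two from 2^2 to 2^12.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class MismatchInputs {
    // Small lengths 1..16 (inclusive).
    static int[] smallSizes() {
        return IntStream.rangeClosed(1, 16).toArray();
    }

    // Assumed reading of the large lengths: 2^2, 2^3, ..., 2^12.
    static int[] largeSizes() {
        return IntStream.rangeClosed(2, 12).map(e -> 1 << e).toArray();
    }

    // Returns {a, b}: same contents, except the last element when lastNEQ is true.
    static byte[][] makePair(int size, boolean lastNEQ) {
        byte[] a = new byte[size];
        for (int i = 0; i < size; i++) {
            a[i] = (byte) i;
        }
        byte[] b = a.clone();
        if (lastNEQ) {
            b[size - 1] ^= 1; // flip one bit so only the final element differs
        }
        return new byte[][] {a, b};
    }

    public static void main(String[] args) {
        for (int size : largeSizes()) {
            byte[][] pair = makePair(size, true);
            System.out.println(size + " -> " + Arrays.mismatch(pair[0], pair[1]));
        }
    }
}
```

With lastNEQ=true the whole array has to be scanned before the difference is found, so both settings exercise the full length of the comparison loop.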
Observations so far:
(Note: for byte[], vectorizedMismatch does not kick in for array lengths < 8.)
- byte[], AVX_1, C2
  - No regressions for small arrays, good improvements for large arrays.
  - For large arrays the Vectorized performance is marginally better than the Unsafe
    performance. I expect the gap to close once Roland's fix for JDK-8145322 is pushed
    (which creates more efficient address computation for unrolled Unsafe access loops).
- long[], AVX_1, C2
  - For small arrays there are some regressions for both Vectorized and Unsafe.
  - For large arrays there are some regressions for both Vectorized and Unsafe.
    For Unsafe this is due to JDK-8145322.
    For Vectorized there is some variance that might be due to unlucky alignment of
    quadwords.
  - Further investigation is required, e.g. introduce a threshold at which vectorizedMismatch
    kicks in, or somehow disable Unsafe and/or Vectorized for UseAVX=1, provided we can
    surface constants such as vectorization/register widths in a platform-independent
    manner.
- byte[], AVX_2, C2
  - For small arrays with Unsafe a small regression is observed at lengths of 11 and
    15 when the contents of the arrays are equal. This seems like a blip, but might be
    due to some odd code generation.
  - For small arrays with Vectorized there is no regression.
  - For large arrays performance is good, with Vectorized ~2x Unsafe once the length
    gets large enough (256/512 or larger). This translates into a ~10x improvement
    compared to an ordinary loop.
- long[], AVX_2, C2
  - For small arrays there are some regressions, as for AVX_1.
  - For large arrays AVX_2 starts to show a 1.5x improvement.
    Again some variance is observable, perhaps due to unlucky alignment.
- byte[], AVX_1/2, C1
  (Note: only Unsafe results are available.)
  - For small arrays there are small regressions for lengths < 8, probably due to the length
    check and branch to the ordinary loop. Not sure if there is much that can be done
    about it.
  - For large arrays the performance boost is good and can be much better if made
    intrinsic, e.g. ~5x to 8x.
- long[], AVX_1/2, C1
  (Note: only Unsafe results are available.)
  - For small and large arrays there are noticeable regressions. A C1 intrinsic
    should improve things.
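As an illustration of the length check for byte[] noted in the observations (lengths < 8 fall back to the ordinary loop), here is a simplified sketch of a word-at-a-time mismatch search. It is not the JDK code: the class LongWiseMismatch, the THRESHOLD constant and the use of a byte-array-view VarHandle (JDK 9+) in place of Unsafe are all assumptions made to keep the example self-contained.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class LongWiseMismatch {
    // Reads 8 bytes of a byte[] as one little-endian long at any byte offset.
    private static final VarHandle LONGS =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Hypothetical threshold: below this, only the ordinary loop runs.
    private static final int THRESHOLD = 8;

    static int mismatch(byte[] a, byte[] b) {
        int length = Math.min(a.length, b.length);
        int i = 0;
        if (length >= THRESHOLD) {
            // Compare 8 bytes per iteration.
            for (; i + Long.BYTES <= length; i += Long.BYTES) {
                long x = (long) LONGS.get(a, i);
                long y = (long) LONGS.get(b, i);
                if (x != y) {
                    // Little-endian: the lowest differing bit belongs to the
                    // first differing byte within this 8-byte group.
                    return i + (Long.numberOfTrailingZeros(x ^ y) >>> 3);
                }
            }
        }
        // Ordinary loop for short arrays and for the tail.
        for (; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // Equal over the common prefix: differ only if the lengths differ.
        return a.length == b.length ? -1 : length;
    }
}
```

The extra branch on THRESHOLD is the kind of fixed cost that could explain small regressions at very short lengths, while the 8-bytes-per-iteration loop is where the gains for larger arrays would come from even before any SIMD intrinsic is involved.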
Thanks,
Paul.