'Re: String.indexOf(single-char-String)'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-core-libs-dev
Subject:    Re: String.indexOf(single-char-String)
From:       Michael Bien <mbien42 () gmail ! com>
Date:       2021-11-27 17:53:57
Message-ID: 8ac3fcc2-8bc2-fb84-0dbb-f85c69a0e214 () gmail ! com
[Download RAW message or body]

StringIndexOfChar showed that the assumption that the char variant 
should be at least as fast as the String variant doesn't always apply. 
The implementation for indexOf(char) seems to have a different code 
emission heuristics than the String variant.
https://github.com/openjdk/jdk/pull/6509#issuecomment-980681069

the following benchmark reproduces the anomaly:

package dev.mbien.jmh;

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class SSECharAnomalyJMH {

//Benchmark                      (length)  Mode  Cnt   Score Error  Units
//CharVsStringJMH.indexOfChar           0  avgt    5   4.895 ą 0.063  ns/op
//CharVsStringJMH.indexOfString         0  avgt    5   6.131 ą 0.115  ns/op

//CharVsStringJMH.indexOfChar           1  avgt    5  10.071 ą 0.121  
ns/op //xxx
//CharVsStringJMH.indexOfString         1  avgt    5   6.322 ą 0.099  
ns/op //xxx

//CharVsStringJMH.indexOfChar           2  avgt    5   4.577 ą 0.009  ns/op
//CharVsStringJMH.indexOfString         2  avgt    5   5.917 ą 0.019  ns/op

//CharVsStringJMH.indexOfChar           3  avgt    5   7.093 ą 0.040  ns/op
//CharVsStringJMH.indexOfString         3  avgt    5  11.533 ą 0.100  ns/op

// -XX:-UseSSE42Intrinsics
//Benchmark                      (length)  Mode  Cnt   Score Error  Units
//CharVsStringJMH.indexOfChar           0  avgt    5   5.063 ą 0.023  ns/op
//CharVsStringJMH.indexOfString         0  avgt    5   6.605 ą 0.288  ns/op

//CharVsStringJMH.indexOfChar           1  avgt    5   8.768 ą 0.082  
ns/op //compare with SSE version
//CharVsStringJMH.indexOfString         1  avgt    5  10.640 ą 0.133  ns/op

//CharVsStringJMH.indexOfChar           2  avgt    5   9.371 ą 0.021  ns/op
//CharVsStringJMH.indexOfString         2  avgt    5  12.427 ą 0.087  ns/op

//CharVsStringJMH.indexOfChar           3  avgt    5  16.435 ą 0.235  ns/op
//CharVsStringJMH.indexOfString         3  avgt    5  17.566 ą 0.143  ns/op


     @Param({"0", "1", "2", "3"})
     private int length = 0;

     private final String[] str = {
         "abcd",
         "abBaegyswratgd",                                       // 
length < 16
         "abBaegyswratgfed",                                     // 
length = 16
         "ABCDEFGHIJKLMNOPQRSTUVWXYZabcefghijklmnopqrstuvwxyzd", // ~40
     };


     @Benchmark
     public void indexOfChar(Blackhole bh) throws InterruptedException {
         bh.consume(str[length].indexOf('d') != -1);
     }

     @Benchmark
     public void indexOfString(Blackhole bh) throws InterruptedException {
         bh.consume(str[length].indexOf("d") != -1);
     }


     public static void main(String[] args) throws RunnerException {

         Options opt = new OptionsBuilder()
                 .include(SSECharAnomalyJMH.class.getSimpleName())
                 .resultFormat(ResultFormatType.JSON)
.result("results/"+SSECharAnomalyJMH.class.getSimpleName()+".json")
                 .build();

         new Runner(opt).run();

         opt = new OptionsBuilder()
                 .include(SSECharAnomalyJMH.class.getSimpleName())
                 .resultFormat(ResultFormatType.JSON)
                 .jvmArgsAppend("-XX:-UseSSE42Intrinsics")
//                .jvmArgsAppend("-XX:UseAVX=0")
.result("results/"+SSECharAnomalyJMH.class.getSimpleName()+"2.json")
                 .build();

         new Runner(opt).run();
     }

}



On 26.11.21 14:42, Michael Bien wrote:
> added benchmark results for OpenJDK's StringIndexOf benchmark:
> https://github.com/openjdk/jdk/pull/6509#issuecomment-979985594
>
> -michael
>
>
> On 25.11.21 15:05, Michael Bien wrote:
>> Hello again,
>>
>> I was trying to run JDK's benchmarks over night (second attempt 
>> actually) but had some difficulties to get stable results.
>>
>> This makes it difficult to compare the modified version with a 
>> reference. I am not sure what the cause is, I have heard some intel 
>> CPUs can't run avx instructions for a long time without changing 
>> clock - maybe i am hitting this issue?
>> Its not the temperature and i turned boost and HT off + it runs in 
>> headless mode. One run already takes almost 5h and I have to run it 
>> twice - so i can't increase the iterations even more.
>>
>>
>> for example:
>>
>> # Benchmark: 
>> org.openjdk.bench.java.lang.StringIndexOfChar.utf16_mixed_String
>> # Parameters: (loops = 100000, pathCnt = 1000, rngSeed = 1999)
>>
>> # Run progress: 95.35% complete, ETA 00:13:20
>> # Fork: 1 of 1
>> # Warmup Iteration   1: 18592.094 ns/op    <- second fastest run?
>> # Warmup Iteration   2: 20519.413 ns/op
>> # Warmup Iteration   3: 19768.099 ns/op
>> # Warmup Iteration   4: 23093.410 ns/op
>> # Warmup Iteration   5: 29112.909 ns/op
>> # Warmup Iteration   6: 18962.671 ns/op
>> # Warmup Iteration   7: 16721.933 ns/op    <- fastest run?
>> # Warmup Iteration   8: 20267.809 ns/op
>> # Warmup Iteration   9: 23934.031 ns/op
>> # Warmup Iteration  10: 22474.836 ns/op
>> # Warmup Iteration  11: 19583.471 ns/op
>> # Warmup Iteration  12: 19595.319 ns/op
>> # Warmup Iteration  13: 24865.299 ns/op
>> # Warmup Iteration  14: 19581.014 ns/op
>> # Warmup Iteration  15: 19566.849 ns/op
>> # Warmup Iteration  16: 19576.219 ns/op
>> # Warmup Iteration  17: 19574.475 ns/op
>> # Warmup Iteration  18: 19565.854 ns/op
>> # Warmup Iteration  19: 26594.867 ns/op
>> # Warmup Iteration  20: 26532.977 ns/op
>> Iteration   1: 25484.070 ns/op
>> Iteration   2: 19594.206 ns/op
>> Iteration   3: 30327.037 ns/op
>> Iteration   4: 31029.242 ns/op  <- xxx
>> Iteration   5: 19560.472 ns/op
>> Iteration   6: 19611.728 ns/op
>> Iteration   7: 23214.511 ns/op
>> Iteration   8: 28455.757 ns/op
>> Iteration   9: 19787.638 ns/op
>> Iteration  10: 23737.501 ns/op
>> Iteration  11: 25947.249 ns/op
>> Iteration  12: 19768.214 ns/op
>> Iteration  13: 25789.970 ns/op
>> Iteration  14: 20558.622 ns/op
>> Iteration  15: 19611.317 ns/op
>> Iteration  16: 27761.431 ns/op
>> Iteration  17: 19749.799 ns/op
>> Iteration  18: 20862.478 ns/op
>> Iteration  19: 19581.498 ns/op
>> Iteration  20: 28094.839 ns/op
>>
>>
>> latin1_Short_String, latin1_Short_char, latin1_mixed_String, 
>> latin1_mixed_char, utf16_mixed_String and utf16_mixed_char have all 
>> large error bars (all in StringIndexOfChar).
>>
>>
>> best regards,
>>
>> michael
>>
>>
>>
>> On 23.11.21 17:06, Michael Bien wrote:
>>> On 23.11.21 15:57, Roger Riggs wrote:
>>>> Hi Michael,
>>>>
>>>> As you might expect performance of strings is very sensitive and 
>>>> has been tuned extensively over the years many times.
>>>>
>>>> Though this improves the performance for 1 character strings. It 
>>>> will have an impact on *every other* length of string.
>>>> You'll need to show that it does not impact performance of longer 
>>>> strings.
>>>
>>> yes of course. The if (str.length == 1) branch should be dead code 
>>> and eliminated by the JVM for all String constants with non-one 
>>> lengths.
>>>
>>> Looking through the benchmarks in micro/*/java/lang/String*, all 
>>> seem to be using constants as parameter for indexOf(). To try to 
>>> measure the impact of the if branch i would have to write a 
>>> benchmark with a parameter which changes every iteration, right? 
>>> Otherwise the branch will be optimized away by the JIT.
>>>
>>>>
>>>> It may be worth looking further at other ways to achieve the result.
>>>
>>> agreed, I tried the most obvious approach first, but there is a 
>>> chance that the fast path can be put into the intrinsified 
>>> StringLatin1/StringUTF16 code instead.
>>>
>>> -michael
>>>
>>>
>>>>
>>>> Regards, Roger
>>>>
>>>>
>>>> On 11/22/21 3:52 PM, Michael Bien wrote:
>>>>> Hello,
>>>>>
>>>>> I kept forgetting which variants of the String methods perform 
>>>>> better with single-char-Strings and which with char (IDEs had the 
>>>>> tendency to suggest the wrong variant since it changed between JDK 
>>>>> releases). So i wrote JMH benchmarks and noticed that the last 
>>>>> method with a performance difference seems to be String.indexOf() 
>>>>> - all other variants performed equally (unless I overlooked some).
>>>>>
>>>>> this might be fairly easy to fix:
>>>>>
>>>>> https://github.com/openjdk/jdk/pull/6509
>>>>>
>>>>> (side effect: contains("c") is also faster)
>>>>>
>>>>> I haven't looked into the intrinsified code of StringLatin1 and 
>>>>> StringUTF16 to check if it could be fixed there (mostly because i 
>>>>> actually don't know how the JVM assembles those intrinsics). It 
>>>>> might be possible to improve this for short Strings in general, 
>>>>> not just for chars, dependent on why the intrinsified version is 
>>>>> actually slower for single-char-Strings. I opted for the trivial 
>>>>> fix in java code.
>>>>>
>>>>> best regards,
>>>>>
>>>>> michael
>>>>>
>>>>
>>>
>>
>

[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic