'Re: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v6]'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-hotspot-dev
Subject:    Re: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v6]
From:       Claes Redestad <redestad () openjdk ! java ! net>
Date:       2022-02-23 14:19:20
Message-ID: GGlhZ0PA4ylRTq3mOm5QaAKjc38LKaVNfRZd-m9LqwY=.c36c7db0-c00c-4e1b-bf6d-8f48edd8b6b5 () github ! com
[Download RAW message or body]

> I'm requesting comments and, hopefully, some help with this patch to replace \
> `StringCoding.hasNegatives` with `countPositives`. The new method does a very \
> similar pass, but alters the intrinsic to return the number of leading bytes in the \
> `byte[]` range which only has positive bytes. This allows for dealing much more \
> efficiently with those `byte[]`s that has a ASCII prefix, with no measurable cost \
> on ASCII-only or latin1/UTF16-mostly input. 
> Microbenchmark results: \
> https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904
>  
> - Only implemented on x86 for now, but I want to verify that implementations of \
> `countPositives` can be implemented with similar efficiency on all platforms that \
> today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. \
> This pretty much means holding up this until it's implemented on all platforms, \
> which can either contributed to this PR or as dependent follow-ups. 
> - An alternative to holding up until all platforms are on board is to allow the \
> implementation of `StringCoding.hasNegatives` and `countPositives` to be \
> implemented so that the non-intrinsified method calls into the intrinsified. This \
> requires structuring the implementations differently based on which intrinsic - if \
> any - is actually implemented. One way to do this could be to mimic how `java.nio` \
> handles unaligned accesses and expose which intrinsic is available via `Unsafe` \
> into a `static final` field. 
> - There are a few minor regressions (~5%) in the x86 implementation on \
> `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for \
> example `encode-/decodeShortMixed` even see a minor improvement, which makes me \
> consider those corner case regressions with little real world implications (if you \
> have latin1 Strings, you're likely to also have ASCII-only strings in your mix).

Claes Redestad has updated the pull request with a new target base due to a merge or \
a rebase. The pull request now contains 29 commits:

 - Resolve merge conflict
 - Fix TestCountPositives to correctly allow 0 return when expected != len (for now)
 - aarch64: fix issue with short inputs divisible by wordSize
 - Switch aarch64 intrinsic to a variant of countPositives returning len or zero as a \
                first step.
 - Revert micro changes, split out to #7516
 - Merge branch 'master' of https://github.com/cl4es/jdk into count_positives
 - Merge branch 'master' into count_positives
 - Restore partial vector checks in AVX2 and SSE intrinsic variants
 - Let countPositives use hasNegatives to allow ports not implementing the \
                countPositives intrinsic to stay neutral
 - Simplify changes to encodeUTF8
 - ... and 19 more: https://git.openjdk.java.net/jdk/compare/5035bf5e...685795ce

-------------

Changes: https://git.openjdk.java.net/jdk/pull/7231/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=05
  Stats: 532 lines in 29 files changed: 308 ins; 53 del; 171 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7231.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7231/head:pull/7231

PR: https://git.openjdk.java.net/jdk/pull/7231


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic