'Integrated: 8297154: Improve safepoint cleanup logging'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-hotspot-runtime-dev
Subject:    Integrated: 8297154: Improve safepoint cleanup logging
From:       Aleksey Shipilev <shade () openjdk ! org>
Date:       2022-11-23 16:25:27
Message-ID: sj5J9y5n_-e_woJgKinlkSLTsbP9noScavFrLpcVx3U=.84eae836-c4e4-423e-9111-cfd09d085038 () github ! com
[Download RAW message or body]

On Wed, 16 Nov 2022 18:52:50 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:

> When backporting [JDK-8280784](https://bugs.openjdk.org/browse/JDK-8280784), I \
> realized that safepoint cleanup logging added by \
> [JDK-8219436](https://bugs.openjdk.org/browse/JDK-8219436) is misleading. \
> `-Xlog:safepoint` output includes "cleanup" time into "Reaching safepoint", which \
> users can arguably expect to be only the time-to-safepoint. With large reported \
> "Reaching safepoint" time, users would chase a safepoint polling / scheduling \
> problem, rather than suspect the safepoint cleanup costs. I was confused about this \
> myself, and only reading the safepoint timing code made the picture more clear. 
> It can be observed if you enable both `-Xlog:safepoint` and `-Xlog:safepoint+stats` \
> and run this on busy machine: 
> 
> $ build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:+SafepointALot \
> -XX:GuaranteedSafepointInterval=1000 [1.647s][info][safepoint,stats] VM Operation   \
> [ threads: total initial_running ][ time:       sync    cleanup       vmop      \
> total ] page_trap_count [1.647s][info][safepoint,stats] Cleanup                     \
> [             15               1 ][            26541     240544      30371     \
> 297456 ]               0 [1.647s][info][safepoint      ] Safepoint "Cleanup", Time \
> since last: 1499772987 ns, Reaching safepoint: 267085 ns, At safepoint: 30371 ns, \
> Total: 297456 ns 
> 
> Note how "sync" takes only 26541 ns, "cleanup" takes 240544 ns, yet "Reaching \
> safepoint" reports the sums of both. 
> To fix this, I propose we report "Cleanup" times separately. This changes the \
> logging statement. If we want to keep the logging statement intact, we can move \
> "cleanup" to "At safepoint" costs, which would be slightly less confusing. This is \
> what we do with `RuntimeService::record_safepoint_end` anyway: management layer \
> clearly separates "sync" and "safepoint" time, and "cleanup" more clearly belongs \
> to "safepoint" time here. 
> While we are at it, we can also track and report max cleanup time.
> 
> Example output after:
> 
> 
> [1.099s][info][safepoint,stats] VM Operation                 [ threads: total \
> initial_running ][ time:       sync    cleanup       vmop      total ] \
> page_trap_count [1.099s][info][safepoint,stats] Cleanup                      [      \
> 15               1 ][            77391     124662       5420     207473 ]           \
> 0 [1.099s][info][safepoint      ] Safepoint "Cleanup", Time since last: 1013526788 \
> ns, Reaching safepoint: 77391 ns, Cleanup: 124662 ns, At safepoint: 5420 ns, Total: \
> 207473 ns 
> 
> Additional testing:
> - [x] Eyeballing the safepoint logs
> - [x] Linux x86_64 fastdebug `tier1`, `tier2`

This pull request has now been integrated.

Changeset: 086763a6
Author:    Aleksey Shipilev <shade@openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/086763a629fe8b0592f5566c6e346716868d5e63
                
Stats:     11 lines in 2 files changed: 9 ins; 0 del; 2 mod

8297154: Improve safepoint cleanup logging

Reviewed-by: jsjolen, rehn

-------------

PR: https://git.openjdk.org/jdk/pull/11194


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic