[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: RFR: JDK-8306441: Two phase segmented heap dump [v20]
From:       Kevin Walls <kevinw () openjdk ! org>
Date:       2023-07-31 22:03:57
Message-ID: gN6x_8wbMOuPetY6JBIiYRRO-Zkgczy7q3LMkWxRq1I=.1dfdf96d-c348-48d4-8082-d5b3de489b51 () github ! com
[Download RAW message or body]

On Wed, 19 Jul 2023 12:42:27 GMT, Yi Yang <yyang@openjdk.org> wrote:

> > ### Motivation and proposal
> > Hi, heap dump brings about pauses for application's execution(STW), this is a \
> > well-known pain. JDK-8252842 have added parallel support to heapdump in an \
> > attempt to alleviate this issue. However, all concurrent threads competitively \
> > write heap data to the same file, and more memory is required to maintain the \
> > concurrent buffer queue. In experiments, we did not feel a significant \
> > performance improvement from that. 
> > The minor-pause solution, which is presented in this PR, is a two-phase segmented \
> > heap dump: 
> > - Phase 1(STW): Concurrent threads directly write data to multiple heap files.
> > - Phase 2(Non-STW): Merge multiple heap files into one complete heap dump file. \
> > This process can happen outside safepoint. 
> > Now concurrent worker threads are not required to maintain a buffer queue, which \
> > would result in more memory overhead, nor do they need to compete for locks. The \
> > changes in the overall design are as follows: 
> > ![image](https://github.com/openjdk/jdk/assets/5010047/77e4764a-62b5-4336-8b45-fc880ba14c4a)
> >  <p align="center">Fig1. Before</p>
> > 
> > ![image](https://github.com/openjdk/jdk/assets/5010047/931ab874-64d1-4337-ae32-3066eed809fc)
> >  <p align="center">Fig2. After this patch</p>
> > 
> > ### Performance evaluation
> > > memory | numOfThread | CompressionMode | STW | Total |
> > > -------| ----------- | --------------- | --- | ---- |
> > > 8g | 1 T | N | 15.612 | 15.612 |
> > > 8g | 32 T | N | 2.561725 | 14.498 |
> > > 8g | 32 T | C1 | 2.3084878 | 14.198 |
> > > 8g | 32 T | C2 | 10.9355128 | 21.882 |
> > > 8g | 96 T | N | 2.6790452 | 14.012 |
> > > 8g | 96 T | C1 | 2.3044796 | 3.589 |
> > > 8g | 96 T | C2 | 9.7585151 | 20.219 |
> > > 16g | 1 T | N | 26.278 | 26.278 |
> > > 16g | 32 T | N | 5.231374 | 26.417 |
> > > 16g | 32 T | C1 | 5.6946983 | 6.538 |
> > > 16g | 32 T | C2 | 21.8211105 | 41.133 |
> > > 16g | 96 T | N | 6.2445556 | 27.141 |
> > > 16g | 96 T | C1 | 4.6007096 | 6.259 |
> > > 16g | 96 T | C2 | 19.2965783 | 39.007 |
> > > 32g | 1 T | N | 48.149 | 48.149 |
> > > 32g | 32 T | N | 10.7734677 | 61.643 |
> > > 32g | 32 T | C1 | 10.1642097 | 10.903 |
> > > 32g | 32 T | C2 | 43.8407607 | 88.152 |
> > > 32g | 96 T | N | 13.1522042 | 61.432 |
> > > 32g | 96 T | C1 | 9.0954641 | 9.885 |
> > > 32g | 96 T | C2 | 38.9900931 | 80.574 |
> > > 64g | 1 T | N | 100.583 | 100.583 |
> > > 64g | 32 T | N | 20.9233744 | 134.701 |
> > > 64g | 32 T | C1 | 18.5023784 | 19.358 |
> > > 64g | 32 T | C2 | 86.4748377 | 172.707 |
> > > 64g | 96 T | N | 26.7374116 | 126.08 |
> > > 64g | ...
> 
> Yi Yang has updated the pull request incrementally with one additional commit since \
> the last revision: 
> test failure on Windows

Hi, a few more points today, but I'm getting there. 8-)

1
The previous parallel dump change:
8252842: Extend jmap to support parallel heap dump
..the notes suggest it was going to have a CSR for the new option, but was then \
decided not to add the option to specify the number of threads. That CSR was \
https://bugs.openjdk.org/browse/JDK-8260424 which was withdrawn, but looks like we \
should do one for this change. If we had a reasonable expectation of what the best \
number to use was, we could do the same thing here.  If we might get that wrong, we \
will need the user to be able to specify. Actually are we changing the default \
behaviour: from "parallel if you can", to "serial unless you specify".


2
Running the changes, I got a test failure on macos debug (x64 and aarch64):

test/hotspot/jtreg/serviceability/HeapDump/IntegrityHeapDumpTest.java

On different machines, one x64 and one aarch64, got the same kind of error:
----------stderr:(6/609)----------
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file \
/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/.java_pid36008: target process 36008 \
doesn't respond within 10500ms or HotSpot VM not loaded  at \
jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:95)  at \
jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
  at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
	at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
	at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
	
..so the test fails with: java.lang.RuntimeException: Expected to get exit value of \
[0], exit value is: [1]

Other tests e.g. test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpTest.java run OK.

The failing test IntegrityHeapDumpTest.java extends LingeredApp.
test/hotspot/jtreg/serviceability/HeapDump/DuplicateArrayClassesTest.java does this, \
and runs and passes.

I don't yet understand the failure.  Could things have gone wrong in macosx debug \
builds in the attach listener.. I didn't get a stack of the LingeredApp under test, \
but will try again.


3
Docs
https://openjdk.org/guide/#release-notes
This enhancement should probably have a release note, and we could briefly explain \
what parallel means.   We probably need to still recommend it is not used on a \
production server unless you have tested the impact and are happy it does not disturb \
the application. Maybe we can offer a recommended usage, a guide number of threads as \
a starting point. We also need to explain when this can or cannot be used (SerialGC, \
any other restrictions? It's not a problem that SerialGC does not use this, just that \
we should make it clear).   I can help with the wording if needed, if you'd like to \
start something (this might make it into other docs also.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1659249670


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic