
List:       openjdk-serviceability-dev
Subject:    RFR: JDK-8306441: Segmented heap dump
From:       Yi Yang <yyang () openjdk ! org>
Date:       2023-04-26 10:03:02
Message-ID: 8YqPPHSW4K1s0t317Kp6UqvoGuv5v9oCbjtQ9FX8p2o=.0f6c687b-d031-401d-901d-1ec532715cdc () github ! com

Hi, heap dumps pause the application (STW), which is a well-known pain point.
JDK-8252842 added parallel support to heap dump in an attempt to alleviate this issue.
However, all the worker threads compete to write heap data to the same file, and more
memory is required to maintain the shared buffer queue. In our experiments, we did not
see a significant performance improvement from it.

The reduced-pause solution presented in this PR is a two-stage segmented heap dump:

1. Stage One (STW): Concurrent threads write data directly to multiple heap files.
2. Stage Two (Non-STW): The multiple heap files are merged into one complete heap dump file.

Concurrent worker threads no longer need to maintain a buffer queue, which caused extra
memory overhead, nor do they need to compete for locks. This reduces application pause
time by 73~80%.
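
As a rough illustration of the two stages (not the HotSpot code in this PR), the sketch below uses Java with hypothetical names: `SegmentedDumpSketch`, `writeSegments`, `mergeSegments`, and plain byte arrays standing in for heap regions. It also assumes that the merge is a simple in-order concatenation of the segment files.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: the class, the method names, and the byte[] "regions"
// are hypothetical stand-ins, not the implementation in this PR.
final class SegmentedDumpSketch {

    // Stage one (the part that would run during the pause): each worker
    // writes its own region to its own segment file, so there is no shared
    // buffer queue and no contention on a single output file.
    static List<Path> writeSegments(List<byte[]> regions, Path dir) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (int i = 0; i < regions.size(); i++) {
                final int id = i;
                pending.add(pool.submit(() ->
                        Files.write(dir.resolve("segment-" + id + ".part"), regions.get(id))));
            }
            List<Path> segments = new ArrayList<>();
            for (Future<Path> f : pending) {
                segments.add(f.get()); // result order follows region order
            }
            return segments;
        } catch (Exception e) {
            throw new IllegalStateException("segment write failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Stage two (would run after the application has resumed): append the
    // segments, in order, to one output file and delete each segment.
    static Path mergeSegments(List<Path> segments, Path target) {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path segment : segments) {
                Files.copy(segment, out);
                Files.delete(segment);
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only `writeSegments` corresponds to work done while the application is paused; `mergeSegments` adds to the total dump time but not to the pause.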

> Heap size | Dump threads | STW | Total |
> --- | --- | --- | --- |
> 8g | 1 thread | 15.612 secs | 15.612 secs |
> 8g | 32 threads | 2.5617250 secs | 14.498 secs |
> 8g | 96 threads | 2.6790452 secs | 14.012 secs |
> 16g | 1 thread | 26.278 secs | 26.278 secs |
> 16g | 32 threads | 5.2313740 secs | 26.417 secs |
> 16g | 96 threads | 6.2445556 secs | 27.141 secs |
> 32g | 1 thread | 48.149 secs | 48.149 secs |
> 32g | 32 threads | 10.7734677 secs | 61.643 secs |
> 32g | 96 threads | 13.1522042 secs | 61.432 secs |
> 64g | 1 thread | 100.583 secs | 100.583 secs |
> 64g | 32 threads | 20.9233744 secs | 134.701 secs |
> 64g | 96 threads | 26.7374116 secs | 126.080 secs |
> 128g | 1 thread | 233.843 secs | 233.843 secs |
> 128g | 32 threads | 72.9945768 secs | 207.060 secs |
> 128g | 96 threads | 67.6815929 secs | 336.345 secs |

> **Total** is the total heap dump time, including both phases.
> **STW** is the time of the first phase only.
> For a parallel dump, **Total** = **STW** + **Merge**. For a serial dump, **Total** = **STW**.
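
To make the relation between the columns concrete, here is a small worked example on the 16g rows of the table. The class and method names (`DumpTimings`, `mergeSeconds`, `pauseReduction`) are made up for illustration; only the three timing values come from the table.

```java
// Worked arithmetic on the 16g rows of the table above.
final class DumpTimings {

    // For a parallel dump, Total = STW + Merge, so Merge = Total - STW.
    static double mergeSeconds(double stwSeconds, double totalSeconds) {
        return totalSeconds - stwSeconds;
    }

    // Fraction by which the pause shrinks relative to the serial dump.
    static double pauseReduction(double serialStwSeconds, double parallelStwSeconds) {
        return 1.0 - parallelStwSeconds / serialStwSeconds;
    }

    public static void main(String[] args) {
        double serialStw = 26.278;   // 16g, 1 thread (STW == Total)
        double stw32 = 5.2313740;    // 16g, 32 threads, STW
        double total32 = 26.417;     // 16g, 32 threads, Total
        System.out.printf("merge: %.3f secs%n", mergeSeconds(stw32, total32));
        System.out.printf("pause reduction: %.1f%%%n", 100 * pauseReduction(serialStw, stw32));
    }
}
```

For this row the merge phase comes to about 21.2 seconds and the pause is roughly 80% shorter than the serial dump, while the total time is nearly unchanged.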

![image](https://user-images.githubusercontent.com/5010047/234534654-6f29a3af-dad5-46bc-830b-7449c80b4dec.png)


In actual testing, the two-stage solution can increase the overall heap dump time
(see the table above). However, considering the reduction in STW time, I think this is
an acceptable trade-off. Furthermore, there is still room for optimization in the
second (merge) stage, e.g. using sendfile/splice/copy_file_range instead of a
read+write combination. Since the number of parallel dump threads has a considerable
impact on total dump time, I added a parameter that allows users to specify the number
of parallel dump threads they wish to run.
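
To illustrate the kind of merge-stage optimization mentioned above, here is a Java analogue rather than the native calls themselves. `FileChannel.transferTo` lets the JDK hand the copy to the operating system where the platform supports it, instead of looping over read and write in user space; which system call it maps to, if any, is platform dependent. The `ChannelMergeSketch` class and its `merge` method are hypothetical and not part of the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper: concatenates segment files into one target file
// using channel-to-channel transfers.
final class ChannelMergeSketch {

    // Returns the total number of bytes appended to the target.
    static long merge(List<Path> segments, Path target) {
        long written = 0;
        try (FileChannel out = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            for (Path segment : segments) {
                try (FileChannel in = FileChannel.open(segment, StandardOpenOption.READ)) {
                    long position = 0;
                    long size = in.size();
                    // transferTo may move fewer bytes than requested, so loop
                    // until the whole segment has been transferred.
                    while (position < size) {
                        position += in.transferTo(position, size - position, out);
                    }
                    written += size;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```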

##### Open discussion

- Pauseless heap dump solution?
An alternative, pauseless solution is to fork a child process, set the parent process's
heap to read-only, and dump the heap in the child process. When the parent process
writes to its heap, the child process observes the writes via userfaultfd and
prioritizes the corresponding pages for dumping. I look forward to hearing comments
and discussion about this solution.

- Client parser support for segmented heap dump
This patch raises the question of whether the heap dump needs to be a single complete
file at all: could the VM directly generate a segmented heap dump and let the client
parser complete the merge process? I look forward to hearing comments from the Eclipse
MAT community.
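
One way a client could consume segments without a merged file is sketched below, under the assumption that reading the segments back to back, in order, yields the same bytes as the merged dump. `SegmentedDumpReader` is a hypothetical name; it is not an Eclipse MAT or JDK API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical reader-side helper: exposes ordered segment files as one
// logical stream, so a parser can read them as if they were a single file.
final class SegmentedDumpReader {

    static InputStream open(List<Path> orderedSegments) {
        List<InputStream> streams = new ArrayList<>();
        try {
            for (Path segment : orderedSegments) {
                streams.add(Files.newInputStream(segment));
            }
        } catch (IOException e) {
            // Release whatever was opened before the failure.
            for (InputStream s : streams) {
                try { s.close(); } catch (IOException ignored) { }
            }
            throw new UncheckedIOException(e);
        }
        // SequenceInputStream reads each stream to its end, closes it,
        // and then continues with the next one.
        return new SequenceInputStream(Collections.enumeration(streams));
    }
}
```

This only covers sequential reading; a parser that relies on random access into the dump would need a different approach.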

-------------

Commit messages:
 - JDK-8306441: Segmented heap dump

Changes: https://git.openjdk.org/jdk/pull/13667/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8306441
  Stats: 2838 lines in 11 files changed: 1006 ins; 1770 del; 62 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667

