[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    RFR: 8305209: JDWP exit error AGENT_ERROR_INVALID_THREAD(203): missing entry in running thread table
From:       Chris Plummer <cjplummer () openjdk ! org>
Date:       2023-03-30 16:11:15
Message-ID: K5KaFRN0Nf6PFjXEdFQ_RsJJhjcxYszplEIC-oem2UM=.b62f6c54-6fc3-4ff7-a5d8-4f4fa9417edc () github ! com
[Download RAW message or body]

The real purpose of this PR is to add virtual thread support to \
ThreadMemoryLeakTest.java, but this exposed bugs in both the debug agent and in \
TestScaffold, so those are being fixed also (and the debug agent bug is the CR being \
used).

The debug agent bug is due to a race condition during VM exit. The VM is in the \
process of shutting down. The debug agent has already disabled JVMTI callbacks and \
has sent the VMDeathEvent. At this point in time there are also threads exiting that \
the debug agent knows about, but it will not get a ThreadEndEvent for because of the \
callbacks being disabled. Thus these threads remain in the debug agent's list of \
known threads, even though they have exited. The debuggee receives the VMDeathEvent \
and does a VM.resume(). During the debug agent's handing of the VM.Resume command, it \
iterates over all known threads and needs to map each to its ThreadNode so it can be \
resumed, and this mapping requires accessing the JVMTI TLS for the thread. The \
problem is some of the threads may have exited already, and therefore no longer have \
TLS. This results in the assert in the debug agent. This debug agent issue was \
already addressed for platform threads, but not for virtual threads, which is why we \
s  tarted seeing this issue when this test was modified. The fix is to just replicate \
what is done for platform threads for virtual threads also.

The TestScaffold bug is that if the debuggee crashes/asserts, this is likely to go \
unnoticed, especially if it happens during VM exit (and the test essentially has \
already completed). Because of this TestScaffold bug, the debug agent bug above did \
not result in a test failure. After fixing TestScaffold to check the exitCode of the \
debuggee process, the test started to appropriately fail until the debug agent was \
fixed.

One other thing to point out is the OOME issue I started getting frequently when \
testing with virtual threads. Since virtual threads are created at a much higher rate \
than platform threads, their creation started to overwhelm the debugger (actually the \
JDI implementation). There is already a mechanism in place to do a VM.HoldEvents if \
JDI has queue up 10,000 events. The problem is that events are coming in so fast that \
even after doing the VM.HoldEvents, the number of queued events continues to go up \
for a while, and sometimes reaches 30,000 or more. This raises the peak memory usage \
of the test quite a bit. Since the test purposely uses a small heap so a memory leak \
is quickly and reliably detected, the large queue often results in an OOME. Because \
of this I make virtual threads sleep for 100ms instead of 50ms to slow down their \
creation, and this resolved the issue. 

I tested by running all of test/jdk/com/sun/jdi 25 times on each platform with and \
without virtual thread testing enabled.

-------------

Commit messages:
 - Support virtual thread testing.
 - Fix issues with missing virtual thread during VM shutdown

Changes: https://git.openjdk.org/jdk/pull/13246/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13246&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8305209
  Stats: 40 lines in 3 files changed: 35 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/13246.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13246/head:pull/13246

PR: https://git.openjdk.org/jdk/pull/13246


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic