[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: RFR: JDK-8218701: jdb tests failed with AGENT_ERROR_INVALID_THREAD
From:       Chris Plummer <chris.plummer () oracle ! com>
Date:       2019-05-25 6:24:32
Message-ID: 19ca0e95-55b0-9e96-d295-268c4c0c648b () oracle ! com
[Download RAW message or body]

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Hi Gary,<br>
      <br>
      I'm curious as to what event is triggering this. When the JVMTI
      event comes in, you eventually end up in this code in
      enqueueCommand():<br>
      <br>
             if (wait) {<br>
                     debugMonitorEnter(commandCompleteLock);<br>
                     while (!command-&gt;done) {<br>
                             log_debugee_location("enqueueCommand(): HelperCommand
      wait", NULL, NULL, 0);<br>
                             debugMonitorWait(commandCompleteLock);<br>
                     }<br>
                     freeCommand(command);<br>
                     debugMonitorExit(commandCompleteLock);<br>
             }<br>
      <br>
      "wait" is normally true for things like breakpoint and singlestep
      events that will suspend the thread that the event came in on
      (SUSPEND_ALL or SUSPEND_THREAD). So this will cause the thread
      that the JVMTI event was delivered on to block, preventing it from
      exiting. If the policy is SUSPEND_NONE, then it will not block and
      could exit.<br>
      <br>
      In another thread you have commandloop() which dequeues the
      command, "handles" it (more on this below), and then if the
      command has the SUSPEND_ALL policy, will block until the debugger
      resumes all threads. Before doing this it will have already
      suspended threads according to the policy.<br>
      <br>
      The code you are executing in when the problem happens is in the
      command handling part.<br>
      <br>
         commandLoop() -&gt;<br>
             handleCommand() -&gt;<br>
                 handleReportEventCompositeCommand() -&gt;<br>
                     suspendWithInvokeEnabled()<br>
      <br>
      static void<br>
      suspendWithInvokeEnabled(jbyte policy, jthread thread)<br>
      {<br>
             invoker_enableInvokeRequests(thread);<br>
      <br>
             if (policy == JDWP_SUSPEND_POLICY(ALL)) {<br>
                     (void)threadControl_suspendAll();<br>
             } else {<br>
                     (void)threadControl_suspendThread(thread, JNI_FALSE);<br>
             }<br>
      }<br>
      <br>
      So one question is what is the suspend policy when this happens?
      If the thread has managed to exit, it must not be SUSPEND_ALL or
      SUSPEND_THREAD. Otherwise the thread would be blocked in
      enqueueCommand() as described above. However, if we are in
      suspendWithInvokeEnabled() we know it must be SUSPEND_ALL or
      SUSPEND_THREAD because if you look in
      handleReportEventCompositeCommand() you'll see it does not call
      suspendWithInvokeEnabled() if the policy is SUSPEND_NONE.<br>
      <br>
      This all seems to indicate that we should never be at the point of
      calling suspendWithInvokeEnabled() when the thread passed to it
      has already exited. It should be blocked in enqueueCommand(). So
      we're missing something here in the understanding of the problem.
      There might be a bug elsewhere, and your proposed fix is just
      masking it.<br>
      <br>
      Chris<br>
      <br>
      On 5/24/19 2:01 PM, Chris Plummer wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:915b5ea3-745f-0bb3-158a-54504c242191@oracle.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div class="moz-cite-prefix">Hi Gary,<br>
        <br>
        On 5/24/19 5:17 AM, Gary Adams wrote:<br>
      </div>
      <blockquote type="cite" cite="mid:5CE7E0E8.3050901@oracle.com">
        <meta content="text/html; charset=utf-8"
          http-equiv="Content-Type">
        I have not tracked down the specific root cause of this failure,
        yet.<br>
        <br>
        It appears that the suspend is being attempted <b>before</b>
        the thread has been<br>
        recorded in <b>threadControl</b>.<br>
      </blockquote>
      I don't think this is possible. When the JVMTI event is received
      by the debug agent, that's when threadControl adds the thread. We
      are well beyond that point. The debug agent has processed the
      JVMTI event and created and queued a ReportEventCompositeCommand
      (recc) to pass the event on the the debugger (the queue is
      processed asynchronously by another thread). You are in the
      process of processing this recc, which involves suspending all
      threads before sending the event to the debugger. If the thread in
      the recc is invalid (no longer known to threadControl), then the
      only way I can see that happening is if the thread has terminated.
      ASFAIK, there is nothing preventing this from happening.<br>
      <blockquote type="cite" cite="mid:5CE7E0E8.3050901@oracle.com"> <br>
        Adding diagnostic messages is tricky because it changes the
        timing.<br>
        Here's a minimal probe to track threadControl addNode and
        clearThread<br>
        transactions. See attached log.txt.<br>
        <br>
        <tt>diff --git
          a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
          b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c<br>
          --- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c<br>
          +++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c<br>
          @@ -491,7 +491,9 @@<br>
            static void<br>
            suspendWithInvokeEnabled(jbyte policy, jthread thread)<br>
            {<br>
          +   /*       if (threadControl_getInvokeRequest(thread) != NULL) {
          */<br>
                   invoker_enableInvokeRequests(thread);<br>
          +   /*       } */<br>
            <br>
                   if (policy == JDWP_SUSPEND_POLICY(ALL)) {<br>
                           (void)threadControl_suspendAll();<br>
          diff --git
          a/src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c
          b/src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c<br>
          --- a/src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c<br>
          +++ b/src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c<br>
          @@ -285,6 +285,7 @@<br>
            static void<br>
            addNode(ThreadList *list, ThreadNode *node)<br>
            {<br>
          +   printf ("addNode %p \n", node-&gt;thread);<br>
                   node-&gt;next = NULL;<br>
                   node-&gt;prev = NULL;<br>
                   node-&gt;list = NULL;<br>
          @@ -362,6 +363,7 @@<br>
            static void<br>
            clearThread(JNIEnv *env, ThreadNode *node)<br>
            {<br>
          +   printf("clearThread %p\n", node-&gt;thread);<br>
                   if (node-&gt;pendingStop != NULL) {<br>
                           tossGlobalRef(env, &amp;(node-&gt;pendingStop));<br>
                   }<br>
          @@ -1646,6 +1648,8 @@<br>
                   node = findThread(&amp;runningThreads, thread);<br>
                   if (node != NULL) {<br>
                             request = &amp;node-&gt;currentInvoke;<br>
          +       } else {<br>
          +           printf ("threadControl_getInvokeRequest %p\n", thread);<br>
                   }<br>
            <br>
                   debugMonitorExit(threadLock);</tt><br>
        <br>
      </blockquote>
      <blockquote type="cite" cite="mid:5CE7E0E8.3050901@oracle.com">
        The AGENT_ERROR_INVALID_THREAD is reported when
        invoker_enableInvokeRequest<br>
        does not find the thread. I added the print out in
        threadControl_getInvokeRequest<br>
        when the thread is not found.<br>
      </blockquote>
      You are printing the oop* instead of the oop. That's fine for the
      node-&gt;thread references in addNode and clearNode, since they
      should match up, but the "thread" reference in
      threadControl_getInvokeRequest() is probably a localref, so will
      not match up with anything printed by addNode or clearNode. You
      probably need to print the oop and hope there is no intervening
      gc.<br>
      <br>
      <blockquote type="cite" cite="mid:5CE7E0E8.3050901@oracle.com"> <br>
        The workaround bypasses the error and falls through to the
        threadControl CommonSuspend<br>
        path where runningThreads is complimented with an otherThreads
        mechanism to ensure<br>
        threads that are not between start and end events will be
        properly resumed later on.<br>
      </blockquote>
      This fix is probably fine, but I need to think about it some more.
      Would be best to first confirm what's going on though.<br>
      <br>
      thanks,<br>
      <br>
      Chris<br>
      <blockquote type="cite" cite="mid:5CE7E0E8.3050901@oracle.com"> <br>
        <br>
        On 5/23/19, 1:23 PM, Chris Plummer wrote:
        <blockquote
          cite="mid:5300d20e-dff9-7581-de49-275994ed7354@oracle.com"
          type="cite">Hi Gary, <br>
          <br>
          So a JVMTI event came in on a valid thread, got processed by
          the Debug Agent and enqueued to be sent to the Debugger.
          However, before it was actually sent, the thread became
          invalid. Am I understanding this issue correctly? <br>
          <br>
          thanks, <br>
          <br>
          Chris <br>
          <br>
          On 5/23/19 2:59 AM, <a class="moz-txt-link-abbreviated"
            href="mailto:gary.adams@oracle.com" moz-do-not-send="true">gary.adams@oracle.com</a>
          wrote: <br>
          <blockquote type="cite">This proposed workaround ensures that
            the delay between a suspend request <br>
            passing through the jdwp command queue will not fail due to
            a no longer <br>
            running thread. <br>
            <br>
               Webrev: <a class="moz-txt-link-freetext"
              href="http://cr.openjdk.java.net/%7Egadams/8218701/webrev/"
              moz-do-not-send="true">http://cr.openjdk.java.net/~gadams/8218701/webrev/</a>
            <br>
               Issue: <a class="moz-txt-link-freetext"
              href="https://bugs.openjdk.java.net/browse/JDK-8218701"
              moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8218701</a>
            <br>
            <br>
          </blockquote>
          <br>
          <br>
        </blockquote>
        <br>
      </blockquote>
      <p><br>
      </p>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic