'3-nd round RFR (XS) 6988950: JDWP exit error JVMTI_ERROR_WRONG_PHASE(112)'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    3-nd round RFR (XS) 6988950: JDWP exit error JVMTI_ERROR_WRONG_PHASE(112)
From:       "serguei.spitsyn () oracle ! com" <serguei ! spitsyn () oracle ! com>
Date:       2014-10-31 21:07:00
Message-ID: 5453F9F4.20309 () oracle ! com
[Download RAW message or body]

It is 3-rd round of review for:
   https://bugs.openjdk.java.net/browse/JDK-6988950

New webrev:
http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.3/


Summary

   For failing scenario, please, refer to the 1-st round RFR below.

   I've found what is missed in the jdwp agent shutdown and decided to 
switch from a workaround to a real fix.

   The agent VM_DEATH callback sets the gdata field: gdata->vmDead = 1.
   The agent debugLoop_run() has a guard against the VM shutdown:

  165             } else if (gdata->vmDead &&
  166              ((cmd->cmdSet) != JDWP_COMMAND_SET(VirtualMachine))) {
  167                 /* Protect the VM from calls while dead.
  168                  * VirtualMachine cmdSet quietly ignores some cmds
  169                  * after VM death, so, it sends it's own errors.
  170                  */
  171                 outStream_setError(&out, JDWP_ERROR(VM_DEAD));


   However, the guard above does not help much if the VM_DEATH event 
happens in the middle of a command execution.
   There is a lack of synchronization here.

   The fix introduces new lock (vmDeathLock) which does not allow to 
execute the commands
   and the VM_DEATH event callback concurrently.
   It should work well for any function that is used in implementation 
of the JDWP_COMMAND_SET(VirtualMachine) .


Testing:
   Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests


Thanks,
Serguei


On 10/29/14 6:05 PM, serguei.spitsyn@oracle.com wrote:
> The updated webrev:
> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.2/ 
>
>
> The changes are:
>   - added a comment recommended by Staffan
>   - removed the ignore_wrong_phase() call from function classSignature()
>
> The classSignature() function is called in 16 places.
> Most of them do not tolerate the NULL in place of returned signature 
> and will crash.
> I'm not comfortable to fix all the occurrences now and suggest to 
> return to this
> issue after gaining experience with more failure cases that are still 
> expected.
> The failure with the classSignature() involved was observed only once 
> in the nightly
> and should be extremely rare reproducible.
> I'll file a placeholder bug if necessary.
>
> Thanks,
> Serguei
>
> On 10/28/14 6:11 PM, serguei.spitsyn@oracle.com wrote:
>> Please, review the fix for:
>>   https://bugs.openjdk.java.net/browse/JDK-6988950
>>
>>
>> Open webrev:
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.1/ 
>>
>>
>>
>> Summary:
>>
>>    The failing scenario:
>>      The debugger and the debuggee are well aware a VM shutdown has 
>> been started in the target process.
>>      The debugger at this point is not expected to send any commands 
>> to the JDWP agent.
>>      However, the JDI layer (debugger side) and the jdwp agent 
>> (debuggee side)
>>      are not in sync with the consumer layers.
>>
>>      One reason is because the test debugger does not invoke the JDI 
>> method VirtualMachine.dispose().
>>      Another reason is that the Debugger and the debuggee processes 
>> are uneasy to sync in general.
>>
>>      As a result the following steps are possible:
>>        - The test debugger sends a 'quit' command to the test debuggee
>>        - The debuggee is normally exiting
>>        - The jdwp backend reports (over the jdwp protocol) an 
>> anonymous class unload event
>>        - The JDI InternalEventHandler thread handles the 
>> ClassUnloadEvent event
>>        - The InternalEventHandler wants to uncache the matching 
>> reference type.
>>          If there is more than one class with the same host class 
>> signature, it can't distinguish them,
>>          and so, deletes all references and re-retrieves them again 
>> (see tracing below):
>>            MY_TRACE: JDI: 
>> VirtualMachineImpl.retrieveClassesBySignature: 
>> sig=Ljava/lang/invoke/LambdaForm$DMH;
>>        - The jdwp backend debugLoop_run() gets the command from JDI 
>> and calls the functions
>>          classesForSignature() and classStatus() recursively.
>>        - The classStatus() makes a call to the JVMTI GetClassStatus() 
>> and gets the JVMTI_ERROR_WRONG_PHASE
>>        - As a result the jdwp backend reports the JVMTI error to the 
>> JDI, and so, the test fails
>>
>>      For details, see the analysis in bug report closed as a dup of 
>> the bug 6988950:
>>         https://bugs.openjdk.java.net/browse/JDK-8024865
>>
>>      Some similar cases can be found in the two bug reports (6988950 
>> and 8024865) describing this issue.
>>
>>      The fix is to skip reporting the JVMTI_ERROR_WRONG_PHASE error 
>> as it is normal at the VM shutdown.
>>      The original jdwp backend implementation had a similar approach 
>> for the raw monitor functions.
>>      Threy use the ignore_vm_death() to workaround the 
>> JVMTI_ERROR_WRONG_PHASE errors.
>>      For reference, please, see the file: src/share/back/util.c
>>
>>
>> Testing:
>>   Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests
>>
>>
>> Thanks,
>> Serguei
>>
>


[Attachment #3 (text/html)]

<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix"><br>
      It is 3-rd round of review for:<br>
         <a class="moz-txt-link-freetext" \
href="https://bugs.openjdk.java.net/browse/JDK-6988950">https://bugs.openjdk.java.net/browse/JDK-6988950</a>
  <br>
      <br>
      New webrev:<br>
        
<a class="moz-txt-link-freetext" \
href="http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.3 \
/">http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.3/</a><br>
  <br>
      <br>
      Summary<br>
      <br>
         For failing scenario, please, refer to the 1-st round RFR below.<br>
      <br>
         I've found what is missed in the jdwp agent shutdown and decided
      to switch from a workaround to a real fix.<br>
      <br>
         The agent VM_DEATH callback sets the gdata field:
      gdata-&gt;vmDead = 1.<br>
         The agent debugLoop_run()
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      has a guard against the VM shutdown:<br>
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      <pre> 165             } else if (gdata-&gt;vmDead &amp;&amp;
 166              ((cmd-&gt;cmdSet) != JDWP_COMMAND_SET(VirtualMachine))) {
 167                 /* Protect the VM from calls while dead.
 168                  * VirtualMachine cmdSet quietly ignores some cmds
 169                  * after VM death, so, it sends it's own errors.
 170                  */
 171                 outStream_setError(&amp;out, JDWP_ERROR(VM_DEAD));</pre>
      <br>
         However, the guard above does not help much if the VM_DEATH
      event happens in the middle of a command execution. <br>
         There is a lack of synchronization here.<br>
      <br>
         The fix introduces new lock (vmDeathLock) which does not allow
      to execute the commands<br>
         and the VM_DEATH event callback concurrently.<br>
         It should work well for any function that is used in
      implementation of the JDWP_COMMAND_SET(VirtualMachine)
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      .<br>
      <br>
      <br>
      Testing:
      <br>
         Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi
      tests
      <br>
      <br>
      <br>
      Thanks,<br>
      Serguei<br>
      <br>
      <br>
      On 10/29/14 6:05 PM, <a class="moz-txt-link-abbreviated" \
href="mailto:serguei.spitsyn@oracle.com">serguei.spitsyn@oracle.com</a> wrote:<br>  \
</div>  <blockquote cite="mid:54518EE1.9040208@oracle.com" type="cite">The
      updated webrev:
      <br>
<a class="moz-txt-link-freetext" \
href="http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.2 \
/">http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.2/</a>
  <br>
      <br>
      The changes are:
      <br>
         - added a comment recommended by Staffan
      <br>
         - removed the ignore_wrong_phase() call from function
      classSignature()
      <br>
      <br>
      The classSignature() function is called in 16 places.
      <br>
      Most of them do not tolerate the NULL in place of returned
      signature and will crash.
      <br>
      I'm not comfortable to fix all the occurrences now and suggest to
      return to this
      <br>
      issue after gaining experience with more failure cases that are
      still expected.
      <br>
      The failure with the classSignature() involved was observed only
      once in the nightly
      <br>
      and should be extremely rare reproducible.
      <br>
      I'll file a placeholder bug if necessary.
      <br>
      <br>
      Thanks,
      <br>
      Serguei
      <br>
      <br>
      On 10/28/14 6:11 PM, <a class="moz-txt-link-abbreviated" \
href="mailto:serguei.spitsyn@oracle.com">serguei.spitsyn@oracle.com</a> wrote:  <br>
      <blockquote type="cite">Please, review the fix for:
        <br>
           <a class="moz-txt-link-freetext" \
href="https://bugs.openjdk.java.net/browse/JDK-6988950">https://bugs.openjdk.java.net/browse/JDK-6988950</a>
  <br>
        <br>
        <br>
        Open webrev:
        <br>
        <a class="moz-txt-link-freetext" \
href="http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.1 \
/">http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.1/</a>
  <br>
        <br>
        <br>
        Summary:
        <br>
        <br>
             The failing scenario:
        <br>
                 The debugger and the debuggee are well aware a VM shutdown
        has been started in the target process.
        <br>
                 The debugger at this point is not expected to send any
        commands to the JDWP agent.
        <br>
                 However, the JDI layer (debugger side) and the jdwp agent
        (debuggee side)
        <br>
                 are not in sync with the consumer layers.
        <br>
        <br>
                 One reason is because the test debugger does not invoke the
        JDI method VirtualMachine.dispose().
        <br>
                 Another reason is that the Debugger and the debuggee
        processes are uneasy to sync in general.
        <br>
        <br>
                 As a result the following steps are possible:
        <br>
                     - The test debugger sends a 'quit' command to the test
        debuggee
        <br>
                     - The debuggee is normally exiting
        <br>
                     - The jdwp backend reports (over the jdwp protocol) an
        anonymous class unload event
        <br>
                     - The JDI InternalEventHandler thread handles the
        ClassUnloadEvent event
        <br>
                     - The InternalEventHandler wants to uncache the matching
        reference type.
        <br>
                         If there is more than one class with the same host
        class signature, it can't distinguish them,
        <br>
                         and so, deletes all references and re-retrieves them
        again (see tracing below):
        <br>
                             MY_TRACE: JDI:
        VirtualMachineImpl.retrieveClassesBySignature:
        sig=Ljava/lang/invoke/LambdaForm$DMH;
        <br>
                     - The jdwp backend debugLoop_run() gets the command from
        JDI and calls the functions
        <br>
                         classesForSignature() and classStatus() recursively.
        <br>
                     - The classStatus() makes a call to the JVMTI
        GetClassStatus() and gets the JVMTI_ERROR_WRONG_PHASE
        <br>
                     - As a result the jdwp backend reports the JVMTI error to
        the JDI, and so, the test fails
        <br>
        <br>
                 For details, see the analysis in bug report closed as a dup
        of the bug 6988950:
        <br>
                       <a class="moz-txt-link-freetext" \
href="https://bugs.openjdk.java.net/browse/JDK-8024865">https://bugs.openjdk.java.net/browse/JDK-8024865</a>
  <br>
        <br>
                 Some similar cases can be found in the two bug reports
        (6988950 and 8024865) describing this issue.
        <br>
        <br>
                 The fix is to skip reporting the JVMTI_ERROR_WRONG_PHASE
        error as it is normal at the VM shutdown.
        <br>
                 The original jdwp backend implementation had a similar
        approach for the raw monitor functions.
        <br>
                 Threy use the ignore_vm_death() to workaround the
        JVMTI_ERROR_WRONG_PHASE errors.
        <br>
                 For reference, please, see the file: src/share/back/util.c
        <br>
        <br>
        <br>
        Testing:
        <br>
           Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi
        tests
        <br>
        <br>
        <br>
        Thanks,
        <br>
        Serguei
        <br>
        <br>
      </blockquote>
      <br>
    </blockquote>
    <br>
  </body>
</html>



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic