
List:       openjdk-serviceability-dev
Subject:    Re: 2-nd round RFR (XS) 6988950: JDWP exit error JVMTI_ERROR_WRONG_PHASE(112)
From:       "serguei.spitsyn () oracle ! com" <serguei ! spitsyn () oracle ! com>
Date:       2014-10-30 20:35:27
Message-ID: 5452A10F.2080104 () oracle ! com

As we started this discussion I've added the open mailing lists back. :)

On 10/30/14 12:42 PM, Daniel D. Daugherty wrote:
> On 10/30/14 1:06 PM, serguei.spitsyn@oracle.com wrote:
>> Staffan and Dan,
>>
>> Do you have anything to say?
>
> It feels like we're suppressing a symptom rather than dealing with
> the underlying cause. However, I haven't been close to this bug
> for years so my memories are rusty here.

I agree.
It feels like the shutdown sequence was not designed with equal care
across the JDI and the jdwp agent.
There can be shutdown races between the debugger + JDI side and the
debuggee + jdwp agent side.
I do not see any patterns in the code that recognize the shutdown and
bail out gracefully.

>
> One question:
>
> line 1051: if (debugInit_isInitComplete() && error == JVMTI_ERROR_WRONG_PHASE) {
>     The debugInit_isInitComplete() check means that we only do
>     this suppression in the live phase or later, right? Perhaps
>     we should do this only when we are post live phase...

Agreed.
It is exactly the case.
In the normal case, debugInit_isInitComplete() returns true once the
VM_INIT event has been received.
An agent option can postpone the agent initialization until an
Exception event is received.
In all cases, the initialization happens in the live phase or later
(I'm not sure whether it can happen in the dead phase).
Once initialization is complete, encountering the JVMTI WRONG_PHASE
error means the VM has entered the dead phase.
This should be a signal to start an agent shutdown.
At this point, I'm not ready to redesign this in the agent.
This fix is only a workaround for nightly stabilization.
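
To make the check under discussion concrete, here is a minimal sketch of
the suppression logic in C. It is not the code from the webrev: the
sketch_ names are hypothetical, the init_complete parameter stands in for
the result of debugInit_isInitComplete(), and the enum stands in for the
jvmtiError values from jvmti.h (112 is the WRONG_PHASE code from the
subject line).

```c
#include <stdbool.h>

/* Stand-ins for jvmtiError values; a real agent takes them from <jvmti.h>. */
typedef enum {
    SKETCH_ERROR_NONE = 0,
    SKETCH_ERROR_INVALID_CLASS = 21,  /* any unrelated error, for contrast */
    SKETCH_ERROR_WRONG_PHASE = 112    /* the code named in the subject line */
} sketch_error;

/*
 * Hypothetical helper: drop WRONG_PHASE once agent initialization is
 * complete, on the assumption that it can then only mean "VM is dead".
 * Every other error, and a WRONG_PHASE seen before initialization is
 * complete, is passed through unchanged.
 */
sketch_error sketch_ignore_wrong_phase(bool init_complete, sketch_error error)
{
    if (init_complete && error == SKETCH_ERROR_WRONG_PHASE) {
        return SKETCH_ERROR_NONE;
    }
    return error;
}
```

A caller would wrap the result of a JVMTI call with this helper before
deciding whether to report an error back to JDI. Note that this only
hides the symptom; it does not start an agent shutdown.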

>
>     Maybe a flag set at the beginning of the VMDeath event handler
>     would be better.


There is already a global flag: gdata->vmDead
I've already tried to use it, but it did not work for me.
Let me check it more.
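
For comparison, the flag-based variant Dan suggests could look roughly
like the sketch below. All names are hypothetical (the struct only mimics
the role of gdata->vmDead, and 112 is again the WRONG_PHASE code from the
subject line), and the sketch does not explain why the existing flag did
not work in my experiment.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_WRONG_PHASE 112   /* WRONG_PHASE code from the subject line */

/* Hypothetical agent state; vm_dead plays the role of a "VM is dead" flag. */
typedef struct {
    atomic_bool vm_dead;
} sketch_agent_state;

/* Meant to be the first statement executed by the VMDeath event handler. */
void sketch_on_vm_death(sketch_agent_state *state)
{
    atomic_store(&state->vm_dead, true);
}

/*
 * Suppress WRONG_PHASE only after the VMDeath handler has set the flag.
 * Returns 0 (no error) when the error is suppressed, else the error as is.
 */
int sketch_filter_error(sketch_agent_state *state, int error)
{
    if (error == SKETCH_WRONG_PHASE && atomic_load(&state->vm_dead)) {
        return 0;
    }
    return error;
}
```

The difference from the debugInit_isInitComplete() check is that a
WRONG_PHASE error seen before the handler has run would still be
reported.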


>
>
>> Is it Ok to push this?
>
> I'm OK with it, but I'm just one voice...

Thanks!
You raised good points.

>
>
>> Dan, should I count on you as a reviewer?
>
> Yes, I've reviewed the changes at this point.

Ok.

>
>
>> I will also need to backport this to 8u40.
>
> You might want to let this bake for a couple of weeks first...

Sure.


Thanks,
Serguei

>
> Dan
>
>
>>
>> Thanks!
>> Serguei
>>
>> On 10/30/14 4:16 AM, Dmitry Samersoff wrote:
>>> Serguei,
>>>
>>> Looks good for me!
>>>
>>> -Dmitry
>>>
>>> On 2014-10-30 04:05, serguei.spitsyn@oracle.com wrote:
>>>> The updated webrev:
>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.2/ 
>>>>
>>>>
>>>>
>>>> The changes are:
>>>>    - added a comment recommended by Staffan
>>>>    - removed the ignore_wrong_phase() call from function
>>>>      classSignature()
>>>>
>>>> The classSignature() function is called in 16 places.
>>>> Most of them do not tolerate a NULL in place of the returned
>>>> signature and will crash.
>>>> I'm not comfortable fixing all the occurrences now and suggest
>>>> returning to this issue after gaining experience with more failure
>>>> cases, which are still expected.
>>>> The failure involving classSignature() was observed only once in
>>>> the nightly and should be extremely rare to reproduce.
>>>> I'll file a placeholder bug if necessary.
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>> On 10/28/14 6:11 PM, serguei.spitsyn@oracle.com wrote:
>>>>> Please, review the fix for:
>>>>>    https://bugs.openjdk.java.net/browse/JDK-6988950
>>>>>
>>>>>
>>>>> Open webrev:
>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.1/ 
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Summary:
>>>>>
>>>>>     The failing scenario:
>>>>>       The debugger and the debuggee are well aware that a VM shutdown
>>>>>       has been started in the target process.
>>>>>       The debugger at this point is not expected to send any commands
>>>>>       to the JDWP agent.
>>>>>       However, the JDI layer (debugger side) and the jdwp agent
>>>>>       (debuggee side) are not in sync with the consumer layers.
>>>>>
>>>>>       One reason is that the test debugger does not invoke the JDI
>>>>>       method VirtualMachine.dispose().
>>>>>       Another reason is that the debugger and the debuggee processes
>>>>>       are not easy to sync in general.
>>>>>
>>>>>       As a result the following steps are possible:
>>>>>         - The test debugger sends a 'quit' command to the test debuggee
>>>>>         - The debuggee exits normally
>>>>>         - The jdwp backend reports (over the jdwp protocol) an
>>>>>           anonymous class unload event
>>>>>         - The JDI InternalEventHandler thread handles the
>>>>>           ClassUnloadEvent event
>>>>>         - The InternalEventHandler wants to uncache the matching
>>>>>           reference type.
>>>>>           If there is more than one class with the same host class
>>>>>           signature, it can't distinguish them, and so deletes all
>>>>>           references and retrieves them again (see tracing below):
>>>>>             MY_TRACE: JDI: VirtualMachineImpl.retrieveClassesBySignature: sig=Ljava/lang/invoke/LambdaForm$DMH;
>>>>>         - The jdwp backend debugLoop_run() gets the command from JDI
>>>>>           and calls the functions classesForSignature() and
>>>>>           classStatus() recursively.
>>>>>         - The classStatus() makes a call to the JVMTI GetClassStatus()
>>>>>           and gets the JVMTI_ERROR_WRONG_PHASE
>>>>>         - As a result the jdwp backend reports the JVMTI error to
>>>>>           JDI, and so the test fails
>>>>>
>>>>>       For details, see the analysis in the bug report that was closed
>>>>>       as a duplicate of bug 6988950:
>>>>>          https://bugs.openjdk.java.net/browse/JDK-8024865
>>>>>
>>>>>       Some similar cases can be found in the two bug reports (6988950
>>>>> and 8024865) describing this issue.
>>>>>
>>>>>       The fix is to skip reporting the JVMTI_ERROR_WRONG_PHASE error,
>>>>>       as it is normal at VM shutdown.
>>>>>       The original jdwp backend implementation had a similar approach
>>>>>       for the raw monitor functions.
>>>>>       They use ignore_vm_death() to work around the
>>>>>       JVMTI_ERROR_WRONG_PHASE errors.
>>>>>       For reference, please see the file: src/share/back/util.c
>>>>>
>>>>>
>>>>> Testing:
>>>>>    Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi
>>>>>    tests
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>
>>
>
