[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events
From:       "serguei.spitsyn () oracle ! com" <serguei ! spitsyn () oracle ! com>
Date:       2018-08-22 17:54:05
Message-ID: 91a48cb9-b3ab-a581-3c8e-8bb21d8223db () oracle ! com
[Download RAW message or body]

Hi Gary,

It looks good.

Thanks,
Serguei


On 8/22/18 09:54, Gary Adams wrote:
> Updated webrev:
>   http://cr.openjdk.java.net/~gadams/8034084/webrev.01/index.html
>
> On 8/15/18, 8:35 PM, David Holmes wrote:
>> +1
>>
>> Thanks,
>> David
>>
>> On 15/08/2018 8:41 AM, Alex Menkov wrote:
>>> Hi Gary,
>>>
>>> I think check for RawMonitorWait error should be moved into the loop
>>> or exit loop condition can be updated to
>>>
>>> 243     if (endsCount == endsExpected || err != JVMTI_ERROR_NONE) {
>>> 244         break;
>>> 245     }
>>>
>>> --alex
>>>
>>> On 08/14/2018 10:03, Gary Adams wrote:
>>>> I did 1000 testruns with these changes. There were 3 failures due 
>>>> to "executor timeout"
>>>> but those are not related to these changes (infrastructure).
>>>>
>>>>    Webrev: 
>>>> http://cr.openjdk.java.net/~gadams/8034084/webrev.00/index.html
>>>>    Issue: https://bugs.openjdk.java.net/browse/JDK-8034084
>>>>
>>>> On 8/12/18, 7:01 PM, David Holmes wrote:
>>>>> Hi Gary,
>>>>>
>>>>> Good analysis. I agree with your reasoning.
>>>>>
>>>>> I think these tests were written in a way that assumed that we 
>>>>> have to locally timeout to detect things not happening "quickly 
>>>>> enough" but as we well know "quickly enough" is highly dependent 
>>>>> on a number of factors. And now we rely on the test framework to 
>>>>> manage timeouts.
>>>>>
>>>>> I think removing the timeout from the first wait() and simply 
>>>>> allowing the framework timeouts to take effect is a good solution.
>>>>>
>>>>> Repeating the wait in the second case with twice the timeout is 
>>>>> less clear. One second wasn't long enough, but perhaps 3 seconds 
>>>>> will be. And it probably will but is not guaranteed. The real 
>>>>> guarantee is to wait until you see the expected outcome (either by 
>>>>> polling using the timed-wait, or else by inserting a notify 
>>>>> elsewhere - which itself may be tricky). And once you start 
>>>>> pushing on the wait/notify logic the whole structure of the test 
>>>>> potentially changes as you either wait until you get the expected 
>>>>> result, or else get timed out by the framework.
>>>>>
>>>>> I think waiting a bit longer will probably suffice and cause 
>>>>> minimum disruption to the test logic, but I'd be included to put 
>>>>> the wait into a for-loop and use 1-second timeouts per iteration ie.
>>>>>
>>>>> // wait for up to 3 seconds
>>>>> for (int i = 0 ; i < 3 ; i++) {
>>>>>   err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, (jlong)WAIT_TIME);
>>>>>   if (endsCount == endsExpected)
>>>>>     break;
>>>>> }
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> On 11/08/2018 4:15 AM, Gary Adams wrote:
>>>>>> There's a fairly old bug concerning jvmti thread start and end 
>>>>>> events.
>>>>>> I believe there are 2 possible race conditions in the threadstart003
>>>>>> test based on some arbitrary timeout values used in the test itself.
>>>>>>
>>>>>> Here are a few snippets from the test to illustrate the problem 
>>>>>> area.
>>>>>>
>>>>>> In the initial sequence RunAgentThread is used to start a thread
>>>>>> with the agentThreadProc entry point.
>>>>>>
>>>>>>     188        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>>>     189        if (err != JVMTI_ERROR_NONE) {
>>>>>>     190            printf("(RawMonitorEnter) unexpected error: %s 
>>>>>> (%d)\n",
>>>>>>     191                   TranslateError(err), err);
>>>>>>     192            result = FAILED;
>>>>>>     193        }
>>>>>>
>>>>>> *   194        err = (*jvmti)->RunAgentThread(jvmti, thr, 
>>>>>> agentThreadProc,*
>>>>>>     195            NULL, JVMTI_THREAD_MAX_PRIORITY);
>>>>>>     196        if (err != JVMTI_ERROR_NONE) {
>>>>>>     197            printf("(RunAgentThread) unexpected error: %s 
>>>>>> (%d)\n",
>>>>>>     198                   TranslateError(err), err);
>>>>>>     199            result = FAILED;
>>>>>>     200        }
>>>>>>
>>>>>> *   201        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>>>>> (jlong)WAIT_TIME);*
>>>>>>     202        if (err != JVMTI_ERROR_NONE) {
>>>>>>     203            printf("(RawMonitorWait) unexpected error: %s 
>>>>>> (%d)\n",
>>>>>>     204                   TranslateError(err), err);
>>>>>>     205            result = FAILED;
>>>>>>     206        }
>>>>>>     207        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>>>     208        if (err != JVMTI_ERROR_NONE) {
>>>>>>     209            printf("(RawMonitorExit) unexpected error: %s 
>>>>>> (%d)\n",
>>>>>>     210                   TranslateError(err), err);
>>>>>>     211            result = FAILED;
>>>>>>     212        }
>>>>>>
>>>>>> I believe the test expects to wake up based on the code in 
>>>>>> agentThreadProc
>>>>>> notification of the wait_lock.
>>>>>>
>>>>>>     121        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>>>     122        if (err != JVMTI_ERROR_NONE) {
>>>>>>     123            printf("(RawMonitorEnter) unexpected error: %s 
>>>>>> (%d)\n",
>>>>>>     124                   TranslateError(err), err);
>>>>>>     125            result = FAILED;
>>>>>>     126        }
>>>>>> *   127        err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
>>>>>>     128        if (err != JVMTI_ERROR_NONE) {
>>>>>>     129            printf("(RawMonitorNotify) unexpected error: 
>>>>>> %s (%d)\n",
>>>>>>     130                   TranslateError(err), err);
>>>>>>     131            result = FAILED;
>>>>>>     132        }
>>>>>>     133        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>>>     134        if (err != JVMTI_ERROR_NONE) {
>>>>>>     135            printf("(RawMonitorExit) unexpected error: %s 
>>>>>> (%d)\n",
>>>>>>     136                   TranslateError(err), err);
>>>>>>     137            result = FAILED;
>>>>>>     138        }
>>>>>>
>>>>>> A second block for the wait_lock does not have a corresponding
>>>>>> notify call. I'm guessing the monitor is expected to time out,
>>>>>> so it allows for one second to complete the delivery of the
>>>>>> thread end event.
>>>>>>
>>>>>>    213
>>>>>>     214        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>>>     215        if (err != JVMTI_ERROR_NONE) {
>>>>>>     216            printf("(RawMonitorEnter) unexpected error: %s 
>>>>>> (%d)\n",
>>>>>>     217                   TranslateError(err), err);
>>>>>>     218            result = FAILED;
>>>>>>     219        }
>>>>>> *   220        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>>>>> (jlong)WAIT_TIME);*
>>>>>>     221        if (err != JVMTI_ERROR_NONE) {
>>>>>>     222            printf("(RawMonitorWait) unexpected error: %s 
>>>>>> (%d)\n",
>>>>>>     223                   TranslateError(err), err);
>>>>>>     224            result = FAILED;
>>>>>>     225        }
>>>>>>     226        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>>>     227        if (err != JVMTI_ERROR_NONE) {
>>>>>>     228            printf("(RawMonitorExit) unexpected error: %s 
>>>>>> (%d)\n",
>>>>>>     229                   TranslateError(err), err);
>>>>>>     230            result = FAILED;
>>>>>>     231        }
>>>>>>
>>>>>> I see 2 possible race conditions here.
>>>>>>
>>>>>>    - if the thread is slow to start, then the first RawMonitorWait
>>>>>>      could timeout. The code would continue to the second 
>>>>>> RawMonitorWait
>>>>>>      which would wake up from the agentThreadProc call to 
>>>>>> RawMonitorNotify.
>>>>>>
>>>>>>    - if the thread ending is slow to complete then the second 
>>>>>> RawMonitorWait
>>>>>>      could timeout before the thread end event is processed.
>>>>>>
>>>>>> After the second RawMonitorWait completes, the event notification 
>>>>>> is disabled
>>>>>> for both thread start and end events. If we disable the end 
>>>>>> thread event before
>>>>>> the thread end event is delivered the counts will not match as 
>>>>>> expected.
>>>>>>
>>>>>>     233        err = (*jvmti)->SetEventNotificationMode(jvmti, 
>>>>>> JVMTI_DISABLE,
>>>>>>     234                JVMTI_EVENT_THREAD_START, NULL);
>>>>>>     235        if (err != JVMTI_ERROR_NONE) {
>>>>>>     236            printf("Failed to disable 
>>>>>> JVMTI_EVENT_THREAD_START: %s (%d)\n",
>>>>>>     237                   TranslateError(err), err);
>>>>>>     238            result = FAILED;
>>>>>>     239        }
>>>>>>     240 *
>>>>>>     241        err = (*jvmti)->SetEventNotificationMode(jvmti, 
>>>>>> JVMTI_DISABLE,
>>>>>>     242                JVMTI_EVENT_THREAD_END, NULL);*
>>>>>>     243        if (err != JVMTI_ERROR_NONE) {
>>>>>>     244            printf("Failed to disable 
>>>>>> JVMTI_EVENT_THREAD_END: %s (%d)\n",
>>>>>>     245                   TranslateError(err), err);
>>>>>>     246            result = FAILED;
>>>>>>     247        }
>>>>>>
>>>>>> My recommendation is to change the first RawMonitorWait to use 0
>>>>>> rather than WAIT_TIME. This means it will wait until notified.
>>>>>> For the second RawMonitorWait, we should retry the wait if
>>>>>> endsCount does not equal endsExpected.
>>>>>>
>>>>>>     201        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
>>>>>>
>>>>>>
>>>>>>     220        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>>>>> (jlong)WAIT_TIME);
>>>>>>                if (endsCount != endsExpected) {
>>>>>>                    err = (*jvmti)->RawMonitorWait(jvmti, 
>>>>>> wait_lock, (jlong)WAIT_TIME * 2);
>>>>>>                }
>>>>>>     221        if (err != JVMTI_ERROR_NONE) {
>>>>>>
>>>>>> So we remove the first race condition by removing the timeout on 
>>>>>> the first wait call.
>>>>>> And we accommodate a possible second race condition by retrying 
>>>>>> the second wait with
>>>>>> an extended delay.
>>>>>>
>>>>>> An alternative to the retry could be to add a RawMonitorNotify 
>>>>>> call in the thread
>>>>>> end event callback ThreadEnd() after the endsCount has been 
>>>>>> incremented.
>>>>>>
>>>>>> This test fails very infrequently, so it may be difficult to 
>>>>>> prove that the specific
>>>>>> problem has been resolved. I'll try some testing with the 
>>>>>> proposed changes
>>>>>> and see if we can work towards getting this test off the 
>>>>>> ProblemList for jdk12.
>>>>>>
>>>>>>
>>>>
>


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic