[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: Codereview request: 8050115 javax/management/monitor/GaugeMonitorDeadlockTest.java
From:       shanliang <shanliang.jiang () oracle ! com>
Date:       2014-09-18 14:27:30
Message-ID: 541AEBD2.7090307 () oracle ! com
[Download RAW message or body]

David Holmes wrote:
> Still not 100% sure about the deadlock detection logic here, but as 
> long as it does no harm - ok.
>
> Minor style nit:
>
>   59         System.out.println("=== checkingTime = "+checkingTime+"ms");
>
> spaces needed around the + operator.
OK, I have added spaces.

Thanks David for the review.
Shanliang
>
> Thanks,
> David
>
> On 18/09/2014 1:09 AM, shanliang wrote:
>> Daniel Fuchs wrote:
>>> On 9/17/14 4:43 PM, shanliang wrote:
>>>> Daniel,
>>>>
>>>> We could not be sure that the test failed of timeout, that's why I 
>>>> tried
>>>> to add more checks.
>>>>
>>>> The check for Step 1: all thread traces were printed out only if
>>>> deadlock was found, and the test failed immediately.
>>>> The check for Step 2:
>>>>     1) all thread traces were printed out only if the tested thread 
>>>> was
>>>> blocked, but the test did not fail because we were not sure if 
>>>> deadlock
>>>> happened, but the info might be helpful;
>>>>     2) otherwise only the trace of the tested thread was printed out.
>>>>
>>>> In case that the test gets interrupted again by the test harness, hope
>>>> we can have some useful info from these 2 checks.
>>>>
>>>> It must not be so heavy but still could impact the test, your 
>>>> suggestion
>>>> to use test.timeout.factor is a good idea, I added the code to 
>>>> calculate
>>>> the checking time based on it:
>>>>     http://cr.openjdk.java.net/~sjiang/JDK-8050115/03/
>>>
>>> I Shanliang,
>>>
>>> This looks much better, thanks.
>>> May I suggested taking the current time again at lines 125
>>> and 179:
>>>
>>>    checkedTime = System.currentTimeMillis();
>>>
>>> It would allow to discount the time spent in checking.
>> Here is the new version with your suggestion to re-calculate 
>> checkedTime.
>>
>>     http://cr.openjdk.java.net/~sjiang/JDK-8050115/04/
>>
>> Thanks a lot for your time!
>> Shanliang
>>>
>>> best regards,
>>>
>>> -- daniel
>>>
>>>>
>>>> Thanks,
>>>> Shanliang
>>>>
>>>> Daniel Fuchs wrote:
>>>>> Hi Shanliang,
>>>>>
>>>>> On 9/17/14 2:19 PM, shanliang wrote:
>>>>>> Daniel,
>>>>>>
>>>>>> The test does 2 steps of verifications, the new check is useful for
>>>>>> the
>>>>>> first step, and the trace in the bug showed that the test failed on
>>>>>> the
>>>>>> first step.
>>>>>>
>>>>>> Yes the check might not work for the second step, I added the new 
>>>>>> code
>>>>>> for the second step to check the tested thread state and hope to 
>>>>>> have
>>>>>> useful info if the test failed on the second step.
>>>>>>
>>>>>> Here is the new version:
>>>>>>     http://cr.openjdk.java.net/~sjiang/JDK-8050115/02/
>>>>>>
>>>>>> Thanks,
>>>>>> Shanliang
>>>>>
>>>>> If I understand the issue correctly - the test fails in timeout
>>>>> mostly on very slow machines/configurations (fastdebug with some
>>>>> combinations of options).
>>>>>
>>>>> I worry that printing a thread dump every seconds (1000ms) is going
>>>>> to make things worse: the test will spend its time printing thread
>>>>> dumps instead of doing what it is supposed to do - and will have
>>>>> even less CPU cycles to execute its 'real' code.
>>>>>
>>>>> I would have advised printing the thread dumps only at the end,
>>>>> when it is detected that there might be a deadlock - except that
>>>>> now we can't do that since the timeout is managed completely
>>>>> by the harness (so we don't get the upper hand at the end in
>>>>> case of timeout).
>>>>>
>>>>> I think depending on the harness to set the appropriate timeout
>>>>> rather than depending on an arbitrary timeout set in the test itself
>>>>> is the right thing to do. It's been a pattern in many tests that
>>>>> failed in timeout intermittently on some slow machines/configuration.
>>>>>
>>>>> In any case - 1s seems really too frequent.
>>>>> I suppose you could inspect the system properties set by the harness
>>>>> (timeout + timeout factor) to devise an acceptable frequency for
>>>>> your checks - if you really want to print this info.
>>>>>
>>>>> From the log I see that the timeout factor passed to the harness
>>>>> for the slow configuration that failed is
>>>>> -Dtest.timeout.factor=8.0
>>>>> There's no explicit timeout given - and jtreg -onlineHelp reveals
>>>>> that in this case the default timeout is two minutes.
>>>>>
>>>>> This means that the harness has allocated 2*8=16mins for the test to
>>>>> execute.
>>>>> I don't think you want to take the risk of printing a thread dump
>>>>> every seconds during 16 minutes ;-)
>>>>>
>>>>> Of course I'm over simplifying here. Before your changes - the test
>>>>> was deciding after 46.893 seconds that there must be a deadlock.
>>>>> 47s is obviously way too short for a possibly slow machine running
>>>>> the test in fastdebug mode.
>>>>>
>>>>> Something like the following might be more reasonable:
>>>>>
>>>>> // default timeout factor is 1.0
>>>>> double factor =
>>>>>     Double.parseDouble(
>>>>>        System.getProperty("test.timeout.factor", "1.0"));
>>>>> // default timeout is 2mins = 120s.
>>>>> double timeout = Double.parseDouble(
>>>>>        System.getProperty("test.timeout", "120"));
>>>>>
>>>>> // total time is timeout * timeout factor * 1000 ms
>>>>> long total = (long) factor * timeout * 1000;
>>>>>
>>>>> // Don't print thread dumps too often.
>>>>> // every 5s for a total timeout of 120s seems reasonable.
>>>>> // 120s/5s = 24; we will lengthen the delay if the total
>>>>> // timeout is greater than 120s, so we're taking the max between
>>>>> // 5s and total/24
>>>>> long delayBetweenThreadDumps = Math.max(5000, total/24);
>>>>>
>>>>> Of course 5s and total/24 are just arbitrary...
>>>>> But 24 full thread dumps in a log for a single test is enough data
>>>>> to analyze I think ;-)
>>>>>
>>>>> best regards,
>>>>>
>>>>> -- daniel
>>>>>
>>>>>>
>>>>>>
>>>>>> Daniel Fuchs wrote:
>>>>>>> On 9/17/14 10:55 AM, shanliang wrote:
>>>>>>>> David Holmes wrote:
>>>>>>>>> On 17/09/2014 7:01 AM, shanliang wrote:
>>>>>>>>>> David Holmes wrote:
>>>>>>>>>>> Hi Shanliang,
>>>>>>>>>>>
>>>>>>>>>>> On 16/09/2014 7:12 PM, shanliang wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Please review the following fix:
>>>>>>>>>>>
>>>>>>>>>>> I don't see any functional change. You seem to have replaced a
>>>>>>>>>>> built-in timeout with the externally applied test harness
>>>>>>>>>>> timeout.
>>>>>>>>>> Yes no functional change here, we thought that the test needed
>>>>>>>>>> more
>>>>>>>>>> time
>>>>>>>>>> to wait a change if a testing VM or machine was really slow, the
>>>>>>>>>> test
>>>>>>>>>> harness timeout was the maximum time we could give the test.
>>>>>>>>>
>>>>>>>>> Do we have confidence that the harness timeout is sufficient to
>>>>>>>>> handle
>>>>>>>>> the intermittent failures?
>>>>>>>> Really a good question :)
>>>>>>>>
>>>>>>>> Here is new version:
>>>>>>>>     http://cr.openjdk.java.net/~sjiang/JDK-8050115/01/
>>>>>>>>
>>>>>>>> I added a deadlocked check in every 1 second, hope to get more
>>>>>>>> info in
>>>>>>>> case of failure.
>>>>>>>
>>>>>>> The following comment seems to imply that this check is not
>>>>>>> very useful:
>>>>>>>
>>>>>>>  112             // This won't show up as a deadlock in CTRL-\ 
>>>>>>> or in
>>>>>>>  113             // ThreadMXBean.findDeadlockedThreads(), because
>>>>>>> they
>>>>>>> don't
>>>>>>>  114             // see that thread A is waiting for thread B
>>>>>>> (B.join()), and
>>>>>>>  115             // thread B is waiting for a lock held by thread A
>>>>>>>
>>>>>>> best regards,
>>>>>>>
>>>>>>> -- daniel
>>>>>>>
>>>>>>>>
>>>>>>>> I changed also the sleep time to 100ms, 10ms seems too short as
>>>>>>>> Daniel
>>>>>>>> pointed out.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Shanliang
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Style nit: add a space after 'while' -> while (cond) {
>>>>>>>>>> OK, I will do it before pushing.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Shanliang
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8050115
>>>>>>>>>>>> webrev: http://cr.openjdk.java.net/~sjiang/JDK-8050115/00/
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Shanliang
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic