[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: RFR: JDK-4903717: nsk/jdi/ThreadReference/isSuspended/issuspended002 failing
From:       Alex Menkov <alexey.menkov () oracle ! com>
Date:       2019-02-28 21:44:26
Message-ID: 4956c393-fe9e-1d76-5b03-e0a69cc00750 () oracle ! com
[Download RAW message or body]

Hi Gary,

Now "mainThread is out of: synchronized (lockingObject) {" record is 
logged at the end of the scenario and that makes it useless.

I think it would be better to call
pipe.println("docontinue") after logging of this message:

   log1("mainThread is out of: synchronized (lockingObject) {");
+  pipe.println("docontinue");
+  // wait until the scenario is completed to avoid deadlocks on logging
+  synchronized (Threadissuspended002a.waitnotifyObj) {
+    Threadissuspended002a.waitnotifyObj.wait();
+  }

--alex

On 02/27/2019 02:29, gary.adams@oracle.com wrote:
> Revised webrev : http://cr.openjdk.java.net/~gadams/4903717/webrev.02/
> 
> To prevent the main thread from holding the logging lock when
> the debugger suspends the debuggee, an explicit wait/notify is used
> so the test thread can safely issue log messages while the main thread
> remains suspended.
> 
> Testing is in progress.
> 
> On 2/26/19 3:49 PM, gary.adams@oracle.com wrote:
> > New fix will be coming tomorrow...
> > 
> > I dumped the main and thread2 stacks at the
> > point of the timeout and found they were
> > both contending for the java.io.PrintStream
> > writeln() synchronization block.
> > 
> > While writing log messages main thread held the
> > lock and was suspended and thread2 was blocked
> > trying to report it's progress. So thread2 never reaches the
> > breakpoint.
> > 
> > java.nio.Buffer:292 in thread instance of java.lang.Thread(name='main', id=1)
> > java.nio.CharBuffer:1260 in thread instance of java.lang.Thread(name='main', \
> > id=1) java.nio.CharBuffer:265 in thread instance of java.lang.Thread(name='main', \
> > id=1) java.nio.Buffer:223 in thread instance of java.lang.Thread(name='main', \
> > id=1) java.nio.CharBuffer:284 in thread instance of java.lang.Thread(name='main', \
> > id=1) java.nio.HeapCharBuffer:77 in thread instance of \
> > java.lang.Thread(name='main', id=1) java.nio.CharBuffer:396 in thread instance of \
> > java.lang.Thread(name='main', id=1) sun.nio.cs.StreamEncoder:280 in thread \
> > instance of java.lang.Thread(name='main', id=1) sun.nio.cs.StreamEncoder:125 in \
> > thread instance of java.lang.Thread(name='main', id=1) \
> > java.io.OutputStreamWriter:211 in thread instance of \
> > java.lang.Thread(name='main', id=1) java.io.BufferedWriter:120 in thread instance \
> >                 of java.lang.Thread(name='main', id=1)
> > *java.io.PrintStream:653 in thread instance of 
> > java.lang.Thread(name='main', id=1)*
> > java.io.PrintStream:958 in thread instance of java.lang.Thread(name='main', id=1)
> > nsk.jdi.ThreadReference.isSuspended.issuspended002a:49 in thread instance of \
> > java.lang.Thread(name='main', id=1) \
> > nsk.jdi.ThreadReference.isSuspended.issuspended002a:124 in thread instance of \
> > java.lang.Thread(name='main', id=1) 
> > 
> > *java.io.PrintStream:650 in thread instance of 
> > nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', 
> > id=589)*
> > java.io.PrintStream:958 in thread instance of \
> > nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', \
> > id=589) nsk.jdi.ThreadReference.isSuspended.issuspended002a:53 in thread instance \
> > of nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', \
> > id=589) nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a:203 in thread \
> > instance of nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', \
> > id=589) 
> > On 2/22/19 11:21 PM, serguei.spitsyn@oracle.com wrote:
> > > Hi Gary,
> > > 
> > > Thank you for cosmetic changes.
> > > 
> > > One thing to notice there is no guard against spurious wakeup for this:
> > > 108 Threadissuspended002a.waitnotifyObj.wait();
> > > 
> > > But I doubt it causes these timeouts as they are very well reproducible.
> > > 
> > > This time I looked more to the debugger side.
> > > The code is poorly structured, it is hard to understand what it is doing.
> > > 
> > > > - resumed main thread after breakpoint enabled and thread2
> > > double resume
> > > 
> > > Could you, please, explain more how does this help and why timeouts 
> > > are intermittent?
> > > 
> > > Thanks,
> > > Serguei
> > > 
> > > 
> > > On 2/22/19 11:52 AM, Gary Adams wrote:
> > > > After 1000 successful testruns, it seems like a reasonable solution is
> > > > to resume the main thread to resolve any locks holding back the
> > > > thread2 reaching the desired breakpoint.
> > > > 
> > > > Updated webrev: http://cr.openjdk.java.net/~gadams/4903717/webrev.01/
> > > > 
> > > > Changes in this iteration:
> > > > - cosmetic changes
> > > > - spaces around operators
> > > > - removed extra spaces around parens
> > > > - removed extra blank lines
> > > > - fixed typo in ERRROR
> > > > - fixed curl braces indenting
> > > > - resumed main thread after breakpoint enabled and thread2
> > > > double resume
> > > > 
> > > > 
> > > > On 2/22/19, 6:19 AM, gary.adams@oracle.com wrote:
> > > > > I'll make a pass over the code for cosmetic fixes after the test
> > > > > is working correctly.
> > > > > 
> > > > > I think the purpose of the lockingObject is to ensure the debugger
> > > > > has time to enable the breakpoint and perform the double resume
> > > > > before debuggee thread2 is released to run to the expected breakpoint.
> > > > > 
> > > > > I'm assuming the timeout and break point location are correct, because
> > > > > the test passes most of the time. I'll dump the values to make sure.
> > > > > 
> > > > > I'll also try to capture a thread dump at the time of the timeout,
> > > > > to see if the debuggee is hung somewhere unexpected.
> > > > > 
> > > > > The test leaves the main thread suspended while thread2 is
> > > > > is resumed. I'll see if resuming the main thread unblocks thread2.
> > > > > 
> > > > > ...
> > > > > 
> > > > > On 2/21/19 10:18 PM, serguei.spitsyn@oracle.com wrote:
> > > > > > Hi Gary,
> > > > > > 
> > > > > > I do not see any cause of this timeout yet.
> > > > > > However, the synchronization on the lockingObject looks completely 
> > > > > > wasteful.
> > > > > > Moreover, it adds unreasonable complexity and makes the test 
> > > > > > harder to understand.
> > > > > > 
> > > > > > So, I'd suggested to do a small cleanup that includes:
> > > > > > - removing the lockingObject field and sync around it
> > > > > > - fix wrong indents in the fragment:
> > > > > > 161         synchronized (waitnotifyObj)                    {
> > > > > > 162             log("entered into block:  synchronized (waitnotifyObj)");
> > > > > > 163             waitnotifyObj.notify();                     }
> > > > > > 
> > > > > > Thanks,
> > > > > > Serguei
> > > > > > 
> > > > > > 
> > > > > > On 2/21/19 10:57, gary.adams@oracle.com wrote:
> > > > > > > The commented out "docontinue" was done to match the style
> > > > > > > used in the other tests. I'll remove it for this test.
> > > > > > > 
> > > > > > > In any event, the test still is failing 8 / 500 testruns on
> > > > > > > macosx and windows debug builds.
> > > > > > > 
> > > > > > > The change is not sufficient.
> > > > > > > 
> > > > > > > Here's a recent failure. You can see in the log that the debugger did
> > > > > > > wait until the new "docontinue" arrives outside the lockingObject
> > > > > > > synchronized block.
> > > > > > > 
> > > > > > > It appears with the main thread suspended and the thread2 double 
> > > > > > > resumed,
> > > > > > > it still does not continue executing up to the breakpoint in thread2.
> > > > > > > After the eventQueue.remove(timeout), when the vm is resumed
> > > > > > > you can see the debuggee messages expected leading up to the 
> > > > > > > breakpoint.
> > > > > > > 
> > > > > > > ...
> > > > > > > 
> > > > > > > --> debugger:       suspending VM with vm.suspend();
> > > > > > > *debugee.stderr> **> mainThread: mainThread is out of: 
> > > > > > > synchronized (lockingObject) {*
> > > > > > > --> debugger:          thread2.suspendCount() == 2
> > > > > > > --> debugger:      :   thread2.isSuspended()
> > > > > > > --> debugger:          mainThread.suspendCount() == 1
> > > > > > > --> debugger:      :   mainThread.isSuspended()
> > > > > > > --> debugger:       double resuming thread2 with thread2.resume();
> > > > > > > --> debugger:          thread2.suspendCount() == 1
> > > > > > > --> debugger:      :   thread2.isSuspended()
> > > > > > > --> debugger:      enabling breakpRequest1
> > > > > > > --> debugger:          thread2.suspendCount() == 0
> > > > > > > --> debugger:      :  !thread2.isAtBreakpoint() before call to \
> > > > > > >                 breakpoint()
> > > > > > > --> debugger:        waiting for BreakpointEvent
> > > > > > > *--> debugger: new: eventSet = eventQueue.remove();*
> > > > > > > # ERROR: ##> debugger: ERROR:  timeout for waiting for a BreakpintEvent
> > > > > > > The following stacktrace is for failure analysis.
> > > > > > > nsk.share.TestFailure: ##> debugger: ERROR:  timeout for waiting for a \
> > > > > > > BreakpintEvent  at \
> > > > > > > nsk.share.Log.logExceptionForFailureAnalysis(Log.java:428)  at \
> > > > > > > nsk.share.Log.complain(Log.java:399)  at \
> > > > > > > nsk.jdi.ThreadReference.isSuspended.issuspended002.log3(issuspended002.java:104)
> > > > > > >   at nsk.jdi.ThreadReference.isSuspended.issuspended002.breakpoint(issuspended002.java:555)
> > > > > > >   at nsk.jdi.ThreadReference.isSuspended.issuspended002.runThis(issuspended002.java:389)
> > > > > > >   at nsk.jdi.ThreadReference.isSuspended.issuspended002.run(issuspended002.java:90)
> > > > > > >   at nsk.jdi.ThreadReference.isSuspended.issuspended002.main(issuspended002.java:85)
> > > > > > >   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native \
> > > > > > > Method)  at \
> > > > > > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > > > > > >   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > > >   at java.base/java.lang.reflect.Method.invoke(Method.java:567)
> > > > > > > 	at PropertyResolvingWrapper.main(PropertyResolvingWrapper.java:104)
> > > > > > > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native \
> > > > > > > Method)  at \
> > > > > > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > > > > > >   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > > >   at java.base/java.lang.reflect.Method.invoke(Method.java:567)
> > > > > > > 	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
> > > > > > >   at java.base/java.lang.Thread.run(Thread.java:835)
> > > > > > > *--> debugger: resuming both second and main thread 
> > > > > > > debugee.stderr> **> mainThread: waiting for an instruction from 
> > > > > > > the debugger ... --> debugger: the end of testing debugee.stderr> 
> > > > > > > **> thread2: entered into block: synchronized (lockingObject) 
> > > > > > > debugee.stderr> **> thread2: exited from block: synchronized 
> > > > > > > (lockingObject) debugee.stderr> **> thread2: call to the method 
> > > > > > > 'runt1' debugee.stderr> **> thread2: method 'runt1' enter*
> > > > > > > --> debugger:      : returned string is 'checkend'
> > > > > > > debugee.stderr> **> mainThread: waiting for an instruction from the \
> > > > > > > debugger ... 
> > > > > > > 
> > > > > > > On 2/20/19 5:15 PM, Alex Menkov wrote:
> > > > > > > > +1
> > > > > > > > 
> > > > > > > > But could you please remove old "pipe.println("docontinue")" 
> > > > > > > > statement instead of commenting it out (no new webrev is required)
> > > > > > > > 
> > > > > > > > --alex
> > > > > > > > 
> > > > > > > > On 02/20/2019 09:58, Chris Plummer wrote:
> > > > > > > > > Looks good.
> > > > > > > > > 
> > > > > > > > > Chris
> > > > > > > > > 
> > > > > > > > > On 2/20/19 7:57 AM, Gary Adams wrote:
> > > > > > > > > > The issuspended002 has been on the ProblemList since the tests 
> > > > > > > > > > were moved to
> > > > > > > > > > the open repos. The proposed changeset applies the same fix 
> > > > > > > > > > that was used in issuspended001.
> > > > > > > > > > 
> > > > > > > > > > The current test fails when the debuggee replies with the 
> > > > > > > > > > "docontinue" response,
> > > > > > > > > > while it is still holding a lock in a syncrhonized block. When 
> > > > > > > > > > the debugger then
> > > > > > > > > > suspends the vm the debuggee test thread fails to proceed to 
> > > > > > > > > > the expected breakpoint.
> > > > > > > > > > Testing in progress.
> > > > > > > > > > 
> > > > > > > > > > Issue: https://bugs.openjdk.java.net/browse/JDK-4903717
> > > > > > > > > > Webrev: http://cr.openjdk.java.net/~gadams/4903717/webrev.00/
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic