[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: RFR(s) 8153711: [REDO] JDWP: Memory Leak: GlobalRefs never deleted when processing invokeMethod 
From:       Severin Gehwolf <sgehwolf () redhat ! com>
Date:       2016-05-09 7:21:34
Message-ID: 1462778494.8107.1.camel () redhat ! com
[Download RAW message or body]

Serguei,

On Sun, 2016-05-08 at 17:25 -0700, serguei.spitsyn@oracle.com wrote:
> On 5/8/16 03:58, serguei.spitsyn@oracle.com wrote:
> > Severin,
> > 
> > The JPRT job failed: 2016-05-08-100525.sspitsyn.8153711 with a \
> > NullPointerException. The log link is (you can also find the most relevant \
> > fragment of the log in the bug report): \
> > http://scaaa637.us.oracle.com//archive/2016/05/2016-05-08-100525.sspitsyn.8153711//logs/solaris_x64_5.11-fastdebug-c2-jdk_svc_sanity.log
> > 
> 
> Sorry for providing the link that is not accessible for you.
> But, please, find a fragment of the failure log in my bug report comment.  

Thanks for your help. Time for me to find a Solaris box and try to
reproduce.

Cheers,
Severin

> 
> Thanks,
> Serguei
> 
> > 
> > Please, find the email notification in the attachment.
> > 
> > Thanks,
> > Serguei
> > 
> > 
> > On 5/8/16 00:59, serguei.spitsyn@oracle.com wrote:
> > > Hi Severin,
> > > 
> > > I filed two new bugs to cover the discovered issues:
> > > 8156498: more places in the invoke.c that need protection with the invokerLock
> > > 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
> > > 
> > > 
> > > Will try to push your two fixes today or tomorrow.
> > > I know, you've just got a committer status.
> > > But I'm not sure you are comfortable to push the fixes yourself at this time.
> > > 
> > > Thanks,
> > > Serguei
> > > 
> > > 
> > > 
> > > On 5/3/16 03:21, serguei.spitsyn@oracle.com wrote:
> > > > Hi Severin,  
> > > > 
> > > > Please, find my comments below.  
> > > > 
> > > > 
> > > > On 5/2/16 01:44, Severin Gehwolf wrote:  
> > > > > On Fri, 2016-04-29 at 12:33 -0700, serguei.spitsyn@oracle.com wrote:  
> > > > > > On 4/29/16 01:56, Severin Gehwolf wrote:  
> > > > > > > Hi Serguei,  
> > > > > > > 
> > > > > > > On Fri, 2016-04-29 at 01:34 -0700, serguei.spitsyn@oracle.com wrote:  
> > > > > > > > Hi Severin,  
> > > > > > > > 
> > > > > > > > The fix looks good in general.  
> > > > > > > > I'm testing both fixes together at the moment.  
> > > > > > > That is JDK-8154529 and JDK-8153711? Yes, that's what I've done too.  
> > > > > > > > A couple of questions...  
> > > > > > > > 
> > > > > > > > It seems, there are more places where an invokerLock critical section \
> > > > > > > > is missed.  
> > > > > > > Right.  
> > > > > > > > The following functions:  
> > > > > > > > - invoker_enableInvokeRequests  
> > > > > > > This should be fixed with the patch for JDK-8154529  
> > > > > > > > - invokeConstructor  
> > > > > > > > - invokeStatic  
> > > > > > > > - invokeNonvirtual  
> > > > > > > > - invokeVirtual  
> > > > > > > > - saveGlobalRef  
> > > > > > > Correct. The intent would be to fix the callsites of saveGlobalRef. If  \
> > > > > > >  we fix invoke* then saveGlobalRef should not be an issue. I didn't \
> > > > > > > want   to include this in this fix since it's pretty hairy and would \
> > > > > > > like to   fix this in incremental steps.  
> > > > > > > 
> > > > > > > > The first function is easy to fix.  
> > > > > > > > The last 5 functions are called from the invoker_doInvoke() that we \
> > > > > > > > already had a problem with.   I'm puzzled with the question: How to \
> > > > > > > > synchronize and avoid deadlocks at the same time?  
> > > > > > > I'm not sure yet. Perhaps locking only while saveGlobalRef is being  
> > > > > > > called in invoke* functions will help.  
> > > > > > > 
> > > > > > > > I'm inclined to let your fix go (if the testing is Ok) and file one \
> > > > > > > > more bug on the remaining sync issues.  
> > > > > > > Please keep me in the loop about your test results.  
> > > > > > Both the JTreg com/sun/jdi and the co-located nsk.jdi tests are all \
> > > > > > passed.   I also ran the 4 previously failed tests in big loops of 1000 \
> > > > > > runs:   com/sun/jdi/InvokeTest.java  
> > > > > > com/sun/jdi/InvokeHangTest.java  
> > > > > > com/sun/jdi/InterfaceMethodsTest.java  
> > > > > > com/sun/jdi/OomDebugTest.java   (new test introduced in the webrev)  
> > > > > I suggest to run InvokeTest, InvokeHangTest and InterfaceMethodsTest in  
> > > > > a loop. Those never failed for me in such a scenario.  
> > > > > 
> > > > > > The OomDebugTest.java failed with a timeout (most likely, a deadlock).  
> > > > > > Please, find the OomDebugTest.jtr file in attachments.  
> > > > > Correct. This is what I was seeing. See the last comment of the bug:  
> > > > > https://bugs.openjdk.java.net/browse/JDK-8153711  
> > > > 
> > > > Need to check if it is the same as I'm seeing.  
> > > > 
> > > > > 
> > > > > It has the jstack output of the hung OomDebugTestTarg JVM. I'm not  
> > > > > convinced this is the same failure we were seeing in JDK-4858370 since  
> > > > > the stack suggests it's doing a GC upon a newInstance of a primitive  
> > > > > array. It also does not seem like the same issue as the deadlocks  
> > > > > exposed by locking during an invoke, because it was reproducible fairly  
> > > > > consistently with InvokeHangTest.  
> > > > 
> > > > Agreed.  
> > > > 
> > > > 
> > > > > 
> > > > > It looks to me like a new issue. Probably one which was there before,  
> > > > > but is only exposed by the new test. The new test stress-tests the GC  
> > > > > with a debugger attached. Of course, this is going to be hard to prove  
> > > > > since it would just run out of memory prior the patch.  
> > > > 
> > > > Thanks.  
> > > > Your analysis seems to be reasonable.  
> > > > I tend to agree with it but need more time to convince myself.  
> > > > 
> > > > 
> > > > > Thoughts?  
> > > > 
> > > > Let me double check the above.  
> > > > If the analysis is correct then I'd suggest to file new bug and push your fix \
> > > > as is.   
> > > > 
> > > > Thanks,  
> > > > Serguei  
> > > > 
> > > > 
> > > > > 
> > > > > Cheers,  
> > > > > Severin  
> > > > > 
> > > > > 
> > > > > 
> > > > > > Thanks,  
> > > > > > Serguei  
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > Thanks for your help!  
> > > > > > > 
> > > > > > > Cheers,  
> > > > > > > Severin  
> > > > > > > 
> > > > > > > > Thanks,  
> > > > > > > > Serguei  
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On 4/28/16 02:00, Severin Gehwolf wrote:  
> > > > > > > > > On Tue, 2016-04-19 at 19:32 -0700, serguei.spitsyn@oracle.com \
> > > > > > > > > wrote:  
> > > > > > > > > > > Hi Severin,  
> > > > > > > > > > > 
> > > > > > > > > > > I postpone a push for this fix.  
> > > > > > > > > > > 
> > > > > > > > > > > There are two nsk.jdi test failures (they look like deadlocks): \
> > > > > > > > > > >  nsk/jdi/ObjectReference/invokeMethod/invokemethod012 \
> > > > > > > > > > > FAIL(TIMEOUT)   nsk/jdi/Scenarios/invokeMethod/popframes001 \
> > > > > > > > > > > FAIL(TIMEOUT)   
> > > > > > > > > > > and one jtreg sun/com/jdi failure (it looks like a deadlock \
> > > > > > > > > > > too):   com/sun/jdi/InvokeHangTes.java  
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > I'll double check if these failures are regressions caused by \
> > > > > > > > > > > your fix   or not.  
> > > > > > > > > > I confirm, the failures above are new regressions introduced by \
> > > > > > > > > > the fix.   The tests fail consistently with the fix and do not \
> > > > > > > > > > fail without the fix.  
> > > > > > > > > OK this was caused by the locking done in invoker_doInvoke(). Note \
> > > > > > > > > that   holding either of them invoker lock or event handler lock \
> > > > > > > > > causes this.   
> > > > > > > > > Here is a new webrev with those hunks removed. It's sufficient to \
> > > > > > > > > grab   the locks again in invoke_completeInvokeRequest() when \
> > > > > > > > > clearing the   global references in order to not get those failures \
> > > > > > > > > we've seen when   the fix for JDK-4858370 was in place.  
> > > > > > > > > 
> > > > > > > > > http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8153711/webrev.02/ \
> > > > > > > > >  
> > > > > > > > > Testing done:  
> > > > > > > > > - com/sun/jdi/InvokeTest.java com/sun/jdi/InvokeHangTest.java and  
> > > > > > > > > sun/jdi/InterfaceMethodsTest.java does not fail in 1500 runs.  
> > > > > > > > > - regular com/sun/jdi test set: no regressions  
> > > > > > > > > 
> > > > > > > > > Note that there are some rare cases where OomDebugTest times out \
> > > > > > > > > which   seems to be caused by the GC, though. See the bug for \
> > > > > > > > > details. Since   this problem is rare, it's still worthwhile having \
> > > > > > > > > that test included.   If it turns out a problem in practice we \
> > > > > > > > > could consider disabling the   test.  
> > > > > > > > > 
> > > > > > > > > Thoughts?  
> > > > > > > > > 
> > > > > > > > > Cheers,  
> > > > > > > > > Severin  
> > > > > > > > 
> > > > > > 
> > > > 
> > > 
> > 
> 


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic