[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: RFR(S): JDK-8231635: SA Stackwalking code stuck in BasicTypeDataBase.findDynamicTypeForAddress()
From:       Chris Plummer <chris.plummer () oracle ! com>
Date:       2019-11-13 18:55:30
Message-ID: 7e1e5139-0700-3e8f-e410-e111d05767d0 () oracle ! com
[Download RAW message or body]

Thanks Daniil!

Chris

On 11/12/19 9:08 PM, Daniil Titov wrote:
> Hi Chris,
> 
> The change looks good to me.
> 
> Thanks!
> --Daniil
> 
> On 11/12/19, 11:06 AM, "serviceability-dev on behalf of Chris Plummer" \
> <serviceability-dev-bounces@openjdk.java.net on behalf of chris.plummer@oracle.com> \
> wrote: 
> Thanks Serguei!
> 
> Can I get one more review please?
> 
> thanks,
> 
> Chris
> 
> On 11/8/19 4:00 PM, serguei.spitsyn@oracle.com wrote:
> > Hi Chris,
> > 
> > This seems to be a good fix to have in any case.
> > This check and bail out is right thing to do and should not break
> > anything.
> > I understand, this also fixes the test failures.
> > 
> > I only had some experience a long time ago with the support of pstack
> > and DTrace jstack action implementation which also does such SP
> > recovering because the ebp can be used by JIT compiler as a general
> > purpose register. There is no such a problem on sparc.
> > 
> > Thanks,
> > Serguei
> > 
> > 
> > On 11/7/19 14:01, Chris Plummer wrote:
> > > Hi,
> > > 
> > > Please review the following fix for JDK-8231635:
> > > 
> > > https://bugs.openjdk.java.net/browse/JDK-8231635
> > > http://cr.openjdk.java.net/~cjplummer/8231635/webrev.00/
> > > 
> > > I've tried to explain below to the best of my ability what's is going
> > > on, but keep in mind that I basically had no background in this area
> > > before looking into this CR, so this is all new to me. Please feel
> > > free to chime in with corrections to my explanation, or any
> > > additional insight that might help to further understanding of this
> > > code.
> > > 
> > > When doing a thread stack dump, SA has to figure out the SP for the
> > > current frame when it may not in fact be stored anywhere. So it goes
> > > through a series of guesses, starting with the current value of SP.
> > > See AMD64CurrentFrameGuess.run():
> > > 
> > > Address sp  = context.getRegisterAsAddress(AMD64ThreadContext.RSP);
> > > 
> > > There are a number of checks done to see if this is the SP for the
> > > actual current frame, one of the checks being (and kind of a last
> > > resort) to follow the frame links and see if they eventually lead to
> > > the first entry frame:
> > > 
> > > while (frame != null) {
> > > if (frame.isEntryFrame() && frame.entryFrameIsFirst()) {
> > > ...
> > > return true;
> > > }
> > > frame = frame.sender(map);
> > > }
> > > 
> > > If this fails, there is an outer loop to try the next address:
> > > 
> > > for (long offset = 0;
> > > offset < regionInBytesToSearch;
> > > offset += vm.getAddressSize()) {
> > > 
> > > Note that offset is added to the initial SP value that was fetched
> > > from RSP. This approach is fraught with danger, because SP could be
> > > incorrect, and you can easily follow a bad frame link to an invalid
> > > address. So the body of this loop is in a try block that catches all
> > > Exceptions, and simply retries with the next offset if one is caught.
> > > Exceptions could be ones like UnalignedAddressException or
> > > UnmappedAddressException.
> > > 
> > > The bug in question turns up with the following harmless looking line:
> > > 
> > > frame = frame.sender(map);
> > > 
> > > This is fine if you know that "frame" is valid, but what if it is not
> > > (which is very commonly the case). The frame values (SP, FP, and PC)
> > > in the returned frame could be just about anything, including being
> > > the same as the previous frame. This is what will happen if the SP
> > > stored in "frame" is the same as the SP that was used to initialize
> > > "frame" in the first place. This can certainly happen when SP is not
> > > valid to start with, and is indeed what caused this bug. The end
> > > result is the inner while loop gets stuck in an infinite loop
> > > traversing the same frame. So the fix is to add a check for this to
> > > make sure to break out of the while loop if this happens. Initially I
> > > did this with an Address.equal() call, and that seemed to fix the
> > > problem, but then I realized it would be possible to traverse through
> > > one or more sender frames and eventually end up returning to a
> > > previously visited frame, thus still an infinite loop. So I decided
> > > on checking for Address.lessThanOrEqual() instead since the send
> > > frame's SP should always be greater than the current frame's
> > > (referred to as oldFrame) SP. As long as we always move in one
> > > direction (towards a higher frame address), you can't have an
> > > infinite loop in this code.
> > > 
> > > I applied this fix to x86. Although not tested, it is built (all
> > > platform support is always built with SA). The x86 and amd64 versions
> > > are identical except for x86/amd64 references, so I thought it best
> > > to go ahead and do the update to x86. I did not touch ppc, but would
> > > be willing to update if someone passes along a fix that is tested.
> > > 
> > > One final bit of clarification. The bug synopsis mentions getting
> > > stuck in BasicTypeDataBase.findDynamicTypeForAddress(). This turns
> > > out to not actually be the case, but every stack trace I initially
> > > looked when I filed this CR was showing the thread being in this
> > > frame and at the same line number. This appears to be the next
> > > available safepoint where the thread can be suspended for stack
> > > dumping. When debugging this some more and adding a lot of println()
> > > calls in a lot of different locations, I started to see different
> > > frames in the stacktrace, presumably because the println() calls
> > > where adding additional safepoints.
> > > 
> > > thanks,
> > > 
> > > Chris
> > > 
> > 
> 
> 
> 
> 
> 
> 


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic