[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: Fwd: RFR(S) 8231634: SA stack walking fails with "illegal bci"
From:       Chris Plummer <chris.plummer () oracle ! com>
Date:       2020-04-28 19:14:10
Message-ID: 099a20e0-6d63-bc04-c7db-faeb2b8f2067 () oracle ! com
[Download RAW message or body]

Thanks Alex and Serguei!

Chris

On 4/27/20 6:23 PM, Alex Menkov wrote:
> Hi Chris,
>
> great analysis.
> LGTM++
>
> --alex
>
> On 04/27/2020 14:52, serguei.spitsyn@oracle.com wrote:
>> Hi Chris,
>>
>> The fix looks reasonable.
>>
>> Thanks,
>> Serguei
>>
>>
>> On 4/27/20 12:16, Chris Plummer wrote:
>>> Ping! Please help review if you can. There's also an RFR out for 
>>> 8243500, which is related.
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>> -------- Forwarded Message --------
>>> Subject:         RFR(S) 8231634: SA stack walking fails with "illegal bci"
>>> Date:         Thu, 23 Apr 2020 14:35:49 -0700
>>> From:         Chris Plummer <chris.plummer@oracle.com>
>>> To:         serviceability-dev <serviceability-dev@openjdk.java.net>
>>>
>>>
>>>
>>> Hello,
>>>
>>> Please help review the following:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8231634
>>> http://cr.openjdk.java.net/~cjplummer/8231634/webrev.00/index.html
>>>
>>> The fix is fairly simple: Don't use R13 as the BCP if it does not 
>>> point into the methods actual bytecodes. Instead rely on the BCP 
>>> value flushed to the frame. Details below are mostly there to help 
>>> you understand the big picture of what is going on.
>>>
>>> First a bit a background on what the live_bcp field is in the 
>>> X86Frame. BCP is "bytecode pointer". It points to the current 
>>> bytecode being executed. I'm mostly just talking about interpreter 
>>> frames here, but I think for compiled frames it points to the 
>>> current native instruction being executed. For all but the current 
>>> frame, frame->bcp reliably contains the BCP. For the current frame 
>>> you normally need to look in a register for an up to date BCP. On 
>>> x86_64 that's R13. When trying to find and create the current frame 
>>> for a thread, eventually you end up in 
>>> LinuxAMD64JavaThreadPDAccess.getCurrentFrameGuess(), which does:
>>>
>>>            // pass the value of R13 which contains the bcp for the top 
>>> level frame
>>>            Address bcp = 
>>> context.getRegisterAsAddress(AMD64ThreadContext.R13);
>>>            return new X86Frame(guesser.getSP(), guesser.getFP(), 
>>> guesser.getPC(), null, bcp <-- live_bcp field);
>>>
>>> So in this case the X86Frame is constructed with R13 stored in the 
>>> live_bcp field. For any frame other than the current frame, the 
>>> live_bcp field will be NULL.
>>>
>>> And one other bit of clarification before moving on to the bug. 
>>> You'll see BCX in code. For example, INTERPRETER_FRAME_BCX_OFFSET. 
>>> BCX == BCP. I'm not sure why BCX was ever chosen. I was tempted to 
>>> rename it to BCP, but it's been cloned to all the other SA CPU 
>>> ports, so I think it's best to just leave it.
>>>
>>> Now on to the bug. Sometimes the interpreter uses R13 as a scratch 
>>> register, so it does not always contain the BCP. This is why 
>>> sometimes tests will see an "illegal bci" assert from SA. It's 
>>> because r13 is currently not pointing to the BCP of the current 
>>> frame, and thus can't be converted to a BCI.
>>>
>>> What this fix does take advantage of the fact that when r13 is used 
>>> as a scratch register, it is first flushed to frame->bcp. So with 
>>> this fix R13 is only used as the BCP if it points within the 
>>> Method's bytecodes. If it does not, then SA will use the BCP flushed 
>>> to frame->bcp. The reason we don't always used frame->bcp is because 
>>> it is not kept up to date as the interpreter moves between 
>>> bytecodes, but we know it is up to date if R13 is currently being 
>>> used as a scratch register. So in other words, an invalid R13 
>>> implies that frame->bcp is up to date.
>>>
>>> And if you find yourself thinking "X86Frame.java supports is for 
>>> both 32-bit and 64-bit x86, but R13 does not exists on 32-bit.", 
>>> your right. A couple of years ago the concept of live_bcp was 
>>> introduced to fix JDK-8214226 [1]. It only added support for fixing 
>>> 64-bit and ignored 32-bit. So you'll never see live_bcp getting set 
>>> for 32-bit, and thus JDK-8214226 is still broken on 32-bit, but it 
>>> also means 32-bit is not impacted by the bug I'm fixing here. So the 
>>> fix for JDK-8214226 is why you see R13 references in the 
>>> X86Frame.java, but the code will still work for any platform that 
>>> sets up live_bcd properly, even if it's not actually R13.
>>>
>>> ...and one more thing. 8214226 [1] actually introduced this "illegal 
>>> bci" bug. However 8214226 [1] was only ever applied to LinuxAMD64. 
>>> Never for BSD or Windows. This means two things: (1) the "illegal 
>>> bci" bug never turned up on these platforms, and (2) 8214226 is 
>>> still broken on these platform, meaning the bci returned by 
>>> getInterpreterFrameBCI() is likely often stale, meaning stack traces 
>>> will not show the proper line number for the current frame. It seems 
>>> we don't have a test to catch this. I've filed 8243500 [2] to cover 
>>> fixing 8214226 on BSD and Windows.
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8214226
>>> [2] https://bugs.openjdk.java.net/browse/JDK-8243500
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic