'Re: RFR (S) 8173361: various crashes in JvmtiExport::post_compiled_method_load'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: RFR (S) 8173361: various crashes in JvmtiExport::post_compiled_method_load
From:       Erik_Österlund <erik.osterlund () oracle ! com>
Date:       2019-11-25 14:37:45
Message-ID: f2ea5361-9d66-d8c3-c82d-816001855939 () oracle ! com
[Download RAW message or body]

Hi Coleen,

Still good BTW!

Thanks,
/Erik

On 2019-11-25 14:47, coleen.phillimore@oracle.com wrote:
> Thanks for the code review, Serguei!
> Coleen
>
> On 11/22/19 6:34 PM, serguei.spitsyn@oracle.com wrote:
>> Hi Coleen,
>>
>> +1
>>
>> Thanks,
>> Serguei
>>
>>
>> On 11/22/19 14:53, Daniel D. Daugherty wrote:
>>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev
>>>
>>> src/hotspot/share/prims/jvmtiImpl.cpp
>>>     No comments.
>>>
>>> src/hotspot/share/prims/jvmtiImpl.hpp
>>>     No comments.
>>>
>>> src/hotspot/share/runtime/serviceThread.cpp
>>>     No comments.
>>>
>>> Thumbs up.
>>>
>>> Dan
>>>
>>>
>>> On 11/22/19 2:15 PM, coleen.phillimore@oracle.com wrote:
>>>>
>>>> Dan, Thank you for reviewing this!
>>>>
>>>> On 11/22/19 12:49 PM, Daniel D. Daugherty wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> Sorry for the delay in getting back to this re-review.
>>>>>
>>>>>
>>>>> On 11/21/19 9:12 AM, coleen.phillimore@oracle.com wrote:
>>>>>>
>>>>>> Please review a new version of this change that keeps the nmethod 
>>>>>> from being unloaded, after it is added to the deferred event queue:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.03/webrev/index.html
>>>>>
>>>>> src/hotspot/share/code/nmethod.cpp
>>>>>     No comments.
>>>>>
>>>>> src/hotspot/share/oops/instanceKlass.cpp
>>>>>     No comments.
>>>>>
>>>>> src/hotspot/share/prims/jvmtiExport.cpp
>>>>>     No comments.
>>>>>
>>>>> src/hotspot/share/prims/jvmtiImpl.cpp
>>>>>     Nice solution with the new oops_do() and nmethods_do() functions!
>>>> Erik's insistance!
>>>>>
>>>>>     old L988: void JvmtiDeferredEventQueue::enqueue(const 
>>>>> JvmtiDeferredEvent& event) {
>>>>>     new L998: void 
>>>>> JvmtiDeferredEventQueue::enqueue(JvmtiDeferredEvent event) {
>>>>>         Not sure why this was changed.
>>>>>
>>>>>         Update: Looks like Serguei raised the issue and Coleen has 
>>>>> already
>>>>>         resolved it.
>>>>
>>>> Yes.
>>>>>
>>>>> src/hotspot/share/prims/jvmtiImpl.hpp
>>>>>     old L494:     QueueNode(const JvmtiDeferredEvent& event)
>>>>>     new L498:     QueueNode(JvmtiDeferredEvent& event)
>>>>>         Why was this changed?
>>>>>
>>>>>         Update: Not clear if this was covered by Coleen's reply to 
>>>>> Serguei.
>>>>>
>>>>>     old L497:     const JvmtiDeferredEvent& event() const { return 
>>>>> _event; }
>>>>>     new L501:     JvmtiDeferredEvent& event() { return _event; }
>>>>>         Why was this changed?
>>>>>
>>>>>         Update: Coleen's reply to Serguei explained this. Perhaps add:
>>>>>                   // Not const because of oops_do() and nmethods_do().
>>>>>
>>>>>     old L509:   static void enqueue(const JvmtiDeferredEvent& 
>>>>> event) NOT_JVMTI_RETURN;
>>>>>     new L513:   static void enqueue(JvmtiDeferredEvent event) 
>>>>> NOT_JVMTI_RETURN;
>>>>>         Why was this changed?
>>>>>
>>>>>         Update: Looks like Serguei raised the issue and Coleen has 
>>>>> already
>>>>>         resolved it.
>>>>
>>>> Yes, I fixed these.
>>>>>
>>>>> src/hotspot/share/runtime/mutexLocker.cpp
>>>>>     This change is going to require some testing to make sure we don't
>>>>>     have any new deadlock scenarios.
>>>>
>>>> Luckily, I've previously added an implicit NoSafepointVerifier to 
>>>> locks that are _allow_vm_block = true, like this one.
>>>> + def(JmethodIdCreation_lock , PaddedMutex , leaf, true, 
>>>> _safepoint_check_never); // used for creating jmethodIDs.
>>>> which prevents one class of deadlock. If we take out another lock 
>>>> with a higher rank, we'll get the ranking assert.
>>>>
>>>> This lock prevents insertion into an array, and has little outside 
>>>> calls.
>>>>
>>>> I'm running tests in tier 1-6 but any code that travels through 
>>>> this should get these assertion checks, rather than deadlocking.
>>>>
>>>>>
>>>>> src/hotspot/share/runtime/serviceThread.cpp
>>>>>     L50 - nit - why the extra blank line?
>>>>
>>>> To separate static data member definitions from functions. I 
>>>> removed it.
>>>>>
>>>>> src/hotspot/share/runtime/serviceThread.hpp
>>>>>     Thanks for cleaning up the static:
>>>>>
>>>>>       ServiceThread::is_service_thread(Thread* thread)
>>>>>
>>>>>     stuff. Having it be different than the other threads was
>>>>>     a bit jarring.
>>>>>
>>>>> src/hotspot/share/runtime/thread.hpp
>>>>>     No comments.
>>>>>
>>>>> Thumbs up. My only comments are nits so I don't need to see a
>>>>> new webrev if you decide to fix them.
>>>>
>>>> So it turns out that in stress testing my fix 
>>>> forhttps://bugs.openjdk.java.net/browse/JDK-8212160
>>>>
>>>> Because I was in the area and thought this was a duplicate of that 
>>>> bug (it is not).   I found that calling oops_do and nmethods_do the 
>>>> ServiceThread  needs to hold the Service_lock, because other 
>>>> threads can be adding things to the global queue while the sweeper 
>>>> thread is calling this in a handshake.
>>>>
>>>> I am now retesting this change with the changes above, and with the 
>>>> Service_lock.   So far my stress tests for JDK-81212160 and the 
>>>> stress test for this bug pass, but I'm going to run through all the 
>>>> tiers 1-6 over the weekend.
>>>>
>>>> Please have a look at the changes in the meantime.
>>>>
>>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev
>>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.04/webrev
>>>>
>>>> Thanks,
>>>> Coleen
>>>>>
>>>>> Dan
>>>>>
>>>>>>
>>>>>> Ran the test that failed 100 times without failure, tier1 on 
>>>>>> Oracle supported platforms, and tier2-3 including jvmti and jdi 
>>>>>> tests locally.
>>>>>>
>>>>>> See bug for more details about the crash.
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8173361
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>> On 11/18/19 10:09 PM, coleen.phillimore@oracle.com wrote:
>>>>>>>
>>>>>>> Hi Serguei,
>>>>>>>
>>>>>>> Sorry for not sending an update.  I talked to Erik and am 
>>>>>>> working on a version that keeps the nmethod from being unloaded 
>>>>>>> while it's in the deferred event queue, with a version that the 
>>>>>>> GC people will like, and I like.  I'm testing it out now.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Coleen
>>>>>>>
>>>>>>>
>>>>>>> On 11/18/19 10:03 PM, serguei.spitsyn@oracle.com wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> Sorry for the latency, I had to investigate it a little bit.
>>>>>>>> I still have some doubt your fix is right thing to do.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/16/19 04:55, coleen.phillimore@oracle.com wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/15/19 11:17 PM, serguei.spitsyn@oracle.com wrote:
>>>>>>>>>> Hi Coleen,
>>>>>>>>>>
>>>>>>>>>> On 11/15/19 2:12 PM, coleen.phillimore@oracle.com wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi, I've been working on answers to these questions, so I'll 
>>>>>>>>>>> start with this one.
>>>>>>>>>>>
>>>>>>>>>>> The nmethodLocker keeps the nmethod from being reclaimed 
>>>>>>>>>>> (made_zombie or memory released) by the sweeper, but the 
>>>>>>>>>>> nmethod could be unloaded.  Unloading the nmethod clears the 
>>>>>>>>>>> Method* _method field.
>>>>>>>>>>
>>>>>>>>>> Yes, I see it is done in the nmethod::make_unloaded().
>>>>>>>>>>
>>>>>>>>>>> The post_compiled_method_load event needs the _method field 
>>>>>>>>>>> to look at things like inlining and ScopeDesc fields.   If 
>>>>>>>>>>> the nmethod is unloaded, some of the oops are dead.  There 
>>>>>>>>>>> are "holder" oops that correspond to the metadata in the 
>>>>>>>>>>> nmethod.  If these oops are dead, causing the nmethod to get 
>>>>>>>>>>> unloaded, then the metadata may not be valid.
>>>>>>>>>>>
>>>>>>>>>>> So my change 02 looks for a NULL nmethod._method field to 
>>>>>>>>>>> tell whether we can post information about the nmethod.
>>>>>>>>>>>
>>>>>>>>>>> There's code in nmethod.cpp like:
>>>>>>>>>>>
>>>>>>>>>>> jmethodID nmethod::get_and_cache_jmethod_id() {
>>>>>>>>>>>   if (_jmethod_id == NULL) {
>>>>>>>>>>>     // Cache the jmethod_id since it can no longer be looked 
>>>>>>>>>>> up once the
>>>>>>>>>>>     // method itself has been marked for unloading.
>>>>>>>>>>>     _jmethod_id = method()->jmethod_id();
>>>>>>>>>>>   }
>>>>>>>>>>>   return _jmethod_id;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Which was added when post_method_load and unload were turned 
>>>>>>>>>>> into deferred events.
>>>>>>>>>>
>>>>>>>>>> Could we cache the jmethodID in the 
>>>>>>>>>> JvmtiDeferredEvent::compiled_method_load_event
>>>>>>>>>> similarly as we do in the 
>>>>>>>>>> JvmtiDeferredEvent::compiled_method_unload_event?
>>>>>>>>>> This would help to get rid of the dependency on the 
>>>>>>>>>> nmethod::_method.
>>>>>>>>>> Do we depend on any other nmethod fields?
>>>>>>>>>
>>>>>>>>> Yes, there are other nmethod metadata that we rely on to print 
>>>>>>>>> inline information, and this function 
>>>>>>>>> JvmtiCodeBlobEvents::build_jvmti_addr_location_map because it 
>>>>>>>>> uses the ScopeDesc data in the nmethod.
>>>>>>>>
>>>>>>>> One possible approach is to prepare and cache all this information
>>>>>>>> in the nmethod::post_compiled_method_load_event() before the
>>>>>>>> JvmtiDeferredEvent::compiled_method_load_event() is called.
>>>>>>>> The event parameters are:
>>>>>>>> typedef struct {
>>>>>>>>      const void* start_address;
>>>>>>>>      jlocation location;
>>>>>>>> } jvmtiAddrLocationMap;
>>>>>>>> CompiledMethodLoad(jvmtiEnv *jvmti_env,
>>>>>>>>              jmethodID method,
>>>>>>>>              jint code_size,
>>>>>>>>              const void* code_addr,
>>>>>>>>              jint map_length,
>>>>>>>>              const jvmtiAddrLocationMap* map,
>>>>>>>>              const void* compile_info)
>>>>>>>> Some of these addresses above could be not accessible when an 
>>>>>>>> event is posted.
>>>>>>>> Not sure yet if it is Okay.
>>>>>>>> The question is if this kind of refactoring is worth and right 
>>>>>>>> thing to do.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> We do cache the jmethodID but that's not good enough.  See my 
>>>>>>>>> last comment in the bug report. The jmethodID can point to an 
>>>>>>>>> unloaded method.
>>>>>>>>
>>>>>>>> This looks like it is done a little bit late.
>>>>>>>> It'd better to do it before the event is deferred (see above).
>>>>>>>>
>>>>>>>>> I tried a version of keeping the nmethod alive, but the GC 
>>>>>>>>> folks will hate it.  And it doesn't work and I hate it.
>>>>>>>>
>>>>>>>> From serviceability point of view this is the best and most 
>>>>>>>> consistent approach.
>>>>>>>> I seems to me, it was initially designed this way.
>>>>>>>> The downside is it adds some extra complexity to the GC.
>>>>>>>>
>>>>>>>>> My version 01 is the best, with the caveat that maybe it 
>>>>>>>>> should check for _method == NULL instead of 
>>>>>>>>> nmethod->is_alive().  I have to talk to Erik to see if there's 
>>>>>>>>> a race with concurrent class unloading.
>>>>>>>>>
>>>>>>>>> Any application that depends on a compiled method loading 
>>>>>>>>> event on a class that could be unloaded is a buggy 
>>>>>>>>> application.  Applications should not rely on when the JIT 
>>>>>>>>> compiler decides to compile a method!  This happens to us for 
>>>>>>>>> a stress test. Most applications will get most of their 
>>>>>>>>> compiled method loading events as they normally do.
>>>>>>>>
>>>>>>>> It is not an application that relies on the compiled method 
>>>>>>>> loading event.
>>>>>>>> It is about profiling tools to be able to get correct 
>>>>>>>> information about what is going on with compilations.
>>>>>>>> My concern is that if we skip such compiled method load events 
>>>>>>>> then profilers have no way
>>>>>>>> to find out there many unneeded compilations that are thrown 
>>>>>>>> away without any real use.
>>>>>>>> Also, it is not clear what happens with the subsequent compiled 
>>>>>>>> method unload events.
>>>>>>>> Are they going to be skipped as well or they can appear and 
>>>>>>>> confuse profilers?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>>> I put more debugging in the bug to show this crash was from 
>>>>>>>>>>> an unloaded nmethod.
>>>>>>>>>>>
>>>>>>>>>>> Coleen
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11/15/19 4:45 PM, serguei.spitsyn@oracle.com wrote:
>>>>>>>>>>>> Hi Coleen,
>>>>>>>>>>>>
>>>>>>>>>>>> I have some questions.
>>>>>>>>>>>>
>>>>>>>>>>>> Both the compiler method load and unload are posted as 
>>>>>>>>>>>> deferred events.
>>>>>>>>>>>> Both events keep the nmethod alive until the ServiceThread 
>>>>>>>>>>>> processes the event.
>>>>>>>>>>>>
>>>>>>>>>>>> The implementation is:
>>>>>>>>>>>>
>>>>>>>>>>>> JvmtiDeferredEvent 
>>>>>>>>>>>> JvmtiDeferredEvent::compiled_method_load_event(nmethod* nm) {
>>>>>>>>>>>>   . . .
>>>>>>>>>>>>   // Keep the nmethod alive until the ServiceThread can process
>>>>>>>>>>>>   // this deferred event.
>>>>>>>>>>>>   nmethodLocker::lock_nmethod(nm);
>>>>>>>>>>>>   return event;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> JvmtiDeferredEvent 
>>>>>>>>>>>> JvmtiDeferredEvent::compiled_method_unload_event(nmethod* 
>>>>>>>>>>>> nm, jmethodID id, const void* code) {
>>>>>>>>>>>>   . . .
>>>>>>>>>>>>   // Keep the nmethod alive until the ServiceThread can process
>>>>>>>>>>>>   // this deferred event. This will keep the memory for the
>>>>>>>>>>>>   // generated code from being reused too early. We pass
>>>>>>>>>>>>   // zombie_ok == true here so that our nmethod that was just
>>>>>>>>>>>>   // made into a zombie can be locked.
>>>>>>>>>>>>   nmethodLocker::lock_nmethod(nm, true /* zombie_ok */);
>>>>>>>>>>>>   return event;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> void JvmtiDeferredEvent::post() {
>>>>>>>>>>>> assert(ServiceThread::is_service_thread(Thread::current()),
>>>>>>>>>>>>          "Service thread must post enqueued events");
>>>>>>>>>>>>   switch(_type) {
>>>>>>>>>>>>     case TYPE_COMPILED_METHOD_LOAD: {
>>>>>>>>>>>>       nmethod* nm = _event_data.compiled_method_load;
>>>>>>>>>>>> JvmtiExport::post_compiled_method_load(nm);
>>>>>>>>>>>>       // done with the deferred event so unlock the nmethod
>>>>>>>>>>>>       nmethodLocker::unlock_nmethod(nm);
>>>>>>>>>>>>       break;
>>>>>>>>>>>>     }
>>>>>>>>>>>>     case TYPE_COMPILED_METHOD_UNLOAD: {
>>>>>>>>>>>>       nmethod* nm = _event_data.compiled_method_unload.nm;
>>>>>>>>>>>> JvmtiExport::post_compiled_method_unload(
>>>>>>>>>>>> _event_data.compiled_method_unload.method_id,
>>>>>>>>>>>> _event_data.compiled_method_unload.code_begin);
>>>>>>>>>>>>       // done with the deferred event so unlock the nmethod
>>>>>>>>>>>>       nmethodLocker::unlock_nmethod(nm);
>>>>>>>>>>>>       break;
>>>>>>>>>>>>     }
>>>>>>>>>>>>     . . .
>>>>>>>>>>>>   }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Then I wonder how is it possible for the nmethod to be not 
>>>>>>>>>>>> alive here?:
>>>>>>>>>>>> 2168 void JvmtiExport::post_compiled_method_load(nmethod *nm) {
>>>>>>>>>>>> . . .
>>>>>>>>>>>> 2173 // It's not safe to look at metadata for unloaded methods.
>>>>>>>>>>>> 2174 if (!nm->is_alive()) {
>>>>>>>>>>>> 2175 return;
>>>>>>>>>>>> 2176 }
>>>>>>>>>>>> At least, it lokks like something else is broken.
>>>>>>>>>>>> Do I miss something important here?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Serguei
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/14/19 5:15 PM, coleen.phillimore@oracle.com wrote:
>>>>>>>>>>>>> Summary: Don't post information which uses metadata from 
>>>>>>>>>>>>> unloaded nmethods
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tested tier1-3 and 100 times with test that failed 
>>>>>>>>>>>>> (reproduced failure without the fix).
>>>>>>>>>>>>>
>>>>>>>>>>>>> open webrev at 
>>>>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.01/webrev
>>>>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8173361
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Coleen
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


[Attachment #3 (text/html)]

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <tt>Hi Coleen,<br>
      <br>
      Still good BTW!<br>
      <br>
      Thanks,<br>
      /Erik<br>
    </tt><br>
    <div class="moz-cite-prefix">On 2019-11-25 14:47,
      <a class="moz-txt-link-abbreviated" \
href="mailto:coleen.phillimore@oracle.com">coleen.phillimore@oracle.com</a> \
wrote:<br>  </div>
    <blockquote type="cite"
      cite="mid:200cb839-9019-58f1-17e5-7a0426a6035b@oracle.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      Thanks for the code review, Serguei!<br>
      Coleen<br>
      <br>
      <div class="moz-cite-prefix">On 11/22/19 6:34 PM, <a
          class="moz-txt-link-abbreviated"
          href="mailto:serguei.spitsyn@oracle.com"
          moz-do-not-send="true">serguei.spitsyn@oracle.com</a> wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:568c2562-0a56-73ac-c0af-43339d701b19@oracle.com">
        <meta http-equiv="Content-Type" content="text/html;
          charset=UTF-8">
        <div class="moz-cite-prefix">Hi Coleen,<br>
          <br>
          +1<br>
          <br>
          Thanks,<br>
          Serguei<br>
          <br>
          <br>
          On 11/22/19 14:53, Daniel D. Daugherty wrote:<br>
        </div>
        <blockquote type="cite"
          cite="mid:020d7c80-4d77-ee96-5a7b-74acdbd54f86@oracle.com">
          <blockquote type="cite"><a class="moz-txt-link-freetext"
              href="http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev"
              moz-do-not-send="true">http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev</a></blockquote>
  <tt> <br>
            src/hotspot/share/prims/jvmtiImpl.cpp<br>
                No comments.<br>
            <br>
            src/hotspot/share/prims/jvmtiImpl.hpp<br>
                No comments.<br>
            <br>
            src/hotspot/share/runtime/serviceThread.cpp<br>
                No comments.<br>
            <br>
            Thumbs up.<br>
            <br>
            Dan<br>
            <br>
          </tt><br>
          <div class="moz-cite-prefix">On 11/22/19 2:15 PM, <a
              class="moz-txt-link-abbreviated"
              href="mailto:coleen.phillimore@oracle.com"
              moz-do-not-send="true">coleen.phillimore@oracle.com</a>
            wrote:<br>
          </div>
          <blockquote type="cite"
            cite="mid:24fc9c1c-bfd1-5edf-2231-d9ba0e0885f5@oracle.com">
            <br>
            Dan, Thank you for reviewing this!<br>
            <br>
            <div class="moz-cite-prefix">On 11/22/19 12:49 PM, Daniel D.
              Daugherty wrote:<br>
            </div>
            <blockquote type="cite"
              cite="mid:eca6dfa3-4c25-0f2f-781b-0515c4e48e0a@oracle.com">
              <tt>Hi Coleen,<br>
                <br>
                Sorry for the delay in getting back to this re-review.<br>
                <br>
              </tt><br>
              <div class="moz-cite-prefix">On 11/21/19 9:12 AM, <a
                  class="moz-txt-link-abbreviated"
                  href="mailto:coleen.phillimore@oracle.com"
                  moz-do-not-send="true">coleen.phillimore@oracle.com</a>
                wrote:<br>
              </div>
              <blockquote type="cite"
                cite="mid:2cb4cdb4-4acf-2abf-af79-445bdfc651b5@oracle.com">
                <br>
                Please review a new version of this change that keeps
                the nmethod from being unloaded, after it is added to
                the deferred event queue:<br>
                <br>
                <a
href="http://cr.openjdk.java.net/~coleenp/2019/8173361.03/webrev/index.html"
                  moz-do-not-send="true">http://cr.openjdk.java.net/~coleenp/2019/8173361.03/webrev/index.html</a><br>
  </blockquote>
              <tt> <br>
                src/hotspot/share/code/nmethod.cpp<br>
                    No comments.<br>
                <br>
                src/hotspot/share/oops/instanceKlass.cpp<br>
                    No comments.<br>
                <br>
                src/hotspot/share/prims/jvmtiExport.cpp<br>
                    No comments.<br>
                <br>
                src/hotspot/share/prims/jvmtiImpl.cpp<br>
                    Nice solution with the new oops_do() and
                nmethods_do() functions!<br>
              </tt></blockquote>
            <tt>Erik's insistance!</tt><br>
            <blockquote type="cite"
              cite="mid:eca6dfa3-4c25-0f2f-781b-0515c4e48e0a@oracle.com"><tt>
                <br>
                    old L988: void
                JvmtiDeferredEventQueue::enqueue(const
                JvmtiDeferredEvent&amp; event) {<br>
                    new L998: void
                JvmtiDeferredEventQueue::enqueue(JvmtiDeferredEvent
                event) {<br>
                        Not sure why this was changed.<br>
                <br>
                        Update: Looks like Serguei raised the issue and
                Coleen has already<br>
                        resolved it.<br>
              </tt></blockquote>
            <br>
            <tt>Yes.</tt><br>
            <blockquote type="cite"
              cite="mid:eca6dfa3-4c25-0f2f-781b-0515c4e48e0a@oracle.com"><tt>
                <br>
                src/hotspot/share/prims/jvmtiImpl.hpp<br>
                    old L494:     QueueNode(const
                JvmtiDeferredEvent&amp; event)<br>
                    new L498:     QueueNode(JvmtiDeferredEvent&amp;
                event)<br>
                        Why was this changed?<br>
                <br>
                        Update: Not clear if this was covered by
                Coleen's reply to Serguei.<br>
                <br>
                    old L497:     const JvmtiDeferredEvent&amp; event()
                const { return _event; }<br>
                    new L501:     JvmtiDeferredEvent&amp; event() {
                return _event; }<br>
                        Why was this changed?<br>
                <br>
                        Update: Coleen's reply to Serguei explained
                this. Perhaps add:<br>
                                  // Not const because of oops_do() and
                nmethods_do().<br>
                <br>
                    old L509:   static void enqueue(const
                JvmtiDeferredEvent&amp; event) NOT_JVMTI_RETURN;<br>
                    new L513:   static void enqueue(JvmtiDeferredEvent
                event) NOT_JVMTI_RETURN;<br>
                        Why was this changed?<br>
                <br>
                        Update: Looks like Serguei raised the issue and
                Coleen has already<br>
                        resolved it.<br>
              </tt></blockquote>
            <br>
            Yes, I fixed these.<br>
            <blockquote type="cite"
              cite="mid:eca6dfa3-4c25-0f2f-781b-0515c4e48e0a@oracle.com"><tt>
                <br>
                src/hotspot/share/runtime/mutexLocker.cpp<br>
                    This change is going to require some testing to make
                sure we don't<br>
                    have any new deadlock scenarios.<br>
              </tt></blockquote>
            <br>
            Luckily, I've previously added an implicit
            NoSafepointVerifier to locks that are _allow_vm_block =
            true, like this one.<br>
            <pre><span class="new">+  def(JmethodIdCreation_lock       , PaddedMutex  \
, leaf,        true,  _safepoint_check_never); // used for creating jmethodIDs.

</span></pre>
            <span class="new">which prevents one class of deadlock. If
              we take out another lock with a higher rank, we'll get the
              ranking assert.</span><br>
            <span class="new"></span><br>
            <span class="new">This lock prevents insertion into an
              array, and has little outside calls.</span><br>
            <span class="new"></span><br>
            <span class="new">I'm running tests in tier 1-6 but any code
              that travels through this should get these assertion
              checks, rather than deadlocking.</span><br>
            <span class="new"></span><br>
            <blockquote type="cite"
              cite="mid:eca6dfa3-4c25-0f2f-781b-0515c4e48e0a@oracle.com"><tt>
                <br>
                src/hotspot/share/runtime/serviceThread.cpp<br>
                    L50 - nit - why the extra blank line?<br>
              </tt></blockquote>
            <br>
            To separate static data member definitions from functions. 
            I removed it.<br>
            <blockquote type="cite"
              cite="mid:eca6dfa3-4c25-0f2f-781b-0515c4e48e0a@oracle.com"><tt>
                <br>
                src/hotspot/share/runtime/serviceThread.hpp<br>
                    Thanks for cleaning up the static:<br>
                <br>
                      ServiceThread::is_service_thread(Thread* thread)<br>
                <br>
                    stuff. Having it be different than the other threads
                was<br>
                    a bit jarring.<br>
                <br>
                src/hotspot/share/runtime/thread.hpp<br>
                    No comments.<br>
                <br>
                Thumbs up. My only comments are nits so I don't need to
                see a<br>
                new webrev if you decide to fix them.<br>
              </tt></blockquote>
            <br>
            So it turns out that in stress testing my fix for<tt> </tt><a
              href="https://bugs.openjdk.java.net/browse/JDK-8212160"
              moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8212160</a><br>
  <br>
            Because I was in the area and thought this was a duplicate
            of that bug (it is not).   I found that calling oops_do and
            nmethods_do the ServiceThread  needs to hold the
            Service_lock, because other threads can be adding things to
            the global queue while the sweeper thread is calling this in
            a handshake.<br>
            <br>
            I am now retesting this change with the changes above, and
            with the Service_lock.   So far my stress tests for
            JDK-81212160 and the stress test for this bug pass, but I'm
            going to run through all the tiers 1-6 over the weekend.<br>
            <br>
            Please have a look at the changes in the meantime.<br>
            <br>
            <a class="moz-txt-link-freetext"
              href="http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev"
              moz-do-not-send="true">http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev</a><br>
  <a class="moz-txt-link-freetext"
              href="http://cr.openjdk.java.net/~coleenp/2019/8173361.04/webrev"
              moz-do-not-send="true">http://cr.openjdk.java.net/~coleenp/2019/8173361.04/webrev</a><br>
  <br>
            Thanks,<br>
            Coleen<br>
            <blockquote type="cite"
              cite="mid:eca6dfa3-4c25-0f2f-781b-0515c4e48e0a@oracle.com"><tt>
                <br>
                Dan<br>
                <br>
              </tt>
              <blockquote type="cite"
                cite="mid:2cb4cdb4-4acf-2abf-af79-445bdfc651b5@oracle.com">
                <br>
                Ran the test that failed 100 times without failure,
                tier1 on Oracle supported platforms, and tier2-3
                including jvmti and jdi tests locally.<br>
                <br>
                See bug for more details about the crash.<br>
                <br>
                <a
                  href="https://bugs.openjdk.java.net/browse/JDK-8173361"
                  moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8173361</a><br>
  <br>
                Thanks,<br>
                Coleen<br>
                <br>
                <div class="moz-cite-prefix">On 11/18/19 10:09 PM, <a
                    class="moz-txt-link-abbreviated"
                    href="mailto:coleen.phillimore@oracle.com"
                    moz-do-not-send="true">coleen.phillimore@oracle.com</a>
                  wrote:<br>
                </div>
                <blockquote type="cite"
                  cite="mid:3f95ae42-346e-3caa-06bd-0facfb939225@oracle.com">
                  <br>
                  Hi Serguei,<br>
                  <br>
                  Sorry for not sending an update.  I talked to Erik and
                  am working on a version that keeps the nmethod from
                  being unloaded while it's in the deferred event queue,
                  with a version that the GC people will like, and I
                  like.  I'm testing it out now.<br>
                  <br>
                  Thanks!<br>
                  Coleen<br>
                  <br>
                  <br>
                  <div class="moz-cite-prefix">On 11/18/19 10:03 PM, <a
                      class="moz-txt-link-abbreviated"
                      href="mailto:serguei.spitsyn@oracle.com"
                      moz-do-not-send="true">serguei.spitsyn@oracle.com</a>
                    wrote:<br>
                  </div>
                  <blockquote type="cite"
                    cite="mid:bad60054-4cdc-097b-3948-548a7db55995@oracle.com">
                    <div class="moz-cite-prefix">Hi Coleen,<br>
                      <br>
                      Sorry for the latency, I had to investigate it a
                      little bit.<br>
                      I still have some doubt your fix is right thing to
                      do.<br>
                      <br>
                      <br>
                      On 11/16/19 04:55, <a
                        class="moz-txt-link-abbreviated"
                        href="mailto:coleen.phillimore@oracle.com"
                        moz-do-not-send="true">coleen.phillimore@oracle.com</a>
                      wrote:<br>
                    </div>
                    <blockquote type="cite"
                      cite="mid:40068f02-812c-b4bc-c150-d12e2fa03f01@oracle.com">
                      <br>
                      <br>
                      <div class="moz-cite-prefix">On 11/15/19 11:17 PM,
                        <a class="moz-txt-link-abbreviated"
                          href="mailto:serguei.spitsyn@oracle.com"
                          moz-do-not-send="true">serguei.spitsyn@oracle.com</a>
                        wrote:<br>
                      </div>
                      <blockquote type="cite"
                        cite="mid:4f9f5421-29a6-d191-b03e-093a219e41cb@oracle.com">
                        Hi Coleen,<br>
                        <br>
                        <div class="moz-cite-prefix">On 11/15/19 2:12
                          PM, <a class="moz-txt-link-abbreviated"
                            href="mailto:coleen.phillimore@oracle.com"
                            moz-do-not-send="true">coleen.phillimore@oracle.com</a>
                          wrote:<br>
                        </div>
                        <blockquote type="cite"
                          cite="mid:399eb99f-08ba-59e1-2fdc-ba5fc66d9ae5@oracle.com">
                          <br>
                          Hi, I've been working on answers to these
                          questions, so I'll start with this one.<br>
                          <br>
                          The nmethodLocker keeps the nmethod from being
                          reclaimed (made_zombie or memory released) by
                          the sweeper, but the nmethod could be
                          unloaded.  Unloading the nmethod clears the
                          Method* _method field.<br>
                        </blockquote>
                        <br>
                        Yes, I see it is done in the
                        nmethod::make_unloaded().<br>
                        <br>
                        <blockquote type="cite"
                          cite="mid:399eb99f-08ba-59e1-2fdc-ba5fc66d9ae5@oracle.com">
                          The post_compiled_method_load event needs the
                          _method field to look at things like inlining
                          and ScopeDesc fields.   If the nmethod is
                          unloaded, some of the oops are dead.  There
                          are "holder" oops that correspond to the
                          metadata in the nmethod.  If these oops are
                          dead, causing the nmethod to get unloaded,
                          then the metadata may not be valid.<br>
                          <br>
                          So my change 02 looks for a NULL
                          nmethod._method field to tell whether we can
                          post information about the nmethod.<br>
                          <br>
                          There's code in nmethod.cpp like:<br>
                          <br>
                          jmethodID nmethod::get_and_cache_jmethod_id()
                          {<br>
                            if (_jmethod_id == NULL) {<br>
                              // Cache the jmethod_id since it can no
                          longer be looked up once the<br>
                              // method itself has been marked for
                          unloading.<br>
                              _jmethod_id = method()-&gt;jmethod_id();<br>
                            }<br>
                            return _jmethod_id;<br>
                          }<br>
                          <br>
                          Which was added when post_method_load and
                          unload were turned into deferred events.<br>
                        </blockquote>
                        <br>
                        Could we cache the jmethodID in the
                        JvmtiDeferredEvent::compiled_method_load_event<br>
                        similarly as we do in the
                        JvmtiDeferredEvent::compiled_method_unload_event?<br>
                        This would help to get rid of the dependency on
                        the nmethod::_method.<br>
                        Do we depend on any other nmethod fields?<br>
                      </blockquote>
                      <br>
                      Yes, there are other nmethod metadata that we rely
                      on to print inline information, and this function
                      JvmtiCodeBlobEvents::build_jvmti_addr_location_map
                      because it uses the ScopeDesc data in the nmethod.<br>
                    </blockquote>
                    <br>
                    One possible approach is to prepare and cache all
                    this information<br>
                    in the nmethod::post_compiled_method_load_event()
                    before the<br>
                    JvmtiDeferredEvent::compiled_method_load_event() is
                    called.<br>
                    The event parameters are:<br>
                    <pre>typedef struct {
    const void* start_address;
    jlocation location;
} jvmtiAddrLocationMap;</pre>
                    <pre>CompiledMethodLoad(jvmtiEnv *jvmti_env,
            jmethodID method,
            jint code_size,
            const void* code_addr,
            jint map_length,
            const jvmtiAddrLocationMap* map,
            const void* compile_info)</pre>
                    Some of these addresses above could be not
                    accessible when an event is posted.<br>
                    Not sure yet if it is Okay.<br>
                    The question is if this kind of refactoring is worth
                    and right thing to do.<br>
                    <br>
                    <blockquote type="cite"
                      cite="mid:40068f02-812c-b4bc-c150-d12e2fa03f01@oracle.com">
                      <br>
                      We do cache the jmethodID but that's not good
                      enough.  See my last comment in the bug report. 
                      The jmethodID can point to an unloaded method.<br>
                    </blockquote>
                    <br>
                    This looks like it is done a little bit late.<br>
                    It'd better to do it before the event is deferred
                    (see above).<br>
                    <br>
                    <blockquote type="cite"
                      cite="mid:40068f02-812c-b4bc-c150-d12e2fa03f01@oracle.com">
                      I tried a version of keeping the nmethod alive,
                      but the GC folks will hate it.  And it doesn't
                      work and I hate it.<br>
                    </blockquote>
                    <br>
                    From serviceability point of view this is the best
                    and most consistent approach.<br>
                    I seems to me, it was initially designed this way.<br>
                    The downside is it adds some extra complexity to the
                    GC.<br>
                    <br>
                    <blockquote type="cite"
                      cite="mid:40068f02-812c-b4bc-c150-d12e2fa03f01@oracle.com">
                      My version 01 is the best, with the caveat that
                      maybe it should check for _method == NULL instead
                      of nmethod-&gt;is_alive().  I have to talk to Erik
                      to see if there's a race with concurrent class
                      unloading.<br>
                      <br>
                      Any application that depends on a compiled method
                      loading event on a class that could be unloaded is
                      a buggy application.  Applications should not rely
                      on when the JIT compiler decides to compile a
                      method!  This happens to us for a stress test. 
                      Most applications will get most of their compiled
                      method loading events as they normally do.<br>
                    </blockquote>
                    <br>
                    It is not an application that relies on the compiled
                    method loading event.<br>
                    It is about profiling tools to be able to get
                    correct information about what is going on with
                    compilations.<br>
                    My concern is that if we skip such compiled method
                    load events then profilers have no way<br>
                    to find out there many unneeded compilations that
                    are thrown away without any real use.<br>
                    Also, it is not clear what happens with the
                    subsequent compiled method unload events.<br>
                    Are they going to be skipped as well or they can
                    appear and confuse profilers?<br>
                    <br>
                    <br>
                    Thanks,<br>
                    Serguei<br>
                    <blockquote type="cite"
                      cite="mid:40068f02-812c-b4bc-c150-d12e2fa03f01@oracle.com">
                      <br>
                      Thanks,<br>
                      Coleen<br>
                      <br>
                      <blockquote type="cite"
                        cite="mid:4f9f5421-29a6-d191-b03e-093a219e41cb@oracle.com">
                        <br>
                        <br>
                        Thanks,<br>
                        Serguei<br>
                        <br>
                        <blockquote type="cite"
                          cite="mid:399eb99f-08ba-59e1-2fdc-ba5fc66d9ae5@oracle.com">
                          I put more debugging in the bug to show this
                          crash was from an unloaded nmethod.<br>
                          <br>
                          Coleen<br>
                          <br>
                          <br>
                          <div class="moz-cite-prefix">On 11/15/19 4:45
                            PM, <a class="moz-txt-link-abbreviated"
                              href="mailto:serguei.spitsyn@oracle.com"
                              moz-do-not-send="true">serguei.spitsyn@oracle.com</a>
                            wrote:<br>
                          </div>
                          <blockquote type="cite"
                            \
cite="mid:1bce8841-8e23-8702-d2df-6f6c58a01bbf@oracle.com">  Hi Coleen,<br>
                            <br>
                            I have some questions.<br>
                            <br>
                            Both the compiler method load and unload are
                            posted as deferred events.<br>
                            Both events keep the nmethod alive until the
                            ServiceThread processes the event.<br>
                            <br>
                            The implementation is:<br>
                            <br>
                            JvmtiDeferredEvent
                            JvmtiDeferredEvent::compiled_method_load_event(nmethod*
                            nm) {<br>
                              . . .<br>
                              // Keep the nmethod alive until the
                            ServiceThread can process<br>
                              // this deferred event.<br>
                              nmethodLocker::lock_nmethod(nm);<br>
                              return event;<br>
                            }<br>
                            <br>
                            JvmtiDeferredEvent
                            JvmtiDeferredEvent::compiled_method_unload_event(nmethod*
                            nm, jmethodID id, const void* code) {<br>
                              . . .<br>
                              // Keep the nmethod alive until the
                            ServiceThread can process<br>
                              // this deferred event. This will keep the
                            memory for the<br>
                              // generated code from being reused too
                            early. We pass<br>
                              // zombie_ok == true here so that our
                            nmethod that was just<br>
                              // made into a zombie can be locked.<br>
                              nmethodLocker::lock_nmethod(nm, true /*
                            zombie_ok */);<br>
                              return event;<br>
                            }<br>
                            <br>
                            void JvmtiDeferredEvent::post() {<br>
                             
                            \
assert(ServiceThread::is_service_thread(Thread::current()),<br>  "Service thread must \
post enqueued  events");<br>
                              switch(_type) {<br>
                                case TYPE_COMPILED_METHOD_LOAD: {<br>
                                  nmethod* nm =
                            _event_data.compiled_method_load;<br>
                                 
                            JvmtiExport::post_compiled_method_load(nm);<br>
                                  // done with the deferred event so
                            unlock the nmethod<br>
                                  nmethodLocker::unlock_nmethod(nm);<br>
                                  break;<br>
                                }<br>
                                case TYPE_COMPILED_METHOD_UNLOAD: {<br>
                                  nmethod* nm =
                            _event_data.compiled_method_unload.nm;<br>
                                 
                            JvmtiExport::post_compiled_method_unload(<br>
                                   
                            _event_data.compiled_method_unload.method_id,<br>
                                   
                            _event_data.compiled_method_unload.code_begin);<br>
                                  // done with the deferred event so
                            unlock the nmethod<br>
                                  nmethodLocker::unlock_nmethod(nm);<br>
                                  break;<br>
                                }<br>
                                . . .<br>
                              }<br>
                            }<br>
                            <br>
                            Then I wonder how is it possible for the
                            nmethod to be not alive here?:<br>
                            <pre>2168 void \
JvmtiExport::post_compiled_method_load(nmethod *nm) { <span class="new">. . .</span>
<span class="new">2173   // It's not safe to look at metadata for unloaded \
methods.</span> <span class="new">2174   if (!nm-&gt;is_alive()) {</span>
<span class="new">2175     return;</span>
<span class="new">2176   }</span></pre>
                            At least, it lokks like something else is
                            broken.<br>
                            Do I miss something important here?<br>
                            <br>
                            Thanks,<br>
                            Serguei<br>
                            <br>
                            <br>
                            <div class="moz-cite-prefix">On 11/14/19
                              5:15 PM, <a
                                class="moz-txt-link-abbreviated"
                                href="mailto:coleen.phillimore@oracle.com"
                                \
moz-do-not-send="true">coleen.phillimore@oracle.com</a>  wrote:<br>
                            </div>
                            <blockquote type="cite"
                              \
cite="mid:adcc73a3-2647-2a7b-b032-8fe97fe293ab@oracle.com">Summary:  Don't post \
information which uses metadata  from unloaded nmethods <br>
                              <br>
                              Tested tier1-3 and 100 times with test
                              that failed (reproduced failure without
                              the fix). <br>
                              <br>
                              open webrev at <a
                                class="moz-txt-link-freetext"
                                \
                href="http://cr.openjdk.java.net/~coleenp/2019/8173361.01/webrev"
                                \
moz-do-not-send="true">http://cr.openjdk.java.net/~coleenp/2019/8173361.01/webrev</a> \
<br>  bug link <a class="moz-txt-link-freetext"
href="https://bugs.openjdk.java.net/browse/JDK-8173361"
                                \
moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8173361</a>  <br>
                              <br>
                              Thanks, <br>
                              Coleen <br>
                            </blockquote>
                            <br>
                          </blockquote>
                          <br>
                        </blockquote>
                        <br>
                      </blockquote>
                      <br>
                    </blockquote>
                    <br>
                  </blockquote>
                  <br>
                </blockquote>
                <br>
              </blockquote>
              <br>
            </blockquote>
            <br>
          </blockquote>
          <br>
        </blockquote>
        <br>
      </blockquote>
      <br>
    </blockquote>
    <br>
  </body>
</html>



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic