[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
From:       Robbin Ehn <robbin.ehn () oracle ! com>
Date:       2020-03-31 14:20:41
Message-ID: 0a07f87e-ede1-edbd-c754-e7df884e0545 () oracle ! com
[Download RAW message or body]

Thanks for cleaning up thread.hpp!

/Robbin

On 2020-03-30 10:31, Reingruber, Richard wrote:
> Hi,
> 
> this is webrev.5 based on Robbin's feedback and Martin's review - thanks! :)
> 
> The change affects jvmti, hotspot and c2. Partial reviews are very welcome too.
> 
> Full:  http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5/
> Delta: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5.inc/
> 
> Robbin, Martin, please let me know, if anything shouldn't be quite as you wanted \
> it. Also find my comments on your feedback below.
> 
> Robbin, can I count you as Reviewer for the runtime part?
> 
> Thanks, Richard.
> 
> --
> 
> > DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
> > You can move both declaration and definition to that file, no need to clobber
> > thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
> 
> Done.
> 
> > Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's own
> > hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
> 
> I moved JvmtiDeferredUpdates to vframe_hp.hpp where preexisting \
> jvmtiDeferredLocalVariableSet is declared.
> 
> > src/hotspot/share/code/compiledMethod.cpp
> > Nice cleanup!
> 
> Thanks :)
> 
> > src/hotspot/share/code/debugInfoRec.cpp
> > src/hotspot/share/code/debugInfoRec.hpp
> > Additional parmeters. (Remark: I think "non_global_escape_in_scope" would read \
> > better than "not_global_escape_in_scope", but your version is consistent with \
> > existing code, so no change request from my side.) Ok.
> 
> I've been thinking about this too and finally stayed with \
> not_global_escape_in_scope. It's supposed to mean an object whose escape state is \
> not GlobalEscape is in scope. 
> > src/hotspot/share/compiler/compileBroker.cpp
> > src/hotspot/share/compiler/compileBroker.hpp
> > Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into a \
> > follow up change together with the test in order to make this webrev smaller, but \
> > since it is included, I'm reviewing everything at once. Not a big deal.) Ok.
> 
> Yes the change would be a little smaller. And if it helps I'll split it off. In \
> general I prefer patches that bring along a suitable amount of tests.
> 
> > src/hotspot/share/opto/c2compiler.cpp
> > Make do_escape_analysis independent of JVMCI capabilities. Nice!
> 
> It is the main goal of the enhancement. It is done for C2, but could be done for \
> JVMCI compilers with just a small effort as well.
> 
> > src/hotspot/share/opto/escape.cpp
> > Annotation for MachSafePointNodes. Your added functionality looks correct.
> > But I'd prefer to move the bulky code out of the large function.
> > I suggest to factor out something like has_not_global_escape and has_arg_escape. \
> > So the code could look like this: SafePointNode* sfn = sfn_worklist.at(next);
> > sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn));
> > if (sfn->is_CallJava()) {
> > CallJavaNode* call = sfn->as_CallJava();
> > call->set_arg_escape(has_arg_escape(call));
> > }
> > This would also allow us to get rid of the found_..._escape_in_args variables \
> > making the loops better readable.
> 
> Done.
> 
> > It's kind of ugly to use strcmp to recognize uncommon trap, but that seems to be \
> > the way to do it (there are more such places). So it's ok.
> 
> Yeah. I copied the snippet.
> 
> > src/hotspot/share/prims/jvmtiImpl.cpp
> > src/hotspot/share/prims/jvmtiImpl.hpp
> > The sequence is pretty complex:
> > VM_GetOrSetLocal element initialization executes EscapeBarrier code which \
> > suspends the target thread (extra VM Operation).
> 
> Note that the target threads have to be suspended already for VM_GetOrSetLocal*. So \
> it's mainly the synchronization effect of EscapeBarrier::sync_and_suspend_one() \
> that is required here. Also no extra _handshake_ is executed, since \
> sync_and_suspend_one() will find the target threads already suspended.
> 
> > VM_GetOrSetLocal::doit_prologue performs object deoptimization (by VM Thread to \
> > prepare VM Operation with frame deoptimization). VM_GetOrSetLocal destructor \
> > implicitly calls EscapeBarrier destructor which resumes the target thread. But I \
> > don't have any improvement proposal. Performance is probably not a concern, here. \
> > So it's ok.
> 
> > VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it has \
> > non-globally escaping objects and other frames if they have arg escaping ones. \
> > Good.
> 
> It's not specifically the top frame, but the frame that is accessed.
> 
> > src/hotspot/share/runtime/deoptimization.cpp
> > Object deoptimization. I have more comments and proposals, here.
> > First of all, handling recursive and waiting locks in relock_objects is tricky, \
> > but looks correct. Comments are sufficient to understand why things are done as \
> > they are implemented.
> 
> > BiasedLocking related parts are complex, but we may get rid of them in the future \
> > (with BiasedLocking removal). Anyway, looks correct, too.
> 
> > Typo in comment: "regularily" => "regularly"
> 
> > Deoptimization::fetch_unroll_info_helper is the only place where \
> > _jvmti_deferred_updates get deallocated (except JavaThread destructor). But I \
> > think we always go through it, so I can't see a memory leak or such kind of \
> > issues.
> 
> That's correct. The compiled frame for which deferred updates are allocated is \
> always deoptimized before (see EscapeBarrier::deoptimize_objects()). This is also \
> asserted in compiledVFrame::update_deferred_value(). I've added the same assertion \
> to Deoptimization::relock_objects(). So we can be sure that _jvmti_deferred_updates \
> are deallocated again in fetch_unroll_info_helper().
> 
> > EscapeBarrier::deoptimize_objects: ResourceMark should use calling_thread().
> 
> Sure, well spotted!
> 
> > You can use MutexLocker and MonitorLocker with Thread* to save the \
> > Thread::current() call.
> 
> Right, good hint. This was recently introduced with 8235678. I even had to resolve \
> conflicts. Should have done this then.
> 
> > I'd make set_objs_are_deoptimized static and remove it from the EscapeBarrier \
> > interface because I think it shouldn't be used outside of \
> > EscapeBarrier::deoptimize_objects.
> 
> Done.
> 
> > Typo in comment: "we must only deoptimize" => "we only have to deoptimize"
> 
> Replaced with "[...] we deoptimize iff local objects are passed as args"
> 
> > "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and \
> > barrier_active() is redundant. Implementation can get moved to hpp file.
> 
> Ok. Done.
> 
> > I'll get back to suspend flags, later.
> 
> > There are weird cases regarding _self_deoptimization_in_progress.
> > Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C. C can \
> > set _self_deoptimization_in_progress while A performs the handshake for \
> > suspending C. I think this doesn't lead to errors, but it's probably not desired. \
> > I think it would be better to use only one "wait" call in sync_and_suspend_one \
> > and sync_and_suspend_all.
> 
> You're right. We've discussed that face-to-face, but couldn't find a real issue. \
> But now, thinking again, a reckon I found one: 
> 2808   // Sync with other threads that might be doing deoptimizations
> 2809   {
> 2810     // Need to switch to _thread_blocked for the wait() call
> 2811     ThreadBlockInVM tbivm(_calling_thread);
> 2812     MonitorLocker ml(EscapeBarrier_lock, Mutex::_no_safepoint_check_flag);
> 2813     while (_self_deoptimization_in_progress) {
> 2814       ml.wait();
> 2815     }
> 2816
> 2817     if (self_deopt()) {
> 2818       _self_deoptimization_in_progress = true;
> 2819     }
> 2820
> 2821     while (_deoptee_thread->is_ea_obj_deopt_suspend()) {
> 2822       ml.wait();
> 2823     }
> 2824
> 2825     if (self_deopt()) {
> 2826       return;
> 2827     }
> 2828
> 2829     // set suspend flag for target thread
> 2830     _deoptee_thread->set_ea_obj_deopt_flag();
> 2831   }
> 
> - A waits in 2822
> - C is suspended
> - B notifies all in resume_one()
> - A and C wake up
> - C wins over A and sets _self_deoptimization_in_progress = true in 2818
> - C does the self deoptimization
> - A executes 2830 _deoptee_thread->set_ea_obj_deopt_flag()
> 
> C will self suspend at some undefined point. The resulting state is illegal.
> 
> > I first thought it'd be better to move ThreadBlockInVM before wait() to reduce \
> > thread state transitions, but that seems to be problematic because \
> > ThreadBlockInVM destructor contains a safepoint check which we shouldn't do while \
> > holding EscapeBarrier_lock. So no change request.
> 
> Yes, would be nice to have the state change only if needed, but for the reason you \
> mentioned it is not quite as easy as it seems to be. I experimented as well with a \
> second lock, but did not succeed. 
> > Change in thred_added:
> > I think the sequence would be more comprehensive if we waited for \
> > deopt_all_threads in Thread::start and all other places where a new thread can \
> > run into Java code (e.g. JVMTI attach). Your version makes new threads come up \
> > with suspend flag set. That looks correct, too. Advantage is that you only have \
> > to change one place (thread_added). It'll be interesting to see how it will look \
> > like when we use async handshakes instead of suspend flags. For now, I'm ok with \
> > your version.
> 
> I had a version that did what you are suggesting. The current version also has the \
> advantage, that there are fewer places where a thread has to wait for ongoing \
> object deoptimization. This means viewer places where you have to worry about \
> correct thread state transitions, possible deadlocks, and if all oops are properly \
> Handle'ed. 
> > I'd only move MutexLocker ml(EscapeBarrier_lock...) after if \
> > (!jt->is_hidden_from_external_view()).
> 
> Done.
> 
> > Having 4 different deoptimize_objects functions makes it a little hard to keep an \
> > overview of which one is used for what. Maybe adding suffixes would help a little \
> > bit, but I can also live with what you have. Implementation looks correct to me.
> 
> 2 are internal. I added the suffix _internal to them. This leaves 2 to choose from.
> 
> > src/hotspot/share/runtime/deoptimization.hpp
> > Escape barriers and object deoptimization functions.
> > Typo in comment: "helt" => "held"
> 
> Done in place already.
> 
> > src/hotspot/share/runtime/interfaceSupport.cpp
> > InterfaceSupport::deoptimizeAllObjects() is only used for DeoptimizeObjectsALot = \
> > 1. I think DeoptimizeObjectsALot = 2 is more important, but I think it's not bad \
> > to have DeoptimizeObjectsALot = 1 in addition. Ok.
> 
> I never used DeoptimizeObjectsALot = 1 that much. It could be more deterministic in \
> single threaded scenarios. I wouldn't object to get rid of it though.
> 
> > src/hotspot/share/runtime/stackValue.hpp
> > Better reinitilization in StackValue. Good.
> 
> StackValue::obj_is_scalar_replaced() should not return true after calling \
> set_obj(). 
> > src/hotspot/share/runtime/thread.cpp
> > src/hotspot/share/runtime/thread.hpp
> > src/hotspot/share/runtime/thread.inline.hpp
> > wait_for_object_deoptimization, suspend flag, deferred updates and test feature \
> > to deoptimize objects.
> 
> > In the long term, we want to get rid of suspend flags, so it's not so nice to \
> > introduce a new one. But I agree with Götz that it should be acceptable as \
> > temporary solution until async handshakes are available (which takes more time). \
> > So I'm ok with your change.
> 
> I'm keen to build the feature on async handshakes when the arive.
> 
> > You can use MutexLocker with Thread*.
> 
> Done.
> 
> > JVMTIDeferredUpdates: I agree with Robin. It'd be nice to move the class out of \
> > thread.hpp.
> 
> Done.
> 
> > src/hotspot/share/runtime/vframe.cpp
> > Added support for entry frame to new_vframe. Ok.
> 
> 
> > src/hotspot/share/runtime/vframe_hp.cpp
> > src/hotspot/share/runtime/vframe_hp.hpp
> 
> > I think code()->as_nmethod() in not_global_escape_in_scope() and arg_escape() \
> > should better be under #ifdef ASSERT or inside the assert statement (no need for \
> > code cache walking in product build).
> 
> Done.
> 
> > jvmtiDeferredLocalVariableSet::update_monitors:
> > Please add a comment explaining that owner referenced by original info may be \
> > scalar replaced, but it is deoptimized in the vframe.
> 
> Done.
> 
> -----Original Message-----
> From: Doerr, Martin <martin.doerr@sap.com>
> Sent: Donnerstag, 12. März 2020 17:28
> To: Reingruber, Richard <richard.reingruber@sap.com>; 'Robbin Ehn' \
> <robbin.ehn@oracle.com>; Lindenmaier, Goetz <goetz.lindenmaier@sap.com>; David \
> Holmes <david.holmes@oracle.com>; Vladimir Kozlov (vladimir.kozlov@oracle.com) \
> <vladimir.kozlov@oracle.com>; serviceability-dev@openjdk.java.net; \
>                 hotspot-compiler-dev@openjdk.java.net; \
>                 hotspot-runtime-dev@openjdk.java.net
> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the \
> Presence of JVMTI Agents 
> Hi Richard,
> 
> 
> I managed to find time for a (almost) complete review of webrev.4. (I'll review the \
> tests separately.) 
> First of all, the change seems to be in pretty good quality for its significant \
> complexity. I couldn't find any real bugs. But I'd like to propose minor \
> improvements. I'm convinced that it's mature because we did substantial testing.
> 
> I like the new functionality for object deoptimization. It can possibly be reused \
> for future escape analysis based optimizations. So I appreciate having it available \
> in the code base. In addition to that, your change makes the JVMTI implementation \
> better integrated into the VM. 
> 
> Now to the details:
> 
> 
> src/hotspot/share/c1/c1_IR.hpp
> describe_scope parameters. Ok.
> 
> 
> src/hotspot/share/ci/ciEnv.cpp
> src/hotspot/share/ci/ciEnv.hpp
> Fix for JvmtiExport::can_walk_any_space() capability. Ok.
> 
> 
> src/hotspot/share/code/compiledMethod.cpp
> Nice cleanup!
> 
> 
> src/hotspot/share/code/debugInfoRec.cpp
> src/hotspot/share/code/debugInfoRec.hpp
> Additional parmeters. (Remark: I think "non_global_escape_in_scope" would read \
> better than "not_global_escape_in_scope", but your version is consistent with \
> existing code, so no change request from my side.) Ok. 
> 
> src/hotspot/share/code/nmethod.cpp
> Nice cleanup!
> 
> 
> src/hotspot/share/code/pcDesc.hpp
> Additional parameters. Ok.
> 
> 
> src/hotspot/share/code/scopeDesc.cpp
> src/hotspot/share/code/scopeDesc.hpp
> Improved implementation + additional parameters. Ok.
> 
> 
> src/hotspot/share/compiler/compileBroker.cpp
> src/hotspot/share/compiler/compileBroker.hpp
> Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into a follow \
> up change together with the test in order to make this webrev smaller, but since it \
> is included, I'm reviewing everything at once. Not a big deal.) Ok. 
> 
> src/hotspot/share/jvmci/jvmciCodeInstaller.cpp
> Additional parameters. Ok.
> 
> 
> src/hotspot/share/opto/c2compiler.cpp
> Make do_escape_analysis independent of JVMCI capabilities. Nice!
> 
> 
> src/hotspot/share/opto/callnode.hpp
> Additional fields for MachSafePointNodes. Ok.
> 
> 
> src/hotspot/share/opto/escape.cpp
> Annotation for MachSafePointNodes. Your added functionality looks correct.
> But I'd prefer to move the bulky code out of the large function.
> I suggest to factor out something like has_not_global_escape and has_arg_escape. So \
> the code could look like this: SafePointNode* sfn = sfn_worklist.at(next);
> sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn));
> if (sfn->is_CallJava()) {
> CallJavaNode* call = sfn->as_CallJava();
> call->set_arg_escape(has_arg_escape(call));
> }
> This would also allow us to get rid of the found_..._escape_in_args variables \
> making the loops better readable. 
> It's kind of ugly to use strcmp to recognize uncommon trap, but that seems to be \
> the way to do it (there are more such places). So it's ok. 
> 
> src/hotspot/share/opto/machnode.hpp
> Additional fields for MachSafePointNodes. Ok.
> 
> 
> src/hotspot/share/opto/macro.cpp
> Allow elimination of non-escaping allocations. Ok.
> 
> 
> src/hotspot/share/opto/matcher.cpp
> src/hotspot/share/opto/output.cpp
> Copy attribute / pass parameters. Ok.
> 
> 
> src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp
> Nice cleanup!
> 
> 
> src/hotspot/share/prims/jvmtiEnv.cpp
> src/hotspot/share/prims/jvmtiEnvBase.cpp
> Escape barriers + deoptimize objects for target thread. Good.
> 
> 
> src/hotspot/share/prims/jvmtiImpl.cpp
> src/hotspot/share/prims/jvmtiImpl.hpp
> The sequence is pretty complex:
> VM_GetOrSetLocal element initialization executes EscapeBarrier code which suspends \
> the target thread (extra VM Operation). VM_GetOrSetLocal::doit_prologue performs \
> object deoptimization (by VM Thread to prepare VM Operation with frame \
> deoptimization). VM_GetOrSetLocal destructor implicitly calls EscapeBarrier \
> destructor which resumes the target thread. But I don't have any improvement \
> proposal. Performance is probably not a concern, here. So it's ok. 
> VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it has \
> non-globally escaping objects and other frames if they have arg escaping ones. \
> Good. 
> 
> src/hotspot/share/prims/jvmtiTagMap.cpp
> Escape barriers + deoptimize objects for all threads. Ok.
> 
> 
> src/hotspot/share/prims/whitebox.cpp
> Added WB_IsFrameDeoptimized to API. Ok.
> 
> 
> src/hotspot/share/runtime/deoptimization.cpp
> Object deoptimization. I have more comments and proposals, here.
> First of all, handling recursive and waiting locks in relock_objects is tricky, but \
> looks correct. Comments are sufficient to understand why things are done as they \
> are implemented. 
> BiasedLocking related parts are complex, but we may get rid of them in the future \
> (with BiasedLocking removal). Anyway, looks correct, too.
> 
> Typo in comment: "regularily" => "regularly"
> 
> Deoptimization::fetch_unroll_info_helper is the only place where \
> _jvmti_deferred_updates get deallocated (except JavaThread destructor). But I think \
> we always go through it, so I can't see a memory leak or such kind of issues. 
> EscapeBarrier::deoptimize_objects: ResourceMark should use calling_thread().
> 
> You can use MutexLocker and MonitorLocker with Thread* to save the \
> Thread::current() call. 
> I'd make set_objs_are_deoptimized static and remove it from the EscapeBarrier \
> interface because I think it shouldn't be used outside of \
> EscapeBarrier::deoptimize_objects. 
> Typo in comment: "we must only deoptimize" => "we only have to deoptimize"
> 
> "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and \
> barrier_active() is redundant. Implementation can get moved to hpp file. 
> I'll get back to suspend flags, later.
> 
> There are weird cases regarding _self_deoptimization_in_progress.
> Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C. C can set \
> _self_deoptimization_in_progress while A performs the handshake for suspending C. I \
> think this doesn't lead to errors, but it's probably not desired. I think it would \
> be better to use only one "wait" call in sync_and_suspend_one and \
> sync_and_suspend_all. 
> I first thought it'd be better to move ThreadBlockInVM before wait() to reduce \
> thread state transitions, but that seems to be problematic because ThreadBlockInVM \
> destructor contains a safepoint check which we shouldn't do while holding \
> EscapeBarrier_lock. So no change request. 
> Change in thred_added:
> I think the sequence would be more comprehensive if we waited for deopt_all_threads \
> in Thread::start and all other places where a new thread can run into Java code \
> (e.g. JVMTI attach). Your version makes new threads come up with suspend flag set. \
> That looks correct, too. Advantage is that you only have to change one place \
> (thread_added). It'll be interesting to see how it will look like when we use async \
> handshakes instead of suspend flags. For now, I'm ok with your version.
> 
> I'd only move MutexLocker ml(EscapeBarrier_lock...) after if \
> (!jt->is_hidden_from_external_view()). 
> Having 4 different deoptimize_objects functions makes it a little hard to keep an \
> overview of which one is used for what. Maybe adding suffixes would help a little \
> bit, but I can also live with what you have. Implementation looks correct to me.
> 
> 
> src/hotspot/share/runtime/deoptimization.hpp
> Escape barriers and object deoptimization functions.
> Typo in comment: "helt" => "held"
> 
> 
> src/hotspot/share/runtime/globals.hpp
> Addition of develop flag DeoptimizeObjectsALotInterval. Ok.
> 
> 
> src/hotspot/share/runtime/interfaceSupport.cpp
> InterfaceSupport::deoptimizeAllObjects() is only used for DeoptimizeObjectsALot = \
> 1. I think DeoptimizeObjectsALot = 2 is more important, but I think it's not bad to \
> have DeoptimizeObjectsALot = 1 in addition. Ok. 
> 
> src/hotspot/share/runtime/interfaceSupport.inline.hpp
> Addition of deoptimizeAllObjects. Ok.
> 
> 
> src/hotspot/share/runtime/mutexLocker.cpp
> src/hotspot/share/runtime/mutexLocker.hpp
> Addition of EscapeBarrier_lock. Ok.
> 
> 
> src/hotspot/share/runtime/objectMonitor.cpp
> Make recursion count relock aware. Ok.
> 
> 
> src/hotspot/share/runtime/stackValue.hpp
> Better reinitilization in StackValue. Good.
> 
> 
> src/hotspot/share/runtime/thread.cpp
> src/hotspot/share/runtime/thread.hpp
> src/hotspot/share/runtime/thread.inline.hpp
> wait_for_object_deoptimization, suspend flag, deferred updates and test feature to \
> deoptimize objects. 
> In the long term, we want to get rid of suspend flags, so it's not so nice to \
> introduce a new one. But I agree with Götz that it should be acceptable as \
> temporary solution until async handshakes are available (which takes more time). So \
> I'm ok with your change. 
> You can use MutexLocker with Thread*.
> 
> JVMTIDeferredUpdates: I agree with Robin. It'd be nice to move the class out of \
> thread.hpp. 
> 
> src/hotspot/share/runtime/vframe.cpp
> Added support for entry frame to new_vframe. Ok.
> 
> 
> src/hotspot/share/runtime/vframe_hp.cpp
> src/hotspot/share/runtime/vframe_hp.hpp
> 
> I think code()->as_nmethod() in not_global_escape_in_scope() and arg_escape() \
> should better be under #ifdef ASSERT or inside the assert statement (no need for \
> code cache walking in product build). 
> jvmtiDeferredLocalVariableSet::update_monitors:
> Please add a comment explaining that owner referenced by original info may be \
> scalar replaced, but it is deoptimized in the vframe. 
> 
> src/hotspot/share/utilities/macros.hpp
> Addition of NOT_COMPILER2_OR_JVMCI_RETURN macros. Ok.
> 
> 
> test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysisEnabled.java
>  test/hotspot/jtreg/serviceability/jvmti/Heap/libIterateHeapWithEscapeAnalysisEnabled.c
>  New test. Will review separately.
> 
> 
> test/jdk/TEST.ROOT
> Addition of vm.jvmci as required property. Ok.
> 
> 
> test/jdk/com/sun/jdi/EATests.java
> test/jdk/com/sun/jdi/EATestsJVMCI.java
> New test. Will review separately.
> 
> 
> test/lib/sun/hotspot/WhiteBox.java
> Added isFrameDeoptimized to API. Ok.
> 
> 
> That was it. Best regards,
> Martin
> 
> 
> > -----Original Message-----
> > From: hotspot-compiler-dev <hotspot-compiler-dev-
> > bounces@openjdk.java.net> On Behalf Of Reingruber, Richard
> > Sent: Dienstag, 3. März 2020 21:23
> > To: 'Robbin Ehn' <robbin.ehn@oracle.com>; Lindenmaier, Goetz
> > <goetz.lindenmaier@sap.com>; David Holmes <david.holmes@oracle.com>;
> > Vladimir Kozlov (vladimir.kozlov@oracle.com)
> > <vladimir.kozlov@oracle.com>; serviceability-dev@openjdk.java.net;
> > hotspot-compiler-dev@openjdk.java.net; hotspot-runtime-
> > dev@openjdk.java.net
> > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better
> > Performance in the Presence of JVMTI Agents
> > 
> > Hi Robbin,
> > 
> > > > I understand that Robbin proposed to replace the usage of
> > > > _suspend_flag with handshakes. Apparently, async handshakes
> > > > are needed to do so. We have been waiting a while for removal
> > > > of the _suspend_flag / introduction of async handshakes [2].
> > > > What is the status here?
> > 
> > > I have an old prototype which I would like to continue to work on.
> > > So do not assume asynch handshakes will make 15.
> > > Even if it would, I think there are a lot more investigate work to remove
> > > _suspend_flag.
> > 
> > Let us know, if we can be of any help to you and be it only testing.
> > 
> > > > > Full:
> > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/
> > 
> > > DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
> > > You can move both declaration and definition to that file, no need to
> > clobber
> > > thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
> > 
> > Will do.
> > 
> > > Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's
> > own
> > > hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
> > 
> > You are right. It shouldn't be declared in thread.hpp. I will look into that.
> > 
> > > Note that we also think we may have a bug in deopt:
> > > https://bugs.openjdk.java.net/browse/JDK-8238237
> > 
> > > I think it would be best, if possible, to push after that is resolved.
> > 
> > Sure.
> > 
> > > Not even nearly a full review :)
> > 
> > I know :)
> > 
> > Anyways, thanks a lot,
> > Richard.
> > 
> > 
> > -----Original Message-----
> > From: Robbin Ehn <robbin.ehn@oracle.com>
> > Sent: Monday, March 2, 2020 11:17 AM
> > To: Lindenmaier, Goetz <goetz.lindenmaier@sap.com>; Reingruber, Richard
> > <richard.reingruber@sap.com>; David Holmes <david.holmes@oracle.com>;
> > Vladimir Kozlov (vladimir.kozlov@oracle.com)
> > <vladimir.kozlov@oracle.com>; serviceability-dev@openjdk.java.net;
> > hotspot-compiler-dev@openjdk.java.net; hotspot-runtime-
> > dev@openjdk.java.net
> > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance
> > in the Presence of JVMTI Agents
> > 
> > Hi,
> > 
> > On 2/24/20 5:39 PM, Lindenmaier, Goetz wrote:
> > > Hi,
> > > 
> > > I had a look at the progress of this change. Nothing
> > > happened since Richard posted his update using more
> > > handshakes [1].
> > > But we (SAP) would appreciate a lot if this change could
> > > be successfully reviewed and pushed.
> > > 
> > > I think there is basic understanding that this
> > > change is helpful. It fixes a number of issues with JVMTI,
> > > and will deliver the same performance benefits as EA
> > > does in current production mode for debugging scenarios.
> > > 
> > > This is important for us as we run our VMs prepared
> > > for debugging in production mode.
> > > 
> > > I understand that Robbin proposed to replace the usage of
> > > _suspend_flag with handshakes. Apparently, async handshakes
> > > are needed to do so. We have been waiting a while for removal
> > > of the _suspend_flag / introduction of async handshakes [2].
> > > What is the status here?
> > 
> > I have an old prototype which I would like to continue to work on.
> > So do not assume asynch handshakes will make 15.
> > Even if it would, I think there are a lot more investigate work to remove
> > _suspend_flag.
> > 
> > > 
> > > I think we should no longer wait, but proceed with
> > > this change. We will look into removing the usage of
> > > suspend_flag introduced here once it is possible to implement
> > > it with handshakes.
> > 
> > Yes, sure.
> > 
> > > > Full: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/
> > 
> > DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
> > You can move both declaration and definition to that file, no need to clobber
> > thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
> > 
> > Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's
> > own
> > hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
> > 
> > Note that we also think we may have a bug in deopt:
> > https://bugs.openjdk.java.net/browse/JDK-8238237
> > 
> > I think it would be best, if possible, to push after that is resolved.
> > 
> > Not even nearly a full review :)
> > 
> > Thanks, Robbin
> > 
> > 
> > > > Incremental:
> > > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4.inc/
> > > > 
> > > > I was not able to eliminate the additional suspend flag now. I'll take care
> > of this
> > > > as soon as the
> > > > existing suspend-resume-mechanism is reworked.
> > > > 
> > > > Testing:
> > > > 
> > > > Nightly tests @SAP:
> > > > 
> > > > JCK and JTREG, also in Xcomp mode, SPECjvm2008, SPECjbb2015,
> > Renaissance
> > > > Suite, SAP specific tests
> > > > with fastdebug and release builds on all platforms
> > > > 
> > > > Stress testing with DeoptimizeObjectsALot running SPECjvm2008 40x
> > parallel
> > > > for 24h
> > > > 
> > > > Thanks, Richard.
> > > > 
> > > > 
> > > > More details on the changes:
> > > > 
> > > > * Hide DeoptimizeObjectsALotThread from external view.
> > > > 
> > > > * Changed EscapeBarrier_lock to be a _safepoint_check_never lock.
> > > > It used to be _safepoint_check_sometimes, which will be eliminated
> > sooner or
> > > > later.
> > > > I added explicit thread state changes with ThreadBlockInVM to code
> > paths
> > > > where we can wait()
> > > > on EscapeBarrier_lock to become safepoint safe.
> > > > 
> > > > * Use handshake EscapeBarrierSuspendHandshake to suspend target
> > threads
> > > > instead of vm operation
> > > > VM_ThreadSuspendAllForObjDeopt.
> > > > 
> > > > * Removed uses of Threads_lock. When adding a new thread we suspend
> > it iff
> > > > EA optimizations are
> > > > being reverted. In the previous version we were waiting on
> > Threads_lock
> > > > while EA optimizations
> > > > were reverted. See EscapeBarrier::thread_added().
> > > > 
> > > > * Made tests require Xmixed compilation mode.
> > > > 
> > > > * Made tests agnostic regarding tiered compilation.
> > > > I.e. tc isn't disabled anymore, and the tests can be run with tc enabled or
> > > > disabled.
> > > > 
> > > > * Exercising EATests.java as well with stress test options
> > > > DeoptimizeObjectsALot*
> > > > Due to the non-deterministic deoptimizations some tests need to be
> > skipped.
> > > > We do this to prevent bit-rot of the stress test code.
> > > > 
> > > > * Executing EATests.java as well with graal if available. Driver for this is
> > > > EATestsJVMCI.java. Graal cannot pass all tests, because it does not
> > provide all
> > > > the new debug info
> > > > (namely not_global_escape_in_scope and arg_escape in
> > scopeDesc.hpp).
> > > > And graal does not yet support the JVMTI operations force early return
> > and
> > > > pop frame.
> > > > 
> > > > * Removed tracing from new jdi tests in EATests.java. Too much trace
> > output
> > > > before the debugging
> > > > connection is established can cause deadlock because output buffers fill
> > up.
> > > > (See https://bugs.openjdk.java.net/browse/JDK-8173304)
> > > > 
> > > > * Many copyright year changes and smaller clean-up changes of testing
> > code
> > > > (trailing white-space and
> > > > the like).
> > > > 
> > > > 
> > > > -----Original Message-----
> > > > From: David Holmes <david.holmes@oracle.com>
> > > > Sent: Donnerstag, 19. Dezember 2019 03:12
> > > > To: Reingruber, Richard <richard.reingruber@sap.com>; serviceability-
> > > > dev@openjdk.java.net; hotspot-compiler-dev@openjdk.java.net;
> > hotspot-
> > > > runtime-dev@openjdk.java.net; Vladimir Kozlov
> > (vladimir.kozlov@oracle.com)
> > > > <vladimir.kozlov@oracle.com>
> > > > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
> > Performance in
> > > > the Presence of JVMTI Agents
> > > > 
> > > > Hi Richard,
> > > > 
> > > > I think my issue is with the way EliminateNestedLocks works so I'm going
> > > > to look into that more deeply.
> > > > 
> > > > Thanks for the explanations.
> > > > 
> > > > David
> > > > 
> > > > On 18/12/2019 12:47 am, Reingruber, Richard wrote:
> > > > > Hi David,
> > > > > 
> > > > > > > > Some further queries/concerns:
> > > > > > > > 
> > > > > > > > src/hotspot/share/runtime/objectMonitor.cpp
> > > > > > > > 
> > > > > > > > Can you please explain the changes to ObjectMonitor::wait:
> > > > > > > > 
> > > > > > > > !   _recursions = save      // restore the old recursion count
> > > > > > > > !                 + jt->get_and_reset_relock_count_after_wait(); //
> > > > > > > > increased by the deferred relock count
> > > > > > > > 
> > > > > > > > what is the "deferred relock count"? I gather it relates to
> > > > > > > > 
> > > > > > > > "The code was extended to be able to deoptimize objects of a
> > > > > > > frame that
> > > > > > > > is not the top frame and to let another thread than the owning
> > > > > > > thread do
> > > > > > > > it."
> > > > > > > 
> > > > > > > Yes, these relate. Currently EA based optimizations are reverted,
> > when a
> > > > compiled frame is
> > > > > > > replaced with corresponding interpreter frames. Part of this is
> > relocking
> > > > objects with eliminated
> > > > > > > locking. New with the enhancement is that we do this also just
> > before
> > > > object references are
> > > > > > > acquired through JVMTI. In this case we deoptimize also the
> > owning
> > > > compiled frame C and we
> > > > > > > register deoptimized objects as deferred updates. When control
> > returns
> > > > to C it gets deoptimized,
> > > > > > > we notice that objects are already deoptimized (reallocated and
> > > > relocked), so we don't do it again
> > > > > > > (relocking twice would be incorrect of course). Deferred updates
> > are
> > > > copied into the new
> > > > > > > interpreter frames.
> > > > > > > 
> > > > > > > Problem: relocking is not possible if the target thread T is waiting
> > on the
> > > > monitor that needs to
> > > > > > > be relocked. This happens only with non-local objects with
> > > > EliminateNestedLocks. Instead relocking
> > > > > > > is deferred until T owns the monitor again. This is what the piece of
> > > > code above does.
> > > > > > 
> > > > > > Sorry I need some more detail here. How can you wait() on an
> > object
> > > > > > monitor if the object allocation and/or locking was optimised away?
> > And
> > > > > > what is a "non-local object" in this context? Isn't EA restricted to
> > > > > > thread-confined objects?
> > > > > 
> > > > > "Non-local object" is an object that escapes its thread. The issue I'm
> > > > addressing with the changes
> > > > > in ObjectMonitor::wait are almost unrelated to EA. They are caused by
> > > > EliminateNestedLocks, where C2
> > > > > eliminates recursive locking of an already owned lock. The lock owning
> > object
> > > > exists on the heap, it
> > > > > is locked and you can call wait() on it.
> > > > > 
> > > > > EliminateLocks is the C2 option that controls lock elimination based on
> > EA.
> > > > Both optimizations have
> > > > > in common that objects with eliminated locking need to be relocked
> > when
> > > > deoptimizing a frame,
> > > > > i.e. when replacing a compiled frame with equivalent interpreter
> > > > > frames. Deoptimization::relock_objects does that job for /all/ eliminated
> > > > locks in scope. /All/ can
> > > > > be a mix of eliminated nested locks and locks of not-escaping objects.
> > > > > 
> > > > > New with the enhancement: I call relock_objects earlier, just before
> > objects
> > > > pontentially
> > > > > escape. But then later when the owning compiled frame gets
> > deoptimized, I
> > > > must not do it again:
> > > > > 
> > > > > See call to EscapeBarrier::objs_are_deoptimized in deoptimization.cpp:
> > > > > 
> > > > > 373   if ((jvmci_enabled || ((DoEscapeAnalysis ||
> > EliminateNestedLocks) &&
> > > > EliminateLocks))
> > > > > 374       && !EscapeBarrier::objs_are_deoptimized(thread,
> > deoptee.id())) {
> > > > > 375     bool unused;
> > > > > 376     eliminate_locks(thread, chunk, realloc_failures, deoptee,
> > exec_mode,
> > > > unused);
> > > > > 377   }
> > > > > 
> > > > > Now when calling relock_objects early it is quiet possible that I have to
> > relock
> > > > an object the
> > > > > target thread currently waits for. Obviously I cannot relock in this case,
> > > > instead I chose to
> > > > > introduce relock_count_after_wait to JavaThread.
> > > > > 
> > > > > > Is it just that some of the locking gets optimized away e.g.
> > > > > > 
> > > > > > synchronised(obj) {
> > > > > > synchronised(obj) {
> > > > > > synchronised(obj) {
> > > > > > obj.wait();
> > > > > > }
> > > > > > }
> > > > > > }
> > > > > > 
> > > > > > If this is reduced to a form as-if it were a single lock of the monitor
> > > > > > (due to EA) and the wait() triggers a JVM TI event which leads to the
> > > > > > escape of "obj" then we need to reconstruct the true lock state, and
> > so
> > > > > > when the wait() internally unblocks and reacquires the monitor it
> > has to
> > > > > > set the true recursion count to 3, not the 1 that it appeared to be
> > when
> > > > > > wait() was initially called. Is that the scenario?
> > > > > 
> > > > > Kind of... except that the locking is not eliminated due to EA and there is
> > no
> > > > JVM TI event
> > > > > triggered by wait.
> > > > > 
> > > > > Add
> > > > > 
> > > > > LocalObject l1 = new LocalObject();
> > > > > 
> > > > > in front of the synchrnized blocks and assume a JVM TI agent acquires l1.
> > This
> > > > triggers the code in
> > > > > question.
> > > > > 
> > > > > See that relocking/reallocating is transactional. If it is done then for \
> > > > > /all/
> > > > objects in scope and it is
> > > > > done at most once. It wouldn't be quite so easy to split this in relocking
> > of
> > > > nested/EA-based
> > > > > eliminated locks.
> > > > > 
> > > > > > If so I find this truly awful. Anyone using wait() in a realistic form
> > > > > > requires a notification and so the object cannot be thread confined.
> > In
> > > > > 
> > > > > It is not thread confined.
> > > > > 
> > > > > > which case I would strongly argue that upon hitting the wait() the
> > deopt
> > > > > > should occur unconditionally and so the lock state is correct before
> > we
> > > > > > wait and so we don't need to mess with the recursion count
> > internally
> > > > > > when we reacquire the monitor.
> > > > > > 
> > > > > > > 
> > > > > > > > which I don't like the sound of at all when it comes to
> > ObjectMonitor
> > > > > > > > state. So I'd like to understand in detail exactly what is going on
> > here
> > > > > > > > and why.  This is a very intrusive change that seems to badly
> > break
> > > > > > > > encapsulation and impacts future changes to ObjectMonitor
> > that are
> > > > under
> > > > > > > > investigation.
> > > > > > > 
> > > > > > > I would not regard this as breaking encapsulation. Certainly not
> > badly.
> > > > > > > 
> > > > > > > I've added a property relock_count_after_wait to JavaThread. The
> > > > property is well
> > > > > > > encapsulated. Future ObjectMonitor implementations have to deal
> > with
> > > > recursion too. They are free
> > > > > > > in choosing a way to do that as long as that property is taken into
> > > > account. This is hardly a
> > > > > > > limitation.
> > > > > > 
> > > > > > I do think this badly breaks encapsulation as you have to add a
> > callout
> > > > > > from the guts of the ObjectMonitor code to reach into the thread to
> > get
> > > > > > this lock count adjustment. I understand why you have had to do
> > this but
> > > > > > I would much rather see a change to the EA optimisation strategy so
> > that
> > > > > > this is not needed.
> > > > > > 
> > > > > > > Note also that the property is a straight forward extension of the
> > > > existing concept of deferred
> > > > > > > local updates. It is embedded into the structure holding them. So
> > not
> > > > even the footprint of a
> > > > > > > JavaThread is enlarged if no deferred updates are generated.
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > 
> > > > > > > I'm actually duplicating the existing external suspend mechanism,
> > > > because a thread can be
> > > > > > > suspended at most once. And hey, and don't like that either! But it
> > > > seems not unlikely that the
> > > > > > > duplicate can be removed together with the original and the new
> > type
> > > > of handshakes that will be
> > > > > > > used for thread suspend can be used for object deoptimization
> > too. See
> > > > today's discussion in
> > > > > > > JDK-8227745 [2].
> > > > > > 
> > > > > > I hope that discussion bears some fruit, at the moment it seems not
> > to
> > > > > > be possible to use handshakes here. :(
> > > > > > 
> > > > > > The external suspend mechanism is a royal pain in the proverbial
> > that we
> > > > > > have to carefully live with. The idea that we're duplicating that for
> > > > > > use in another fringe area of functionality does not thrill me at all.
> > > > > > 
> > > > > > To be clear, I understand the problem that exists and that you wish
> > to
> > > > > > solve, but for the runtime parts I balk at the complexity cost of
> > > > > > solving it.
> > > > > 
> > > > > I know it's complex, but by far no rocket science.
> > > > > 
> > > > > Also I find it hard to imagine another fix for JDK-8233915 besides
> > changing
> > > > the JVM TI specification.
> > > > > 
> > > > > Thanks, Richard.
> > > > > 
> > > > > -----Original Message-----
> > > > > From: David Holmes <david.holmes@oracle.com>
> > > > > Sent: Dienstag, 17. Dezember 2019 08:03
> > > > > To: Reingruber, Richard <richard.reingruber@sap.com>; serviceability-
> > > > dev@openjdk.java.net; hotspot-compiler-dev@openjdk.java.net;
> > hotspot-
> > > > runtime-dev@openjdk.java.net; Vladimir Kozlov
> > (vladimir.kozlov@oracle.com)
> > > > <vladimir.kozlov@oracle.com>
> > > > > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
> > Performance
> > > > in the Presence of JVMTI Agents
> > > > > 
> > > > > <resend as my mailer crashed during last send>
> > > > > 
> > > > > David
> > > > > 
> > > > > On 17/12/2019 4:57 pm, David Holmes wrote:
> > > > > > Hi Richard,
> > > > > > 
> > > > > > On 14/12/2019 5:01 am, Reingruber, Richard wrote:
> > > > > > > Hi David,
> > > > > > > 
> > > > > > > > Some further queries/concerns:
> > > > > > > > 
> > > > > > > > src/hotspot/share/runtime/objectMonitor.cpp
> > > > > > > > 
> > > > > > > > Can you please explain the changes to ObjectMonitor::wait:
> > > > > > > > 
> > > > > > > > !   _recursions = save      // restore the old recursion count
> > > > > > > > !                 + jt->get_and_reset_relock_count_after_wait(); //
> > > > > > > > increased by the deferred relock count
> > > > > > > > 
> > > > > > > > what is the "deferred relock count"? I gather it relates to
> > > > > > > > 
> > > > > > > > "The code was extended to be able to deoptimize objects of a
> > > > > > > frame that
> > > > > > > > is not the top frame and to let another thread than the owning
> > > > > > > thread do
> > > > > > > > it."
> > > > > > > 
> > > > > > > Yes, these relate. Currently EA based optimizations are reverted,
> > when
> > > > > > > a compiled frame is replaced
> > > > > > > with corresponding interpreter frames. Part of this is relocking
> > > > > > > objects with eliminated
> > > > > > > locking. New with the enhancement is that we do this also just before
> > > > > > > object references are acquired
> > > > > > > through JVMTI. In this case we deoptimize also the owning compiled
> > > > > > > frame C and we register
> > > > > > > deoptimized objects as deferred updates. When control returns to C
> > it
> > > > > > > gets deoptimized, we notice
> > > > > > > that objects are already deoptimized (reallocated and relocked), so
> > we
> > > > > > > don't do it again (relocking
> > > > > > > twice would be incorrect of course). Deferred updates are copied into
> > > > > > > the new interpreter frames.
> > > > > > > 
> > > > > > > Problem: relocking is not possible if the target thread T is waiting
> > > > > > > on the monitor that needs to be
> > > > > > > relocked. This happens only with non-local objects with
> > > > > > > EliminateNestedLocks. Instead relocking is
> > > > > > > deferred until T owns the monitor again. This is what the piece of
> > > > > > > code above does.
> > > > > > 
> > > > > > Sorry I need some more detail here. How can you wait() on an object
> > > > > > monitor if the object allocation and/or locking was optimised away?
> > And
> > > > > > what is a "non-local object" in this context? Isn't EA restricted to
> > > > > > thread-confined objects?
> > > > > > 
> > > > > > Is it just that some of the locking gets optimized away e.g.
> > > > > > 
> > > > > > synchronised(obj) {
> > > > > > synchronised(obj) {
> > > > > > synchronised(obj) {
> > > > > > obj.wait();
> > > > > > }
> > > > > > }
> > > > > > }
> > > > > > 
> > > > > > If this is reduced to a form as-if it were a single lock of the monitor
> > > > > > (due to EA) and the wait() triggers a JVM TI event which leads to the
> > > > > > escape of "obj" then we need to reconstruct the true lock state, and so
> > > > > > when the wait() internally unblocks and reacquires the monitor it has to
> > > > > > set the true recursion count to 3, not the 1 that it appeared to be when
> > > > > > wait() was initially called. Is that the scenario?
> > > > > > 
> > > > > > If so I find this truly awful. Anyone using wait() in a realistic form
> > > > > > requires a notification and so the object cannot be thread confined. In
> > > > > > which case I would strongly argue that upon hitting the wait() the
> > deopt
> > > > > > should occur unconditionally and so the lock state is correct before we
> > > > > > wait and so we don't need to mess with the recursion count internally
> > > > > > when we reacquire the monitor.
> > > > > > 
> > > > > > > 
> > > > > > > > which I don't like the sound of at all when it comes to
> > > > > > > ObjectMonitor
> > > > > > > > state. So I'd like to understand in detail exactly what is going
> > > > > > > on here
> > > > > > > > and why.  This is a very intrusive change that seems to badly
> > break
> > > > > > > > encapsulation and impacts future changes to ObjectMonitor that
> > > > > > > are under
> > > > > > > > investigation.
> > > > > > > 
> > > > > > > I would not regard this as breaking encapsulation. Certainly not badly.
> > > > > > > 
> > > > > > > I've added a property relock_count_after_wait to JavaThread. The
> > > > > > > property is well
> > > > > > > encapsulated. Future ObjectMonitor implementations have to deal
> > with
> > > > > > > recursion too. They are free in
> > > > > > > choosing a way to do that as long as that property is taken into
> > > > > > > account. This is hardly a
> > > > > > > limitation.
> > > > > > 
> > > > > > I do think this badly breaks encapsulation as you have to add a callout
> > > > > > from the guts of the ObjectMonitor code to reach into the thread to
> > get
> > > > > > this lock count adjustment. I understand why you have had to do this
> > but
> > > > > > I would much rather see a change to the EA optimisation strategy so
> > that
> > > > > > this is not needed.
> > > > > > 
> > > > > > > Note also that the property is a straight forward extension of the
> > > > > > > existing concept of deferred
> > > > > > > local updates. It is embedded into the structure holding them. So not
> > > > > > > even the footprint of a
> > > > > > > JavaThread is enlarged if no deferred updates are generated.
> > > > > > > 
> > > > > > > > ---
> > > > > > > > 
> > > > > > > > src/hotspot/share/runtime/thread.cpp
> > > > > > > > 
> > > > > > > > Can you please explain why
> > > > > > > JavaThread::wait_for_object_deoptimization
> > > > > > > > has to be handcrafted in this way rather than using proper
> > > > > > > transitions.
> > > > > > > > 
> > > > > > > 
> > > > > > > I wrote wait_for_object_deoptimization taking
> > > > > > > JavaThread::java_suspend_self_with_safepoint_check
> > > > > > > as template. So in short: for the same reasons :)
> > > > > > > 
> > > > > > > Threads reach both methods as part of thread state transitions,
> > > > > > > therefore special handling is
> > > > > > > required to change thread state on top of ongoing transitions.
> > > > > > > 
> > > > > > > > We got rid of "deopt suspend" some time ago and it is disturbing
> > > > > > > to see
> > > > > > > > it being added back (effectively). This seems like it may be
> > > > > > > something
> > > > > > > > that handshakes could be used for.
> > > > > > > 
> > > > > > > Deopt suspend used to be something rather different with a similar
> > > > > > > name[1]. It is not being added back.
> > > > > > 
> > > > > > I stand corrected. Despite comments in the code to the contrary
> > > > > > deopt_suspend didn't actually cause a self-suspend. I was doing a lot of
> > > > > > cleanup in this area 13 years ago :)
> > > > > > 
> > > > > > > 
> > > > > > > I'm actually duplicating the existing external suspend mechanism,
> > > > > > > because a thread can be suspended
> > > > > > > at most once. And hey, and don't like that either! But it seems not
> > > > > > > unlikely that the duplicate can
> > > > > > > be removed together with the original and the new type of
> > handshakes
> > > > > > > that will be used for
> > > > > > > thread suspend can be used for object deoptimization too. See
> > today's
> > > > > > > discussion in JDK-8227745 [2].
> > > > > > 
> > > > > > I hope that discussion bears some fruit, at the moment it seems not to
> > > > > > be possible to use handshakes here. :(
> > > > > > 
> > > > > > The external suspend mechanism is a royal pain in the proverbial that
> > we
> > > > > > have to carefully live with. The idea that we're duplicating that for
> > > > > > use in another fringe area of functionality does not thrill me at all.
> > > > > > 
> > > > > > To be clear, I understand the problem that exists and that you wish to
> > > > > > solve, but for the runtime parts I balk at the complexity cost of
> > > > > > solving it.
> > > > > > 
> > > > > > Thanks,
> > > > > > David
> > > > > > -----
> > > > > > 
> > > > > > > Thanks, Richard.
> > > > > > > 
> > > > > > > [1] Deopt suspend was something like an async. handshake for
> > > > > > > architectures with register windows,
> > > > > > > where patching the return pc for deoptimization of a compiled
> > > > > > > frame was racy if the owner thread
> > > > > > > was in native code. Instead a "deopt" suspend flag was set on
> > > > > > > which the thread patched its own
> > > > > > > frame upon return from native. So no thread was suspended. It
> > got
> > > > > > > its name only from the name of
> > > > > > > the flags.
> > > > > > > 
> > > > > > > [2] Discussion about using handshakes to sync. with the target thread:
> > > > > > > 
> > > > > > > https://bugs.openjdk.java.net/browse/JDK-
> > > > 
> > 8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.syst
> > e
> > > > m.issuetabpanels:comment-tabpanel#comment-14306727
> > > > > > > 
> > > > > > > 
> > > > > > > -----Original Message-----
> > > > > > > From: David Holmes <david.holmes@oracle.com>
> > > > > > > Sent: Freitag, 13. Dezember 2019 00:56
> > > > > > > To: Reingruber, Richard <richard.reingruber@sap.com>;
> > > > > > > serviceability-dev@openjdk.java.net;
> > > > > > > hotspot-compiler-dev@openjdk.java.net;
> > > > > > > hotspot-runtime-dev@openjdk.java.net
> > > > > > > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
> > > > > > > Performance in the Presence of JVMTI Agents
> > > > > > > 
> > > > > > > Hi Richard,
> > > > > > > 
> > > > > > > Some further queries/concerns:
> > > > > > > 
> > > > > > > src/hotspot/share/runtime/objectMonitor.cpp
> > > > > > > 
> > > > > > > Can you please explain the changes to ObjectMonitor::wait:
> > > > > > > 
> > > > > > > !   _recursions = save      // restore the old recursion count
> > > > > > > !                 + jt->get_and_reset_relock_count_after_wait(); //
> > > > > > > increased by the deferred relock count
> > > > > > > 
> > > > > > > what is the "deferred relock count"? I gather it relates to
> > > > > > > 
> > > > > > > "The code was extended to be able to deoptimize objects of a frame
> > that
> > > > > > > is not the top frame and to let another thread than the owning thread
> > do
> > > > > > > it."
> > > > > > > 
> > > > > > > which I don't like the sound of at all when it comes to ObjectMonitor
> > > > > > > state. So I'd like to understand in detail exactly what is going on \
> > > > > > > here and why.  This is a very intrusive change that seems to badly \
> > > > > > > break encapsulation and impacts future changes to ObjectMonitor that \
> > > > > > > are
> > under
> > > > > > > investigation.
> > > > > > > 
> > > > > > > ---
> > > > > > > 
> > > > > > > src/hotspot/share/runtime/thread.cpp
> > > > > > > 
> > > > > > > Can you please explain why
> > JavaThread::wait_for_object_deoptimization
> > > > > > > has to be handcrafted in this way rather than using proper transitions.
> > > > > > > 
> > > > > > > We got rid of "deopt suspend" some time ago and it is disturbing to
> > see
> > > > > > > it being added back (effectively). This seems like it may be something
> > > > > > > that handshakes could be used for.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > David
> > > > > > > -----
> > > > > > > 
> > > > > > > On 12/12/2019 7:02 am, David Holmes wrote:
> > > > > > > > On 12/12/2019 1:07 am, Reingruber, Richard wrote:
> > > > > > > > > Hi David,
> > > > > > > > > 
> > > > > > > > > > Most of the details here are in areas I can comment on in
> > detail,
> > > > > > > > > but I
> > > > > > > > > > did take an initial general look at things.
> > > > > > > > > 
> > > > > > > > > Thanks for taking the time!
> > > > > > > > 
> > > > > > > > Apologies the above should read:
> > > > > > > > 
> > > > > > > > "Most of the details here are in areas I *can't* comment on in detail
> > > > > > > > ..."
> > > > > > > > 
> > > > > > > > David
> > > > > > > > 
> > > > > > > > > > The only thing that jumped out at me is that I think the
> > > > > > > > > > DeoptimizeObjectsALotThread should be a hidden thread.
> > > > > > > > > > 
> > > > > > > > > > +  bool is_hidden_from_external_view() const { return true; }
> > > > > > > > > 
> > > > > > > > > Yes, it should. Will add the method like above.
> > > > > > > > > 
> > > > > > > > > > Also I don't see any testing of the
> > DeoptimizeObjectsALotThread.
> > > > > > > > > Without
> > > > > > > > > > active testing this will just bit-rot.
> > > > > > > > > 
> > > > > > > > > DeoptimizeObjectsALot is meant for stress testing with a larger
> > > > > > > > > workload. I will add a minimal test
> > > > > > > > > to keep it fresh.
> > > > > > > > > 
> > > > > > > > > > Also on the tests I don't understand your @requires clause:
> > > > > > > > > > 
> > > > > > > > > > @requires ((vm.compMode != "Xcomp") &
> > vm.compiler2.enabled
> > > > &
> > > > > > > > > > (vm.opt.TieredCompilation != true))
> > > > > > > > > > 
> > > > > > > > > > This seems to require that TieredCompilation is disabled, but
> > > > > > > > > tiered is
> > > > > > > > > > our normal mode of operation. ??
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I removed the clause. I guess I wanted to target the tests towards
> > the
> > > > > > > > > code they are supposed to
> > > > > > > > > test, and it's easier to analyze failures w/o tiered compilation \
> > > > > > > > > and with just one compiler thread.
> > > > > > > > > 
> > > > > > > > > Additionally I will make use of
> > > > > > > > > compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests.
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > Richard.
> > > > > > > > > 
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: David Holmes <david.holmes@oracle.com>
> > > > > > > > > Sent: Mittwoch, 11. Dezember 2019 08:03
> > > > > > > > > To: Reingruber, Richard <richard.reingruber@sap.com>;
> > > > > > > > > serviceability-dev@openjdk.java.net;
> > > > > > > > > hotspot-compiler-dev@openjdk.java.net;
> > > > > > > > > hotspot-runtime-dev@openjdk.java.net
> > > > > > > > > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
> > > > > > > > > Performance in the Presence of JVMTI Agents
> > > > > > > > > 
> > > > > > > > > Hi Richard,
> > > > > > > > > 
> > > > > > > > > On 11/12/2019 7:45 am, Reingruber, Richard wrote:
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > I would like to get reviews please for
> > > > > > > > > > 
> > > > > > > > > > 
> > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/
> > > > > > > > > > 
> > > > > > > > > > Corresponding RFE:
> > > > > > > > > > https://bugs.openjdk.java.net/browse/JDK-8227745
> > > > > > > > > > 
> > > > > > > > > > Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915
> > > > > > > > > > And potentially https://bugs.openjdk.java.net/browse/JDK-
> > 8214584 [1]
> > > > > > > > > > 
> > > > > > > > > > Vladimir Kozlov kindly put webrev.3 through tier1-8 testing
> > without
> > > > > > > > > > issues (thanks!). In addition the
> > > > > > > > > > change is being tested at SAP since I posted the first RFR some
> > > > > > > > > > months ago.
> > > > > > > > > > 
> > > > > > > > > > The intention of this enhancement is to benefit performance wise
> > from
> > > > > > > > > > escape analysis even if JVMTI
> > > > > > > > > > agents request capabilities that allow them to access local \
> > > > > > > > > > variable values. E.g. if you start-up
> > > > > > > > > > with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,
> > then
> > > > > > > > > > escape analysis is disabled right
> > > > > > > > > > from the beginning, well before a debugger attaches -- if ever \
> > > > > > > > > > one should do so. With the
> > > > > > > > > > enhancement, escape analysis will remain enabled until and after
> > a
> > > > > > > > > > debugger attaches. EA based
> > > > > > > > > > optimizations are reverted just before an agent acquires the
> > > > > > > > > > reference to an object. In the JBS item
> > > > > > > > > > you'll find more details.
> > > > > > > > > 
> > > > > > > > > Most of the details here are in areas I can comment on in detail, \
> > > > > > > > > but
> > I
> > > > > > > > > did take an initial general look at things.
> > > > > > > > > 
> > > > > > > > > The only thing that jumped out at me is that I think the
> > > > > > > > > DeoptimizeObjectsALotThread should be a hidden thread.
> > > > > > > > > 
> > > > > > > > > +  bool is_hidden_from_external_view() const { return true; }
> > > > > > > > > 
> > > > > > > > > Also I don't see any testing of the DeoptimizeObjectsALotThread.
> > > > > > > > > Without
> > > > > > > > > active testing this will just bit-rot.
> > > > > > > > > 
> > > > > > > > > Also on the tests I don't understand your @requires clause:
> > > > > > > > > 
> > > > > > > > > @requires ((vm.compMode != "Xcomp") &
> > vm.compiler2.enabled &
> > > > > > > > > (vm.opt.TieredCompilation != true))
> > > > > > > > > 
> > > > > > > > > This seems to require that TieredCompilation is disabled, but \
> > > > > > > > > tiered
> > is
> > > > > > > > > our normal mode of operation. ??
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > David
> > > > > > > > > 
> > > > > > > > > > Thanks,
> > > > > > > > > > Richard.
> > > > > > > > > > 
> > > > > > > > > > [1] Experimental fix for JDK-8214584 based on JDK-8227745
> > > > > > > > > > 
> > > > 
> > http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.pa
> > tc
> > > > h
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic