'Re: RFR: 8280126: C2: detect and remove dead irreducible loops [v9]'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-hotspot-compiler-dev
Subject:    Re: RFR: 8280126: C2: detect and remove dead irreducible loops [v9]
From:       Emanuel Peter <epeter () openjdk ! org>
Date:       2023-01-31 9:33:45
Message-ID: hc_L5YqZ0mateIZInm--BAtEMY3lqUqeMXzNPVBCNzQ=.d8119fcb-4afd-412d-bc64-840471ecf622 () github ! com
[Download RAW message or body]

> **Context**
> If a `LoopNode` loses entry control, we remove it, to prevent having a dead-loop \
> (backedge would be only input to LoopNode): \
> https://github.com/openjdk/jdk/blob/8c472e481676ed0ef475c4989477d5714880c59e/src/hotspot/share/opto/cfgnode.cpp#L541-L544
>  
> We must remove such dead code, otherwise all sorts of bad graph patterns can be \
> created, including self-referring Add nodes etc, and that would either hit asserts, \
> or crash the VM. 
> Also `PhiNode` does some checks to avoid creating a dead-loop:
> https://github.com/openjdk/jdk/blob/8c472e481676ed0ef475c4989477d5714880c59e/src/hotspot/share/opto/cfgnode.cpp#L2004-L2019
>  
> However, all of this logic assumes that we have properly canonicalized reducible \
> loops: every loop-head must be a `LoopNode`, where we have a loop-entry-control, \
> and a backedge-control. Once the loop-entry-control dies, we know the loop is \
> dead-code. 
> **Problem**
> 
> This dead-loop removal logic does not work for irreducible loops. I found many JASM \
> examples, and even was able to produce a Java reproducer. I have seen these asserts \
> triggered: 
> Self-referencing data node:
> https://github.com/openjdk/jdk/blob/8c472e481676ed0ef475c4989477d5714880c59e/src/hotspot/share/opto/phaseX.cpp#L943
>  
> We find dead-code CFG nodes:
> https://github.com/openjdk/jdk/blob/8c472e481676ed0ef475c4989477d5714880c59e/src/hotspot/share/opto/loopnode.cpp#L5293
>  
> Dead loop of phi's without any input:
> https://github.com/openjdk/jdk/blob/8c472e481676ed0ef475c4989477d5714880c59e/src/hotspot/share/opto/cfgnode.cpp#L2539
>  
> We must remove the loop once it loses its last entry-control. The problem is that a \
> irreducible loop has multiple entries, by definition. Irreducible loops have no \
> `LoopNodes`, they are simply `RegionNodes` with multiple inputs. If one of the \
> controls is lost, it is a priori not clear if this was the last entry-control for \
> the whole irreducible loop. 
> **Solution Summary**
> 
> We mark every `RegionNode` with one of these three:
> 
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/cfgnode.hpp#L103-L112
>  
> We remove irreducible loops like this:
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/cfgnode.cpp#L591-L603
>  
> And during `PhaseIdealLoop::build_loop_tree` we verify that all irreducible \
> loop-entries are marked as `MaybeIrreducibleEntry`: \
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/loopnode.cpp#L5161-L5162
>  
> Additionally, we can verify that no irreducible loop contains regions marked as \
> `Reducible`: https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/loopnode.cpp#L5277-L5279
>  
> **Solution Details**
> 
> During parsing, we call `ciTypeFlow::Block::copy_irreducible_status_to(RegionNode* \
> region, const JVMState* jvms)` for every `RegionNode` created for block-merges. \
> This checks `ciTypeFlow::Block::is_in_irreducible_loop()`, to see if the relevant \
> block of this region is in an irreducible loop (see "Alternative Solutions" below, \
> why I mark all regions inside irreducible loops with `MaybeIrreducibleEntry`). For \
> this to work, I had to slightly improve the irreducible loop detection in \
> `ciTypeFlow`. One could improve the marking after every \
> `PhaseIdealLoop::build_loop_tree`, but that comes at the cost of more compile-time, \
> so I did not implement it. 
> In some cases, new regions are created that need to be marked with \
> `MaybeIrreducibleEntry`. For example `IdealLoopTree::split_fall_in` can split a \
> irreducible loop head, after which both are irreducible loop entries: \
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/loopnode.cpp#L3143-L3144
>  
> Regions for which we do not explicitly set `MaybeIrreducibleEntry` are marked as \
> `NeverIrreducibleEntry`. We do this for any new regions that are added, for example \
> by the `GraphKit`. They all seem to be safe, as those regions can never become \
> irreducible loop entries. 
> I tried to mark as many regions as possible with `Reducible`, so that we can do \
> stronger asserts. So no enclosing loop of a region is irreducible, we mark it \
> `Reducible`, and assert that it will never be inside a irreducible loop. However, \
> checking for an outher irreducible loop turns out to be tricky when we have \
> inlining: regions in the inner method need to check if they are in a irreducible \
> loop of the outer method. I tried to implement this, but found it to be too \
> difficult (We would need to find the block in the outer method where the inlining \
> of the inner method happens. An additional complication is that the method only \
> stores the non-OSR ciTypeFlow, even if the outer method is OSR compiled - thus the \
> irreducible status can be inaccurate). So I just mark the nodes as \
> `NeverIrreducibleEntry` if they are not in an irreducible loop in the current loop, \
> and there is an outer loop. This is safe (they can never be irreducible entries): \
> the region would have to merge a "backedge" a
 nd an "entry", both separately entering the inlined method, but there is only a \
single entry point to the inlined method.
> 
> Also `PhiNodes` have to be handled more carefully, hence I block phi's in \
> irreducible loops from acting on `TOP` inputs until the `Region` has a chance to \
> react: https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/cfgnode.cpp#L2048-L2054
>  
> When we detect a subgraph is a dead-loop, we remove it with
> https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/cfgnode.cpp#L797
>  I had to refactor it a bit to be more agressive (first gather all dead nodes, only \
> then remove them), which lead to the discovery of a few optimizations that could \
> not deal with TOP inputs. 
> It has been known that `split_flow_path` can cause new irreducible loops to appear \
> [JDK-6742111](https://bugs.openjdk.org/browse/JDK-6742111). With my marking of \
> regions and the verification of it, we cannot allow new irreducible loops to \
> appear. However, from the testing and fuzzing I have performed, it seems that this \
> can only happen when there is already another irreducible loop in the graph. Thus \
> it seems sufficient to disable the optimization if there are already irreducible \
> loops present: https://github.com/openjdk/jdk/blob/e2cc4229dd6e696847ebfecb19ab5d4b8621e31d/src/hotspot/share/opto/cfgnode.cpp#L1841-L1853
>  
> **Alternative Solutions**
> 
> It is the long-term goal to remove irreducible loops from the graph, either by \
> node-splitting or the dispatcher approach. So for now, we want a fix that works and \
> is not too complex. If this fix turns out to be too slow, especially because of \
> additional readability traversals, then we may need to revisit alternatives. 
> A first and most brute-force approach would have been to simply do a reachability \
> check for all regions, once an input control is lost. Or only do it if there is an \
> irreducible loop anywhere in the graph. But that would clearly lead to some \
> slowdown for OSR compilation, as they often have some irreducible loop in the \
> graph. It is thus better to limit the number of nodes that need to check \
> reachability. 
> _Why did I not exclusively mark regions that are irreducible loop entries?_
> An entry of an irreducible loop can lose all internal edges ("backedges"), \
> collapse, and float outside the loop. The entry is now further down the CFG from \
> the old entry, possibly through a series of if/region. One could attempt to move \
> the marking to the new entries, but that would be a complex task. An example can be \
> found in regression test `test_009`. 
> _Can we identify the smallest set of nodes that would ever be irreducible entries?_
> This is tricky. `build_loop_tree` finds loop-heads, which would certainly all have \
> to be marked. However, finding all secondary entries is something one would have to \
> do. Currently, when we find a second entry, we stop there, but to determine all \
> secondary entries one would probably have to traverse further into the loop again. \
> One might be able to mark less nodes, in some cases where we have reducible loops \
> inside irreducible loops, where the loop-head of the (inner) reducible loop is not \
> an entry of the (outer) irreducible loop. It is not yet clear to me how to do that, \
> and if it really speeds things up enough to justify the added complexity. 
> **Testing**
> 
> This bug was reported with a modified classfile, as far as we know the bytecode \
> must have been modified/fuzzed. I then wrote my own bytecode fuzzer that produces \
> JASM code with irreducible loops, and quickly found all sorts of similar failures. \
> I am already working on porting this fuzzer to Java \
> [JDK-8299214](https://bugs.openjdk.org/browse/JDK-8299214), and hopefully integrate \
> it into testing. It did not just find issues with irreducible loops, but also with \
> infinite loops. 
> I added many tests, some reduced down from that fuzzer, some hand-crafted.
> 
> So far, this change passes stress-testing and tier1-tier7. Rerunning it again \
> now... Performance testing shows no significant runtime change.
> 
> **Update**: I decided that we can always run `verify_regions_in_irreducible_loops`. \
> The overhead is very minimal. In most cases we do not even have irreducible loops, \
> and so there is no overhead. In the appended regression test where there are a lot \
> of irreducible loops, we have less than one percent overhead on the compilation \
> time.

Emanuel Peter has updated the pull request incrementally with one additional commit \
since the last revision:

  Updated copyright to 2023

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/11764/files
  - new: https://git.openjdk.org/jdk/pull/11764/files/3cea2e5b..d7d9c21b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11764&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11764&range=07-08

  Stats: 6 lines in 6 files changed: 0 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/11764.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11764/head:pull/11764

PR: https://git.openjdk.org/jdk/pull/11764


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic