'Re: JIT code generation for Long/Integer.compare'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-hotspot-compiler-dev
Subject:    Re: JIT code generation for Long/Integer.compare
From:       Vitaly Davidovich <vitalyd () gmail ! com>
Date:       2015-09-28 22:31:09
Message-ID: CAHjP37Hv=orgkw7y7VzmrPwDRatyqj6jcrU1o+9-N6wfci+rnA () mail ! gmail ! com
[Download RAW message or body]

Speaking of redundant cmps, I've seen them as well in various cases.  One
different example that I brought up on this list is for switches, which
John Rose explained here:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015524.html

I don't know if the switch case is ultimately the same issue or not, but
there is definitely a tendency to see redundant back-to-back cmp with only
the jump predicate varying.

sent from my phone
On Sep 28, 2015 6:12 PM, "Aleksey Shipilev" <aleksey.shipilev@oracle.com>
wrote:

> Hi Ian,
>
> I filed:
>   https://bugs.openjdk.java.net/browse/JDK-8137309
>
> I can see some improvement with your patch, mostly when Longs are
> random, and therefore branch profile is polluted, and current generated
> code is not laid out well. Current Hotspot seems to generate multiple
> redundant cmp-s, which looks like a generic issue that should be
> addressed. See e.g. the very first disassembly here:
>   http://cr.openjdk.java.net/~shade/8137309/BiasedCompareTo.java
>
> In some other cases, patched version is regressing. It seems to be the
> interaction with intrinsic not using any branch profile to figure out
> which option is more frequent, and some minor regalloc/constant
> propagation issues. See e.g.:
>   http://cr.openjdk.java.net/~shade/8137309/PredictedCompareTo.java
>
> With that, fixing the optimizer/codegen without resorting to point
> intrinsics does look like a better course of action.
>
> Thanks,
> -Aleksey
>
> On 09/28/2015 04:44 AM, Ian Rogers wrote:
> > Thanks. I'm not disagreeing with anything here. is_x2logic is 45 lines
> > long and doing less (at least half) pattern matching work than would be
> > required for compare. Wrt implementation approach, ReverseBytesI is
> > purely an intrinsic but could also be implemented by pattern matching,
> > which fwiw is how its implemented in Jikes RVM as part of instruction
> > selection. With hindsight a high/ideal level BURS matching phase would
> > be nice. The intrinsic implementation is trivially correct and easy for
> > developers to target, whereas pattern matching can be buggy, hard to
> > test, and so on. Its inherently more LOC complex. I have long thought
> > pattern matching is the more laudable approach, but there's also a
> > certain amount of pragmatism when selecting between an intrinsic and
> > pattern matching. The scale of the proposed pattern match is beyond the
> > size of any equivalent in C2, so I have concerns around correctness and
> > in the consistency of the codebase in implementing it this way.
> >
> > If I may explain some of the background for this change, aside pure
> > performance. There is a common bug pattern of:
> >
> > class MyFoo implements Comparable<MyFoo> {
> >   private long x;
> >   int compareTo(MyFoo o) {
> >     return (int)(o.x - x);
> >   }
> > }
> >
> > As the int cast doesn't preserve the sign of the long subtract, subtract
> > results that overflow 31-bits may produce incorrect sort orders.
> > Rewriting the above code to use Long.compare is currently a performance
> > loss, although far more correct. What's true of long is also somewhat
> > true for ints as the subtract approach doesn't behave correctly around
> > Integer.MIN_VALUE. Having Long.compare and Integer.compare flagged as
> > intrinsics would go someway to remove developers reticence to prefer it
> > over the pervasive subtract approach which is naively considered more
> > performant.
> >
> > Thanks,
> > Ian
> >
> >
> > On Sat, Sep 26, 2015 at 12:02 PM, John Rose <john.r.rose@oracle.com
> > <mailto:john.r.rose@oracle.com>> wrote:
> >
> >     On Sep 25, 2015, at 7:35 PM, Vladimir Kozlov
> >     <vladimir.kozlov@oracle.com <mailto:vladimir.kozlov@oracle.com>>
> wrote:
> >>
> >>     For pattern matching I mean ideal transformation in ideal graph.
> >>     See, fro example, is_x2logic() in cfgnode.cpp and other
> >>     transformations for Phi node.
> >>
> >>     You will still need new CmpI3 node and changes in .ad file.
> >
> >     +1  Ideally, we pattern-match the body of the method we know about
> and
> >     generate the new IR, and then two good things happen:  Other
> expressions
> >     we didn't know about also pattern match, and after looking at the
> >     pattern
> >     logic we discover straightforward generalizations that go beyond the
> >     method
> >     we knew about at first.
> >
> >     — John
> >
> >
>
>
>

[Attachment #3 (text/html)]

<p dir="ltr">Speaking of redundant cmps, I&#39;ve seen them as well in various cases. \
One different example that I brought up on this list is for switches, which John Rose \
explained here: <a href="http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2 \
014-September/015524.html">http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015524.html</a></p>
 <p dir="ltr">I don&#39;t know if the switch case is ultimately the same issue or \
not, but there is definitely a tendency to see redundant back-to-back cmp with only \
the jump predicate varying.</p> <p dir="ltr">sent from my phone</p>
<div class="gmail_quote">On Sep 28, 2015 6:12 PM, &quot;Aleksey Shipilev&quot; &lt;<a \
href="mailto:aleksey.shipilev@oracle.com">aleksey.shipilev@oracle.com</a>&gt; \
wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 \
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Ian,<br> <br>
I filed:<br>
   <a href="https://bugs.openjdk.java.net/browse/JDK-8137309" rel="noreferrer" \
target="_blank">https://bugs.openjdk.java.net/browse/JDK-8137309</a><br> <br>
I can see some improvement with your patch, mostly when Longs are<br>
random, and therefore branch profile is polluted, and current generated<br>
code is not laid out well. Current Hotspot seems to generate multiple<br>
redundant cmp-s, which looks like a generic issue that should be<br>
addressed. See e.g. the very first disassembly here:<br>
   <a href="http://cr.openjdk.java.net/~shade/8137309/BiasedCompareTo.java" \
rel="noreferrer" target="_blank">http://cr.openjdk.java.net/~shade/8137309/BiasedCompareTo.java</a><br>
 <br>
In some other cases, patched version is regressing. It seems to be the<br>
interaction with intrinsic not using any branch profile to figure out<br>
which option is more frequent, and some minor regalloc/constant<br>
propagation issues. See e.g.:<br>
   <a href="http://cr.openjdk.java.net/~shade/8137309/PredictedCompareTo.java" \
rel="noreferrer" target="_blank">http://cr.openjdk.java.net/~shade/8137309/PredictedCompareTo.java</a><br>
 <br>
With that, fixing the optimizer/codegen without resorting to point<br>
intrinsics does look like a better course of action.<br>
<br>
Thanks,<br>
-Aleksey<br>
<br>
On 09/28/2015 04:44 AM, Ian Rogers wrote:<br>
&gt; Thanks. I&#39;m not disagreeing with anything here. is_x2logic is 45 lines<br>
&gt; long and doing less (at least half) pattern matching work than would be<br>
&gt; required for compare. Wrt implementation approach, ReverseBytesI is<br>
&gt; purely an intrinsic but could also be implemented by pattern matching,<br>
&gt; which fwiw is how its implemented in Jikes RVM as part of instruction<br>
&gt; selection. With hindsight a high/ideal level BURS matching phase would<br>
&gt; be nice. The intrinsic implementation is trivially correct and easy for<br>
&gt; developers to target, whereas pattern matching can be buggy, hard to<br>
&gt; test, and so on. Its inherently more LOC complex. I have long thought<br>
&gt; pattern matching is the more laudable approach, but there&#39;s also a<br>
&gt; certain amount of pragmatism when selecting between an intrinsic and<br>
&gt; pattern matching. The scale of the proposed pattern match is beyond the<br>
&gt; size of any equivalent in C2, so I have concerns around correctness and<br>
&gt; in the consistency of the codebase in implementing it this way.<br>
&gt;<br>
&gt; If I may explain some of the background for this change, aside pure<br>
&gt; performance. There is a common bug pattern of:<br>
&gt;<br>
&gt; class MyFoo implements Comparable&lt;MyFoo&gt; {<br>
&gt;     private long x;<br>
&gt;     int compareTo(MyFoo o) {<br>
&gt;        return (int)(o.x - x);<br>
&gt;     }<br>
&gt; }<br>
&gt;<br>
&gt; As the int cast doesn&#39;t preserve the sign of the long subtract, subtract<br>
&gt; results that overflow 31-bits may produce incorrect sort orders.<br>
&gt; Rewriting the above code to use Long.compare is currently a performance<br>
&gt; loss, although far more correct. What&#39;s true of long is also somewhat<br>
&gt; true for ints as the subtract approach doesn&#39;t behave correctly around<br>
&gt; Integer.MIN_VALUE. Having Long.compare and Integer.compare flagged as<br>
&gt; intrinsics would go someway to remove developers reticence to prefer it<br>
&gt; over the pervasive subtract approach which is naively considered more<br>
&gt; performant.<br>
&gt;<br>
&gt; Thanks,<br>
&gt; Ian<br>
&gt;<br>
&gt;<br>
&gt; On Sat, Sep 26, 2015 at 12:02 PM, John Rose &lt;<a \
href="mailto:john.r.rose@oracle.com">john.r.rose@oracle.com</a><br> &gt; \
&lt;mailto:<a href="mailto:john.r.rose@oracle.com">john.r.rose@oracle.com</a>&gt;&gt; \
wrote:<br> &gt;<br>
&gt;        On Sep 25, 2015, at 7:35 PM, Vladimir Kozlov<br>
&gt;        &lt;<a href="mailto:vladimir.kozlov@oracle.com">vladimir.kozlov@oracle.com</a> \
&lt;mailto:<a href="mailto:vladimir.kozlov@oracle.com">vladimir.kozlov@oracle.com</a>&gt;&gt; \
wrote:<br> &gt;&gt;<br>
&gt;&gt;        For pattern matching I mean ideal transformation in ideal graph.<br>
&gt;&gt;        See, fro example, is_x2logic() in cfgnode.cpp and other<br>
&gt;&gt;        transformations for Phi node.<br>
&gt;&gt;<br>
&gt;&gt;        You will still need new CmpI3 node and changes in .ad file.<br>
&gt;<br>
&gt;        +1   Ideally, we pattern-match the body of the method we know about \
and<br> &gt;        generate the new IR, and then two good things happen:   Other \
expressions<br> &gt;        we didn&#39;t know about also pattern match, and after \
looking at the<br> &gt;        pattern<br>
&gt;        logic we discover straightforward generalizations that go beyond the<br>
&gt;        method<br>
&gt;        we knew about at first.<br>
&gt;<br>
&gt;        — John<br>
&gt;<br>
&gt;<br>
<br>
<br>
</blockquote></div>



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic