[prev in list] [next in list] [prev in thread] [next in thread]
List: openjdk-hotspot-compiler-dev
Subject: Re: JIT code generation for Long/Integer.compare
From: Vitaly Davidovich <vitalyd () gmail ! com>
Date: 2015-09-28 22:31:09
Message-ID: CAHjP37Hv=orgkw7y7VzmrPwDRatyqj6jcrU1o+9-N6wfci+rnA () mail ! gmail ! com
[Download RAW message or body]
Speaking of redundant cmps, I've seen them as well in various cases. One
different example that I brought up on this list is for switches, which
John Rose explained here:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015524.html
I don't know if the switch case is ultimately the same issue or not, but
there is definitely a tendency to see redundant back-to-back cmp with only
the jump predicate varying.
sent from my phone
On Sep 28, 2015 6:12 PM, "Aleksey Shipilev" <aleksey.shipilev@oracle.com>
wrote:
> Hi Ian,
>
> I filed:
> https://bugs.openjdk.java.net/browse/JDK-8137309
>
> I can see some improvement with your patch, mostly when Longs are
> random, and therefore branch profile is polluted, and current generated
> code is not laid out well. Current Hotspot seems to generate multiple
> redundant cmp-s, which looks like a generic issue that should be
> addressed. See e.g. the very first disassembly here:
> http://cr.openjdk.java.net/~shade/8137309/BiasedCompareTo.java
>
> In some other cases, patched version is regressing. It seems to be the
> interaction with intrinsic not using any branch profile to figure out
> which option is more frequent, and some minor regalloc/constant
> propagation issues. See e.g.:
> http://cr.openjdk.java.net/~shade/8137309/PredictedCompareTo.java
>
> With that, fixing the optimizer/codegen without resorting to point
> intrinsics does look like a better course of action.
>
> Thanks,
> -Aleksey
>
> On 09/28/2015 04:44 AM, Ian Rogers wrote:
> > Thanks. I'm not disagreeing with anything here. is_x2logic is 45 lines
> > long and doing less (at least half) pattern matching work than would be
> > required for compare. Wrt implementation approach, ReverseBytesI is
> > purely an intrinsic but could also be implemented by pattern matching,
> > which fwiw is how its implemented in Jikes RVM as part of instruction
> > selection. With hindsight a high/ideal level BURS matching phase would
> > be nice. The intrinsic implementation is trivially correct and easy for
> > developers to target, whereas pattern matching can be buggy, hard to
> > test, and so on. Its inherently more LOC complex. I have long thought
> > pattern matching is the more laudable approach, but there's also a
> > certain amount of pragmatism when selecting between an intrinsic and
> > pattern matching. The scale of the proposed pattern match is beyond the
> > size of any equivalent in C2, so I have concerns around correctness and
> > in the consistency of the codebase in implementing it this way.
> >
> > If I may explain some of the background for this change, aside pure
> > performance. There is a common bug pattern of:
> >
> > class MyFoo implements Comparable<MyFoo> {
> > private long x;
> > int compareTo(MyFoo o) {
> > return (int)(o.x - x);
> > }
> > }
> >
> > As the int cast doesn't preserve the sign of the long subtract, subtract
> > results that overflow 31-bits may produce incorrect sort orders.
> > Rewriting the above code to use Long.compare is currently a performance
> > loss, although far more correct. What's true of long is also somewhat
> > true for ints as the subtract approach doesn't behave correctly around
> > Integer.MIN_VALUE. Having Long.compare and Integer.compare flagged as
> > intrinsics would go someway to remove developers reticence to prefer it
> > over the pervasive subtract approach which is naively considered more
> > performant.
> >
> > Thanks,
> > Ian
> >
> >
> > On Sat, Sep 26, 2015 at 12:02 PM, John Rose <john.r.rose@oracle.com
> > <mailto:john.r.rose@oracle.com>> wrote:
> >
> > On Sep 25, 2015, at 7:35 PM, Vladimir Kozlov
> > <vladimir.kozlov@oracle.com <mailto:vladimir.kozlov@oracle.com>>
> wrote:
> >>
> >> For pattern matching I mean ideal transformation in ideal graph.
> >> See, fro example, is_x2logic() in cfgnode.cpp and other
> >> transformations for Phi node.
> >>
> >> You will still need new CmpI3 node and changes in .ad file.
> >
> > +1 Ideally, we pattern-match the body of the method we know about
> and
> > generate the new IR, and then two good things happen: Other
> expressions
> > we didn't know about also pattern match, and after looking at the
> > pattern
> > logic we discover straightforward generalizations that go beyond the
> > method
> > we knew about at first.
> >
> > — John
> >
> >
>
>
>
[Attachment #3 (text/html)]
<p dir="ltr">Speaking of redundant cmps, I've seen them as well in various cases. \
One different example that I brought up on this list is for switches, which John Rose \
explained here: <a href="http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2 \
014-September/015524.html">http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015524.html</a></p>
<p dir="ltr">I don't know if the switch case is ultimately the same issue or \
not, but there is definitely a tendency to see redundant back-to-back cmp with only \
the jump predicate varying.</p> <p dir="ltr">sent from my phone</p>
<div class="gmail_quote">On Sep 28, 2015 6:12 PM, "Aleksey Shipilev" <<a \
href="mailto:aleksey.shipilev@oracle.com">aleksey.shipilev@oracle.com</a>> \
wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 \
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Ian,<br> <br>
I filed:<br>
<a href="https://bugs.openjdk.java.net/browse/JDK-8137309" rel="noreferrer" \
target="_blank">https://bugs.openjdk.java.net/browse/JDK-8137309</a><br> <br>
I can see some improvement with your patch, mostly when Longs are<br>
random, and therefore branch profile is polluted, and current generated<br>
code is not laid out well. Current Hotspot seems to generate multiple<br>
redundant cmp-s, which looks like a generic issue that should be<br>
addressed. See e.g. the very first disassembly here:<br>
<a href="http://cr.openjdk.java.net/~shade/8137309/BiasedCompareTo.java" \
rel="noreferrer" target="_blank">http://cr.openjdk.java.net/~shade/8137309/BiasedCompareTo.java</a><br>
<br>
In some other cases, patched version is regressing. It seems to be the<br>
interaction with intrinsic not using any branch profile to figure out<br>
which option is more frequent, and some minor regalloc/constant<br>
propagation issues. See e.g.:<br>
<a href="http://cr.openjdk.java.net/~shade/8137309/PredictedCompareTo.java" \
rel="noreferrer" target="_blank">http://cr.openjdk.java.net/~shade/8137309/PredictedCompareTo.java</a><br>
<br>
With that, fixing the optimizer/codegen without resorting to point<br>
intrinsics does look like a better course of action.<br>
<br>
Thanks,<br>
-Aleksey<br>
<br>
On 09/28/2015 04:44 AM, Ian Rogers wrote:<br>
> Thanks. I'm not disagreeing with anything here. is_x2logic is 45 lines<br>
> long and doing less (at least half) pattern matching work than would be<br>
> required for compare. Wrt implementation approach, ReverseBytesI is<br>
> purely an intrinsic but could also be implemented by pattern matching,<br>
> which fwiw is how its implemented in Jikes RVM as part of instruction<br>
> selection. With hindsight a high/ideal level BURS matching phase would<br>
> be nice. The intrinsic implementation is trivially correct and easy for<br>
> developers to target, whereas pattern matching can be buggy, hard to<br>
> test, and so on. Its inherently more LOC complex. I have long thought<br>
> pattern matching is the more laudable approach, but there's also a<br>
> certain amount of pragmatism when selecting between an intrinsic and<br>
> pattern matching. The scale of the proposed pattern match is beyond the<br>
> size of any equivalent in C2, so I have concerns around correctness and<br>
> in the consistency of the codebase in implementing it this way.<br>
><br>
> If I may explain some of the background for this change, aside pure<br>
> performance. There is a common bug pattern of:<br>
><br>
> class MyFoo implements Comparable<MyFoo> {<br>
> private long x;<br>
> int compareTo(MyFoo o) {<br>
> return (int)(o.x - x);<br>
> }<br>
> }<br>
><br>
> As the int cast doesn't preserve the sign of the long subtract, subtract<br>
> results that overflow 31-bits may produce incorrect sort orders.<br>
> Rewriting the above code to use Long.compare is currently a performance<br>
> loss, although far more correct. What's true of long is also somewhat<br>
> true for ints as the subtract approach doesn't behave correctly around<br>
> Integer.MIN_VALUE. Having Long.compare and Integer.compare flagged as<br>
> intrinsics would go someway to remove developers reticence to prefer it<br>
> over the pervasive subtract approach which is naively considered more<br>
> performant.<br>
><br>
> Thanks,<br>
> Ian<br>
><br>
><br>
> On Sat, Sep 26, 2015 at 12:02 PM, John Rose <<a \
href="mailto:john.r.rose@oracle.com">john.r.rose@oracle.com</a><br> > \
<mailto:<a href="mailto:john.r.rose@oracle.com">john.r.rose@oracle.com</a>>> \
wrote:<br> ><br>
> On Sep 25, 2015, at 7:35 PM, Vladimir Kozlov<br>
> <<a href="mailto:vladimir.kozlov@oracle.com">vladimir.kozlov@oracle.com</a> \
<mailto:<a href="mailto:vladimir.kozlov@oracle.com">vladimir.kozlov@oracle.com</a>>> \
wrote:<br> >><br>
>> For pattern matching I mean ideal transformation in ideal graph.<br>
>> See, fro example, is_x2logic() in cfgnode.cpp and other<br>
>> transformations for Phi node.<br>
>><br>
>> You will still need new CmpI3 node and changes in .ad file.<br>
><br>
> +1 Ideally, we pattern-match the body of the method we know about \
and<br> > generate the new IR, and then two good things happen: Other \
expressions<br> > we didn't know about also pattern match, and after \
looking at the<br> > pattern<br>
> logic we discover straightforward generalizations that go beyond the<br>
> method<br>
> we knew about at first.<br>
><br>
> — John<br>
><br>
><br>
<br>
<br>
</blockquote></div>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic