[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-ppc-aix-port-dev
Subject:    RE: Removing redundant addis's
From:       "Lindenmaier, Goetz" <goetz.lindenmaier () sap ! com>
Date:       2016-10-10 21:30:44
Message-ID: 2244a22ef2aa4f05bc2869b4e59854a9 () DEWDFE13DE50 ! global ! corp ! sap
[Download RAW message or body]

Hi Bruno,

The code we contributed is derived from our internal VM based on licensed code from Oracle.
In this VM, before postalloc_expand, we count the size of the code/constants sections and
store it in C->cfg()->_consts_size.
There are two TOC variants. Depending on the size, we replace the big one
with the small one. This happens in postalloc_expand_load_long_constant.
Relocations must be able to exchange the offset in the instructions, therefor
it must be a precise pattern. Therefor it breaks when you change code.
When omitting the instruction, you probably just get the small variant.

For what's happening in more detail:
The matcher matches a ConLNode to a loadConL_ExNode. 
The register allocation might rematerialize the loadConL_Ex, so there are several ones.
After register allocation, we run postalloc_expand, for this node this is implemented 
in loadConLNodesTuple_create().  This function has " large_constant_pool = true", 
so it chooses the bigger variant, and replaces the loadConL_Ex by two instructions.
Here C->cfg()->_consts_size is missing!! Set large_constant_pool = false for an experiment.
We split the instructions so we can do scheduling for Power6. For Power6
scheduling we also needed to know precise instruction addresses etc.

Wrt. less importance of Power6 nowadays, this could be changed, though.
Try to implement an emit statement in loadConL_Ex instead of 
the late expand.  You must be careful though, big offsets arise only
seldomly, e.g., in generated code.  

Best regards,
  Goetz


> -----Original Message-----
> From: Bruno Alexandre Rosa [mailto:bruno.rosa@eldorado.org.br]
> Sent: Monday, October 10, 2016 11:04 PM
> To: Lindenmaier, Goetz <goetz.lindenmaier@sap.com>; ppc-aix-port-
> dev@openjdk.java.net
> Subject: RE: Removing redundant addis's
> 
> Hello, Goetz,
> 
> Thanks for the quick answer.
> 
> The places where addis is emitted are mostly related to the TOC.
> I tried calling mr_if_needed() instead or issuing a nop, but that seems to
> break
> later checks that rely on knowing the exact instructions of a TOC load.
> 
> However, I also tried forcefully NOT emiting the redundant addis and it
> worked
> fine: I was able to inspect the jitted code with both -XX:+PrintOptoAssembly
> and using perf inject. (I ran a pi digits microbenchmark for 10k digits).
> 
> I'm still getting familiar with the code and would welcome any help to
> understand why the latter approached worked and why the former failed.
> 
> Here is the patch with the attempt that worked:
> 
> diff -r b2b2ec149a24 src/cpu/ppc/vm/ppc.ad
> --- a/src/cpu/ppc/vm/ppc.ad     Mon Oct 03 19:09:26 2016 +0000
> +++ b/src/cpu/ppc/vm/ppc.ad     Mon Oct 10 19:32:12 2016 -0300
> @@ -2468,7 +2468,10 @@
>        ((loadConL_hiNode*)this)->_cbuf_insts_offset = __ offset();
>      }
> 
> -    __ addis($dst$$Register, $toc$$Register,
> MacroAssembler::largeoffset_si16_si16_hi(_const_toc_offset));
> +    int offset =
> MacroAssembler::largeoffset_si16_si16_hi(_const_toc_offset);
> +    if ($dst$$Register != $toc$$Register || offset != 0) {
> +      __ addis($dst$$Register, $toc$$Register, offset);
> +    }
>    %}
> 
>  %} // encode
> @@ -2637,7 +2640,10 @@
>        ((loadConP_hiNode*)this)->_const_toc_offset = toc_offset;
>      }
> 
> -    __ addis($dst$$Register, $toc$$Register,
> MacroAssembler::largeoffset_si16_si16_hi(_const_toc_offset));
> +    int offset =
> MacroAssembler::largeoffset_si16_si16_hi(_const_toc_offset);
> +    if ($dst$$Register != $toc$$Register || offset != 0) {
> +      __ addis($dst$$Register, $toc$$Register, offset);
> +    }
>    %}
> 
>    // Postalloc expand emitter for loading a ptr constant from the method's
> TOC.
> @@ -6117,9 +6123,13 @@
>      int hi = (offset + (1<<15))>>16;
>      int lo = offset - hi * (1<<16);
> 
> -    __ addis(Rtoc, Rtoc, hi);
> +    if (hi != 0) {
> +      __ addis(Rtoc, Rtoc, hi);
> +    }
>      __ lfs(Rdst, lo, Rtoc);
> -    __ addis(Rtoc, Rtoc, -hi);
> +    if (hi != 0) {
> +      __ addis(Rtoc, Rtoc, -hi);
> +    }
>    %}
>    ins_pipe(pipe_class_memory);
>  %}
> @@ -6182,9 +6192,13 @@
>      int hi = (offset + (1<<15))>>16;
>      int lo = offset - hi * (1<<16);
> 
> -    __ addis(Rtoc, Rtoc, hi);
> +    if (hi != 0) {
> +      __ addis(Rtoc, Rtoc, hi);
> +    }
>      __ lfd(Rdst, lo, Rtoc);
> -    __ addis(Rtoc, Rtoc, -hi);
> +    if (hi != 0) {
> +      __ addis(Rtoc, Rtoc, -hi);
> +    }
>    %}
>    ins_pipe(pipe_class_memory);
>  %}
> @@ -8420,7 +8434,10 @@
>    size(4);
>    ins_encode %{
>      // TODO: PPC port $archOpcode(ppc64Opcode_addis);
> -    __ addis($dst$$Register, $src1$$Register, ($src2$$constant)>>16);
> +    int offset = ($src2$$constant)>>16;
> +    if ($dst$$Register != $src1$$Register || offset != 0) {
> +      __ addis($dst$$Register, $src1$$Register, offset);
> +    }
>    %}
>    ins_pipe(pipe_class_default);
>  %}
> @@ -8499,7 +8516,10 @@
>    size(4);
>    ins_encode %{
>      // TODO: PPC port $archOpcode(ppc64Opcode_addis);
> -    __ addis($dst$$Register, $src1$$Register, ($src2$$constant)>>16);
> +    int offset = ($src2$$constant)>>16;
> +    if ($dst$$Register != $src1$$Register || offset != 0) {
> +      __ addis($dst$$Register, $src1$$Register, offset);
> +    }
>    %}
>    ins_pipe(pipe_class_default);
>  %}
> @@ -8539,7 +8559,10 @@
>    size(4);
>    ins_encode %{
>      // TODO: PPC port $archOpcode(ppc64Opcode_addis);
> -    __ addis($dst$$Register, $src1$$Register, ($src2$$constant)>>16);
> +    int offset = ($src2$$constant)>>16;
> +    if ($dst$$Register != $src1$$Register || offset != 0) {
> +      __ addis($dst$$Register, $src1$$Register, offset);
> +    }
>    %}
>    ins_pipe(pipe_class_default);
>  %}
> 
>  Regards,
>  Bruno Rosa
> -----Original Message-----
> From: Lindenmaier, Goetz [mailto:goetz.lindenmaier@sap.com]
> Sent: segunda-feira, 10 de outubro de 2016 06:08
> To: Bruno Alexandre Rosa <bruno.rosa@eldorado.org.br>; ppc-aix-port-
> dev@openjdk.java.net
> Subject: RE: Removing redundant addis's
> 
> Hi Bruno,
> 
> the emitters in the assembler_<platform>.* files are meant to be pure
> emitters without optimizations.  Callers can rely that the instruction is
> emitted, which might be important for effects like code alignment or
> instruction scheduling.
> So you should not change them.
> 
> If this is not an accidential occurance you should identify the place where this
> is emitted and call mr_if_needed() instead, or issue a nop.
> 
> Best regards,
>   Goetz.
> 
> 
> > -----Original Message-----
> > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-
> > bounces@openjdk.java.net] On Behalf Of Bruno Alexandre Rosa
> > Sent: Freitag, 7. Oktober 2016 22:53
> > To: ppc-aix-port-dev@openjdk.java.net
> > Subject: Removing redundant addis's
> >
> > Hello, everyone,
> >
> >
> >
> > I've been taking a look at some performance optmization opportunities
> > in the
> >
> > ppc64-specific HotSpot code and stumbled upon some oddities in the
> > jitted code
> >
> > like:
> >
> >
> >
> > addis r20, r20, 0
> >
> >
> >
> > I went ahead and modified the Assembler not to emit an addis when the
> > operands
> >
> > are the same and the offset is 0.
> >
> >
> >
> > I tried adding similar conditions on every "__ addis" call on ppc.ad,
> > but
> >
> > thought this kind of solution is inelegant and harder to maintain.
> > Then I went
> >
> > for a simpler solution.
> >
> >
> >
> > Here is the diff:
> >
> >
> >
> > diff -r b89f80ea707c src/cpu/ppc/vm/assembler_ppc.inline.hpp
> >
> > --- a/src/cpu/ppc/vm/assembler_ppc.inline.hpp   Fri Oct 07 16:58:57 2016 -
> > 0300
> >
> > +++ b/src/cpu/ppc/vm/assembler_ppc.inline.hpp   Fri Oct 07 19:36:17 2016
> -
> > 0300
> >
> > @@ -82,7 +82,7 @@
> >
> >
> >
> > // PPC 1, section 3.3.8, Fixed-Point Arithmetic Instructions
> >
> > inline void Assembler::addi(   Register d, Register a, int si16)   { assert(a !=
> R0,
> > "r0 not allowed"); addi_r0ok( d, a, si16); }
> >
> > -inline void Assembler::addis(  Register d, Register a, int si16)   { assert(a !=
> > R0, "r0 not allowed"); addis_r0ok(d, a, si16); }
> >
> > +inline void Assembler::addis(  Register d, Register a, int si16)   { assert(a !=
> > R0, "r0 not allowed"); if (d != a || si16 != 0) addis_r0ok(d, a,
> > si16); }
> >
> > inline void Assembler::addi_r0ok(Register d,Register a,int si16)   {
> > emit_int32(ADDI_OPCODE   | rt(d) | ra(a) | simm(si16, 16)); }
> >
> > inline void Assembler::addis_r0ok(Register d,Register a,int si16)  {
> > emit_int32(ADDIS_OPCODE  | rt(d) | ra(a) | simm(si16, 16)); }
> >
> > inline void Assembler::addic_( Register d, Register a, int si16)   {
> > emit_int32(ADDIC__OPCODE | rt(d) | ra(a) | simm(si16, 16)); }
> >
> >  However, I'm concerned this modification might be too intrusive. So I
> > came here
> >
> > to ask opinions on that.
> >
> >  Regards,
> >
> > Bruno Rosa

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic