List:       gcc-patches
Subject:    PING^4 [PATCH] i386: Add pass_remove_partial_avx_dependency
From:       "H.J. Lu" <hjl.tools () gmail ! com>
Date:       2018-09-29 23:05:49
Message-ID: CAMe9rOoVgzGnNyYvbKPOcdraTgmE-eEgJPL8_Sq1vdTPM4eC+Q () mail ! gmail ! com

On Tue, Sep 18, 2018 at 9:44 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, Sep 11, 2018 at 9:01 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Tue, Sep 4, 2018 at 9:01 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> >> On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> >>> With -mavx, for
> >>>
> >>> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
> >>> extern float f;
> >>> extern double d;
> >>> extern int i;
> >>>
> >>> void
> >>> foo (void)
> >>> {
> >>>   d = f;
> >>>   f = i;
> >>> }
> >>>
> >>> we need to generate
> >>>
> >>>         vxorp[ds]       %xmmN, %xmmN, %xmmN
> >>>         ...
> >>>         vcvtss2sd       f(%rip), %xmmN, %xmmX
> >>>         ...
> >>>         vcvtsi2ss       i(%rip), %xmmN, %xmmY
> >>>
> >>> to avoid a partial XMM register stall.  This patch adds a pass to generate
> >>> a single
> >>>
> >>>         vxorps          %xmmN, %xmmN, %xmmN
> >>>
> >>> at function entry, which is shared by all SF and DF conversions, instead
> >>> of generating one
> >>>
> >>>         vxorp[ds]       %xmmN, %xmmN, %xmmN
> >>>
> >>> for each SF/DF conversion.
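Here is a toy count of how many dependency-breaking `vxorp[ds]` instructions each strategy emits for a straight-line function body. It is a minimal sketch of the idea, not the pass itself. The `insn_kind` enum and both counting functions are hypothetical. The body is modeled as a flat array rather than GCC RTL, and control flow, register allocation, and the choice of `%xmmN` are all ignored.

```c
#include <stddef.h>

/* Hypothetical instruction kinds for a toy function body; not GCC RTL.  */
enum insn_kind
{
  INSN_OTHER,
  INSN_CVT_TO_SF_DF  /* e.g. vcvtss2sd or vcvtsi2ss: writes only the low
                        element and merges the rest from another XMM reg */
};

/* Old scheme: one vxorp[ds] in front of every SF/DF conversion.  */
size_t
xors_per_conversion (const enum insn_kind *body, size_t n)
{
  size_t count = 0;
  for (size_t k = 0; k < n; k++)
    if (body[k] == INSN_CVT_TO_SF_DF)
      count++;
  return count;
}

/* New scheme (in this model): a single vxorps at function entry, shared
   by every conversion in the function, or none if there are no
   conversions at all.  */
size_t
xors_shared_at_entry (const enum insn_kind *body, size_t n)
{
  for (size_t k = 0; k < n; k++)
    if (body[k] == INSN_CVT_TO_SF_DF)
      return 1;
  return 0;
}
```

For `foo` above, which contains one `vcvtss2sd` and one `vcvtsi2ss`, the per-conversion scheme counts 2 `vxorp[ds]` and the shared scheme counts 1. The savings grow with the number of conversions per function.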
> >>>
> >>> Performance impacts on SPEC CPU 2017 rate with 1 copy using
> >>>
> >>> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
> >>>
> >>> are
> >>>
> >>> 1. On Broadwell server:
> >>>
> >>> 500.perlbench_r (-0.82%)
> >>> 502.gcc_r (0.73%)
> >>> 505.mcf_r (-0.24%)
> >>> 520.omnetpp_r (-2.22%)
> >>> 523.xalancbmk_r (-1.47%)
> >>> 525.x264_r (0.31%)
> >>> 531.deepsjeng_r (0.27%)
> >>> 541.leela_r (0.85%)
> >>> 548.exchange2_r (-0.11%)
> >>> 557.xz_r (-0.34%)
> >>> Geomean: (-0.23%)
> >>>
> >>> 503.bwaves_r (0.00%)
> >>> 507.cactuBSSN_r (-1.88%)
> >>> 508.namd_r (0.00%)
> >>> 510.parest_r (-0.56%)
> >>> 511.povray_r (0.49%)
> >>> 519.lbm_r (-1.28%)
> >>> 521.wrf_r (-0.28%)
> >>> 526.blender_r (0.55%)
> >>> 527.cam4_r (-0.20%)
> >>> 538.imagick_r (2.52%)
> >>> 544.nab_r (-0.18%)
> >>> 549.fotonik3d_r (-0.51%)
> >>> 554.roms_r (-0.22%)
> >>> Geomean: (0.00%)
> >>>
> >>> 2. On Skylake client:
> >>>
> >>> 500.perlbench_r (-0.29%)
> >>> 502.gcc_r (-0.36%)
> >>> 505.mcf_r (1.77%)
> >>> 520.omnetpp_r (-0.26%)
> >>> 523.xalancbmk_r (-3.69%)
> >>> 525.x264_r (-0.32%)
> >>> 531.deepsjeng_r (0.00%)
> >>> 541.leela_r (-0.46%)
> >>> 548.exchange2_r (0.00%)
> >>> 557.xz_r (0.00%)
> >>> Geomean: (-0.34%)
> >>>
> >>> 503.bwaves_r (0.00%)
> >>> 507.cactuBSSN_r (-0.56%)
> >>> 508.namd_r (0.87%)
> >>> 510.parest_r (0.00%)
> >>> 511.povray_r (-0.73%)
> >>> 519.lbm_r (0.84%)
> >>> 521.wrf_r (0.00%)
> >>> 526.blender_r (-0.81%)
> >>> 527.cam4_r (-0.43%)
> >>> 538.imagick_r (2.55%)
> >>> 544.nab_r (0.28%)
> >>> 549.fotonik3d_r (0.00%)
> >>> 554.roms_r (0.32%)
> >>> Geomean: (0.12%)
> >>>
> >>> 3. On Skylake server:
> >>>
> >>> 500.perlbench_r (-0.55%)
> >>> 502.gcc_r (0.69%)
> >>> 505.mcf_r (0.00%)
> >>> 520.omnetpp_r (-0.33%)
> >>> 523.xalancbmk_r (-0.21%)
> >>> 525.x264_r (-0.27%)
> >>> 531.deepsjeng_r (0.00%)
> >>> 541.leela_r (0.00%)
> >>> 548.exchange2_r (-0.11%)
> >>> 557.xz_r (0.00%)
> >>> Geomean: (0.00%)
> >>>
> >>> 503.bwaves_r (0.58%)
> >>> 507.cactuBSSN_r (0.00%)
> >>> 508.namd_r (0.00%)
> >>> 510.parest_r (0.18%)
> >>> 511.povray_r (-0.58%)
> >>> 519.lbm_r (0.25%)
> >>> 521.wrf_r (0.40%)
> >>> 526.blender_r (0.34%)
> >>> 527.cam4_r (0.19%)
> >>> 538.imagick_r (5.87%)
> >>> 544.nab_r (0.17%)
> >>> 549.fotonik3d_r (0.00%)
> >>> 554.roms_r (0.00%)
> >>> Geomean: (0.62%)
> >>>
> >>> On Skylake client, impacts on 538.imagick_r are
> >>>
> >>> size before:
> >>>
> >>>    text    data     bss     dec     hex filename
> >>> 2555577   10876    5576 2572029  273efd imagick_r.exe
> >>>
> >>> size after:
> >>>
> >>>    text    data     bss     dec     hex filename
> >>> 2511825   10876    5576 2528277  269415 imagick_r.exe
> >>>
> >>> number of vxorp[ds]:
> >>>
> >>> before          after           difference
> >>> 14570           4515            -69%
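The -69% in the last column is the relative change in the `vxorp[ds]` count: (4515 - 14570) / 14570 ≈ -0.690. The same formula applied to the `text` column of the `size` output gives (2511825 - 2555577) / 2555577 ≈ -1.71%, or 43752 bytes. Below is a minimal C sketch of that arithmetic. `percent_change` is a hypothetical helper, not part of the patch.

```c
/* Relative change from BEFORE to AFTER, in percent (hypothetical helper).  */
double
percent_change (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

`percent_change (14570, 4515)` evaluates to roughly -69.01, which rounds to the -69% shown.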
> >>>
> >>> OK for trunk?
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> H.J.
> >>> ---
> >>> gcc/
> >>>
> >>> 2018-08-28  H.J. Lu  <hongjiu.lu@intel.com>
> >>>             Sunil K Pandey  <sunil.k.pandey@intel.com>
> >>>
> >>>         PR target/87007
> >>>         * config/i386/i386-passes.def: Add
> >>>         pass_remove_partial_avx_dependency.
> >>>         * config/i386/i386-protos.h
> >>>         (make_pass_remove_partial_avx_dependency): New.
> >>>         * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
> >>>         New function.
> >>>         (pass_data_remove_partial_avx_dependency): New.
> >>>         (pass_remove_partial_avx_dependency): Likewise.
> >>>         (make_pass_remove_partial_avx_dependency): Likewise.
> >>>         * config/i386/i386.md (SF/DF conversion splitters): Disabled
> >>>         for TARGET_AVX.
> >>>
> >>> gcc/testsuite/
> >>>
> >>> 2018-08-28  H.J. Lu  <hongjiu.lu@intel.com>
> >>>             Sunil K Pandey  <sunil.k.pandey@intel.com>
> >>>
> >>>         PR target/87007
> >>>         * gcc.target/i386/pr87007.c: New file.
> >>
> >>
> >> PING:
> >>
> >> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01781.html
> >>
> >
> > PING.
> >
>
> Hi Kirill, Jakub, Jan,
>
> Can you take a look?
>

PING.

-- 
H.J.