
List:       gcc-bugs
Subject:    [Bug tree-optimization/46590] long compile time with -O2 and many loops
From:       "tkoenig at gcc dot gnu.org" <gcc-bugzilla () gcc ! gnu ! org>
Date:       2019-03-31 19:47:12
Message-ID: bug-46590-4-YOgtokvRgn () http ! gcc ! gnu ! org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #48 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
The test case from comment #5 and comment #6 has regressed in GCC 7/8/9:

$ time gfortran-4.8 -O1  gener-4.f90 

real    0m11.509s
user    0m11.356s
sys     0m0.148s
$ time gfortran-7 -O1  gener-4.f90 

real    0m23.630s
user    0m23.475s
sys     0m0.142s
$ time gfortran-8 -O1  gener-4.f90 

real    0m23.702s
user    0m23.356s
sys     0m0.335s
$ time gfortran -O1  gener-4.f90 

real    0m24.708s
user    0m24.577s
sys     0m0.107s

(where gfortran is a recent trunk, built without checking).
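
For reference, the slowdown relative to gfortran-4.8 can be worked out from
the "real" times above. This is only a small sketch; the dictionary keys and
the helper name slowdown_vs are made up for illustration, and the numbers are
copied from the runs shown in this comment.

```python
# Wall-clock ("real") times in seconds, copied from the runs above.
real_seconds = {
    "gfortran-4.8": 11.509,
    "gfortran-7": 23.630,
    "gfortran-8": 23.702,
    "gfortran (trunk)": 24.708,
}


def slowdown_vs(baseline, timings):
    """Return each compiler's time divided by the baseline compiler's time."""
    base = timings[baseline]
    return {name: seconds / base for name, seconds in timings.items()}


ratios = slowdown_vs("gfortran-4.8", real_seconds)
for name, ratio in ratios.items():
    print(f"{name:18s} {real_seconds[name]:7.3f}s  {ratio:.2f}x")
```

All three newer compilers come out at a bit more than twice the 4.8 time.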

About half the time (12.11s usr, 50%) is spent in df live&initialized regs,
with another big chunk (3.15s usr, 13%) in tree copy headers:

$ gfortran -O1 -ftime-report gener-4.f90 

Time variable                                   usr           sys          wall              GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    182 kB (  0%)
 phase parsing                      :   0.30 (  1%)   0.02 (  8%)   0.32 (  1%)  18037 kB ( 11%)
 phase opt and generate             :  23.81 ( 99%)   0.24 ( 92%)  24.06 ( 99%) 143289 kB ( 89%)
 callgraph construction             :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)   4980 kB (  3%)
 ipa function summary               :   0.05 (  0%)   0.00 (  0%)   0.04 (  0%)   1414 kB (  1%)
 ipa inlining heuristics            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      0 kB (  0%)
 ipa pure const                     :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)      0 kB (  0%)
 cfg construction                   :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    890 kB (  1%)
 cfg cleanup                        :   0.05 (  0%)   0.00 (  0%)   0.08 (  0%)      0 kB (  0%)
 trivially dead code                :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)      0 kB (  0%)
 df scan insns                      :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)      0 kB (  0%)
 df multiple defs                   :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)      0 kB (  0%)
 df reaching defs                   :   0.97 (  4%)   0.01 (  4%)   0.99 (  4%)      0 kB (  0%)
 df live regs                       :   0.24 (  1%)   0.00 (  0%)   0.21 (  1%)      0 kB (  0%)
 df live&initialized regs           :  12.11 ( 50%)   0.01 (  4%)  12.03 ( 49%)      0 kB (  0%)
 df use-def / def-use chains        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      0 kB (  0%)
 df reg dead/unused notes           :   0.24 (  1%)   0.00 (  0%)   0.24 (  1%)   2811 kB (  2%)
 register information               :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)      0 kB (  0%)
 alias analysis                     :   0.06 (  0%)   0.00 (  0%)   0.08 (  0%)   2048 kB (  1%)
 alias stmt walking                 :   1.55 (  6%)   0.06 ( 23%)   1.65 (  7%)     92 kB (  0%)
 register scan                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    189 kB (  0%)
 rebuild jump labels                :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)      0 kB (  0%)
 parser (global)                    :   0.30 (  1%)   0.02 (  8%)   0.32 (  1%)  18037 kB ( 11%)
 inline parameters                  :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)    513 kB (  0%)
 tree gimplify                      :   0.07 (  0%)   0.01 (  4%)   0.08 (  0%)  13934 kB (  9%)
 tree eh                            :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      0 kB (  0%)
 tree CFG construction              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)   5209 kB (  3%)
 tree CFG cleanup                   :   0.34 (  1%)   0.01 (  4%)   0.37 (  2%)   1697 kB (  1%)
 tree copy propagation              :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)      0 kB (  0%)
 tree PTA                           :   0.21 (  1%)   0.00 (  0%)   0.22 (  1%)   1269 kB (  1%)
 tree PHI insertion                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   2644 kB (  2%)
 tree SSA rewrite                   :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)   3119 kB (  2%)
 tree SSA other                     :   0.01 (  0%)   0.02 (  8%)   0.03 (  0%)      0 kB (  0%)
 tree SSA incremental               :   0.08 (  0%)   0.00 (  0%)   0.10 (  0%)   4729 kB (  3%)
 tree operand scan                  :   0.04 (  0%)   0.01 (  4%)   0.06 (  0%)   3526 kB (  2%)
 dominator optimization             :   0.27 (  1%)   0.01 (  4%)   0.23 (  1%)   5850 kB (  4%)
 tree SRA                           :   0.13 (  1%)   0.00 (  0%)   0.14 (  1%)    562 kB (  0%)
 tree CCP                           :   0.32 (  1%)   0.01 (  4%)   0.30 (  1%)   1226 kB (  1%)
 tree reassociation                 :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      0 kB (  0%)
 tree FRE                           :   0.60 (  2%)   0.02 (  8%)   0.59 (  2%)   2505 kB (  2%)
 tree code sinking                  :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     60 kB (  0%)
 tree linearize phis                :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)      2 kB (  0%)
 tree backward propagate            :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)      0 kB (  0%)
 tree forward propagate             :   0.05 (  0%)   0.00 (  0%)   0.04 (  0%)    816 kB (  1%)
 tree phiprop                       :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      0 kB (  0%)
 tree conservative DCE              :   0.02 (  0%)   0.02 (  8%)   0.07 (  0%)      0 kB (  0%)
 tree aggressive DCE                :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)    768 kB (  0%)
 tree DSE                           :   0.05 (  0%)   0.00 (  0%)   0.04 (  0%)      0 kB (  0%)
 tree loop invariant motion         :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)      0 kB (  0%)
 tree canonical iv                  :   0.04 (  0%)   0.00 (  0%)   0.05 (  0%)   2262 kB (  1%)
 scev constant prop                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    228 kB (  0%)
 complete unrolling                 :   0.25 (  1%)   0.02 (  8%)   0.24 (  1%)   9319 kB (  6%)
 tree iv optimization               :   0.13 (  1%)   0.01 (  4%)   0.15 (  1%)   7884 kB (  5%)
 tree copy headers                  :   3.15 ( 13%)   0.01 (  4%)   3.17 ( 13%)   4763 kB (  3%)
 dominance frontiers                :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      0 kB (  0%)
 dominance computation              :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)      0 kB (  0%)
 out of ssa                         :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)      0 kB (  0%)
 expand vars                        :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    589 kB (  0%)
 expand                             :   0.10 (  0%)   0.00 (  0%)   0.10 (  0%)  23667 kB ( 15%)
 post expand cleanups               :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)      0 kB (  0%)
 forward prop                       :   0.09 (  0%)   0.00 (  0%)   0.07 (  0%)    908 kB (  1%)
 CSE                                :   0.11 (  0%)   0.00 (  0%)   0.11 (  0%)   1458 kB (  1%)
 dead code elimination              :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)      0 kB (  0%)
 dead store elim1                   :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)   2334 kB (  1%)
 dead store elim2                   :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)   2436 kB (  2%)
 loop init                          :   0.18 (  1%)   0.00 (  0%)   0.21 (  1%)   8614 kB (  5%)
 loop invariant motion              :   0.34 (  1%)   0.00 (  0%)   0.45 (  2%)    151 kB (  0%)
 loop fini                          :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)      0 kB (  0%)
 branch prediction                  :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)    778 kB (  0%)
 combiner                           :   0.11 (  0%)   0.00 (  0%)   0.10 (  0%)   1168 kB (  1%)
 integrated RA                      :   0.28 (  1%)   0.00 (  0%)   0.26 (  1%)   9050 kB (  6%)
 LRA non-specific                   :   0.12 (  0%)   0.00 (  0%)   0.15 (  1%)    394 kB (  0%)
 LRA virtuals elimination           :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)   1547 kB (  1%)
 LRA create live ranges             :   0.06 (  0%)   0.00 (  0%)   0.05 (  0%)    121 kB (  0%)
 LRA hard reg assignment            :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)      0 kB (  0%)
 reload CSE regs                    :   0.11 (  0%)   0.00 (  0%)   0.10 (  0%)   1373 kB (  1%)
 thread pro- & epilogue             :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)      4 kB (  0%)
 hard reg cprop                     :   0.03 (  0%)   0.01 (  4%)   0.04 (  0%)      0 kB (  0%)
 reorder blocks                     :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    121 kB (  0%)
 shorten branches                   :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)      0 kB (  0%)
 final                              :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)   1178 kB (  1%)
 straight-line strength reduction   :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)    508 kB (  0%)
 rest of compilation                :   0.15 (  1%)   0.00 (  0%)   0.16 (  1%)   1247 kB (  1%)
 remove unused locals               :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)      0 kB (  0%)
 address taken                      :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)      0 kB (  0%)
 TOTAL                              :  24.11          0.26         24.39        161510 kB
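
To pick the hot spots out of a report like this without reading it by eye,
something along the following lines might do. It is a rough sketch, not a GCC
tool: the regular expression and the function name top_passes are my own, and
it assumes the row layout shown above (a name, a colon, then usr seconds and a
percentage). Rows starting with "phase " are skipped because they are
summaries of the other rows, and the TOTAL row does not match because it has
no percentage.

```python
import re

# Matches "<name> : <usr seconds> (<usr percent>%)" at the start of a row.
# Wrapped continuation lines that only carry the GGC column have no colon
# and are therefore ignored.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s+(?P<usr>\d+\.\d+)\s+\(\s*(?P<pct>\d+)%\)"
)


def top_passes(report, n=3, skip_prefixes=("phase ",)):
    """Return the n timevars with the highest usr time as (name, seconds)."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m and not m.group("name").startswith(skip_prefixes):
            rows.append((float(m.group("usr")), m.group("name")))
    rows.sort(reverse=True)
    return [(name, usr) for usr, name in rows[:n]]


# A few rows copied from the report above, in the wrapped form.
sample = """\
 phase opt and generate             :  23.81 ( 99%)   0.24 ( 92%)  24.06 ( 99%)
 143289 kB ( 89%)
 df reaching defs                   :   0.97 (  4%)   0.01 (  4%)   0.99 (  4%)
       0 kB (  0%)
 df live&initialized regs           :  12.11 ( 50%)   0.01 (  4%)  12.03 ( 49%)
       0 kB (  0%)
 alias stmt walking                 :   1.55 (  6%)   0.06 ( 23%)   1.65 (  7%)
      92 kB (  0%)
 tree copy headers                  :   3.15 ( 13%)   0.01 (  4%)   3.17 ( 13%)
    4763 kB (  3%)
 TOTAL                              :  24.11          0.26         24.39
"""

for name, usr in top_passes(sample):
    print(f"{usr:6.2f}s  {name}")
```

Feeding it the saved output of -ftime-report instead of the sample string
should work the same way, as long as the layout matches.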