'BUG x86 fp stack overflow compiling at -O0'


List:       gcc-bugs
Subject:    BUG x86 fp stack overflow compiling at -O0
From:       grahams <grahams () rcp ! co ! uk>
Date:       1999-12-31 19:07:31

Hi

Bootstrapping egcs-19991228 using installed egcs-19991221 fails because of 
differences in nearly all files.

Investigation showed that local register allocation was selecting the 
registers in a different order. I tracked this down to the calculation 
of register priorities, which determines the order in which registers are
allocated.

The following macro is used in local-alloc.c to calculate allocation priorities
and this was sometimes returning bogus results (i.e. -2147483648).

#define QTY_CMP_PRI(q)		\
  ((int) (((double) (floor_log2 (qty[q].n_refs) * qty[q].n_refs * qty[q].size) \
	  / (qty[q].death - qty[q].birth)) * 10000))

This was being miscompiled by the installed egcs-19991221 compiler at -O0 
because the fix_truncsi_1 pattern was failing to pop the FP stack. 

Here is an analysis of the bug and a possible fix.

In i386.md there are define_insn's for the *fix_truncsi_1 and *fix_truncdi_1
instruction patterns and corresponding define_split's. The define_insn's
use output_fix_trunc () in i386.c, which uses REG_DEAD notes to determine
whether the top of the FP stack dies and thus needs to be popped. These
REG_DEAD notes are added during the reg_to_stack pass, so once that pass
has completed the insns will have any necessary REG_DEAD notes.

Consider the following insn as it appears after the reg_to_stack pass:

(insn 32 30 34 (parallel[ 
            (set (reg:SI 1 edx)
                (fix:SI (reg:DF 8 st(0))))
            (clobber (mem:SI (plus:SI (reg:SI 6 ebp)
                        (const_int -4 [0xfffffffc])) 0))
            (clobber (mem:SI (plus:SI (reg:SI 6 ebp)
                        (const_int -8 [0xfffffff8])) 0))
            (clobber (reg:SI 0 eax))
        ] ) 152 {*fix_truncsi_1} (insn_list 30 (nil))
    (expr_list:REG_DEAD (reg:DF 8 st(0))
        (expr_list:REG_UNUSED (reg:SI 0 eax)
            (nil))))


Final instruction selection is now performed and the above
insn gets split by the define_split which matches the fix_truncsi_1
pattern. When the instruction is split any reg notes on the original
insn are lost.

Later, when output_fix_trunc () is called, it decides that the top of the FP 
stack does NOT die because there is no REG_DEAD note, and hence it does
not pop the FP stack.

The conclusion is that the define_split's corresponding to the 
fix_truncsi_1/fix_truncdi_1 insns cannot be used after the reg_to_stack
pass, because any REG_DEAD notes will be lost and output_fix_trunc () will
generate the WRONG code if the original insn had a REG_DEAD note for 
the top of the FP stack.

In my local tree I have modified the define_split so that, once the
reg_to_stack pass has started, it is disabled if the insn has a REG_DEAD
note for operands[1].

Here are the original define_insn and define_split for fix_truncsi_1:

(define_insn "*fix_truncsi_1"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=m,?r")
	(fix:SI (match_operand 1 "register_operand" "f,f")))
   (clobber (match_operand:SI 2 "memory_operand" "=o,o"))
   (clobber (match_operand:SI 3 "memory_operand" "=m,m"))
   (clobber (match_scratch:SI 4 "=&r,r"))]
  "TARGET_80387 && FLOAT_MODE_P (GET_MODE (operands[1]))"
  "* return output_fix_trunc (insn, operands);"
  [(set_attr "type" "multi")])

(define_split 
  [(set (match_operand:SI 0 "register_operand" "")
	(fix:SI (match_operand 1 "register_operand" "")))
   (clobber (match_operand:SI 2 "memory_operand" ""))
   (clobber (match_operand:SI 3 "memory_operand" ""))
   (clobber (match_scratch:SI 4 ""))]
  "reload_completed"
  [(parallel [(set (match_dup 3) (fix:SI (match_dup 1)))
	      (clobber (match_dup 2))
	      (clobber (match_dup 3))
	      (clobber (match_dup 4))])
   (set (match_dup 0) (match_dup 3))]
  "")

and here's the modified define_split (the define_insn is unchanged)

(define_split 
  [(set (match_operand:SI 0 "register_operand" "")
	(fix:SI (match_operand 1 "register_operand" "")))
   (clobber (match_operand:SI 2 "memory_operand" ""))
   (clobber (match_operand:SI 3 "memory_operand" ""))
   (clobber (match_scratch:SI 4 ""))]
  "reload_completed
   && !(reg_to_stack_started
	&& find_regno_note (insn, REG_DEAD, REGNO(operands[1])))"
  [(parallel [(set (match_dup 3) (fix:SI (match_dup 1)))
	      (clobber (match_dup 2))
	      (clobber (match_dup 3))
	      (clobber (match_dup 4))])
   (set (match_dup 0) (match_dup 3))]
  "")

Note that reg_to_stack_started is a new global flag which is set at the
beginning of the reg_to_stack pass. A similar change was made to the
define_split for the fix_truncdi_1 insn.

I have attached a small program which shows the effects of the
bug when compiled with egcs-19991221/28 on x86 using -O0 or -O1. 
It works at -O2 because there the instruction gets split before the
reg_to_stack pass, rather than after it as happens at -O0 or -O1.

The correct output should be
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000

but when it fails the output is
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=10000
priority(4,1,16,24)=-2147483648
priority(4,1,16,24)=-2147483648
priority(4,1,16,24)=-2147483648

We get the wrong answers because the FP stack has overflowed.

With the above modification to the define_split patterns
I get correct output for all optimization levels.

Graham
["bug3.c" (application/x-unknown-content-type-cfile)]

#include <stdio.h>

static int floor_log2(unsigned long long int x)
{
   int log = -1;
   while (x != 0)
   {
      log++;
      x >>= 1;
   }
   return log;
}

static int priority(int nrefs, int size, int birth, int death)
{
  return (int)((((double)(floor_log2 (nrefs) * nrefs * size))
	       / (death - birth)) * 10000);
}

int nrefs  = 4;
int size   = 1;
int birth  = 16;
int death  = 24;

int main()
{
  int i;

  for (i = 0; i < 10; i++)
    {
      int p = priority(nrefs, size, birth, death);

      printf("priority(%d,%d,%d,%d)=%d\n",
             nrefs, size, birth, death, p);
    }
  
  return 0;
}

