
List:       gcc-bugs
Subject:    [Bug c/100363] New: gcc generating wider load/store than warranted at -O3
From:       vgupta at synopsys dot com via Gcc-bugs <gcc-bugs () gcc ! gnu ! org>
Date:       2021-04-30 20:09:42
Message-ID: bug-100363-4 () http ! gcc ! gnu ! org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

            Bug ID: 100363
           Summary: gcc generating wider load/store than warranted at -O3
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vgupta at synopsys dot com
  Target Milestone: ---

Created attachment 50722
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50722&action=edit
test case with an additional nop to annotate codegen

In the Linux kernel's initramfs gzip inflate code, an inner copy loop using
unsigned short pointers (src/dst) is compiled into wider accesses, 8 or 16
bytes at a time (vs. 2 bytes at a time), causing extra/unintended bytes to be
copied and leading to corruption of inflated files on target.

This showed up on the upstream v5.6 Linux kernel built for ARC (which defaults
to -O3). The issue doesn't happen at -O2.

Full test case attached, but the gist of it is:

    lib/zlib_inflate/inffast.c

    if (dist > 2) {
        unsigned short *sfrom;

        sfrom = (unsigned short *)(from);
        loops = len >> 1;
        do
            *sout++ = *sfrom++;

        while (--loops);
        out = (unsigned char *)sout;
        from = (unsigned char *)sfrom;
    }
    ...

@sfrom and @sout are unsigned short pointers and are thus expected to work on
2 bytes at a time. However, at -O3 gcc is generating wide loads/stores (8-byte
LDD/STD on ARCv2, 16-byte LDR q0 on aarch64).
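
As a rough illustration of the expected semantics (this is not the attached
test case), the loop can be pulled out into a stand-alone function. The names
copy_u16 and demo_overlap_copy, the buffer size and the fill values are all
made up for this sketch. It copies 16 bytes with the destination 4 bytes ahead
of the source (so dist > 2), which under 2-byte-at-a-time semantics should
replicate the first two units and leave everything past the copied range
untouched:

```c
/* Hypothetical stand-alone version of the inner loop: copies `loops`
   16-bit units one at a time. `loops` must be at least 1, as in the
   original do/while. Returns the advanced destination pointer. */
static unsigned short *copy_u16(unsigned short *sout,
                                const unsigned short *sfrom,
                                unsigned int loops)
{
    do
        *sout++ = *sfrom++;
    while (--loops);
    return sout;
}

/* Overlapping copy with a 4-byte distance between source and destination.
   Returns 0 if the buffer matches what a 2-byte-at-a-time copy requires,
   non-zero otherwise. */
static int demo_overlap_copy(void)
{
    unsigned short buf[12] = { 0x1111, 0x2222 };   /* rest is zero */
    unsigned short *end;
    unsigned int i;

    /* Sentinels just past the region the copy is allowed to write. */
    buf[10] = 0xAAAA;
    buf[11] = 0xAAAA;

    /* len = 16 bytes, so loops = len >> 1 = 8; writes buf[2]..buf[9]. */
    end = copy_u16(buf + 2, buf, 16 >> 1);
    if (end != buf + 10)
        return 1;

    /* Each unit must equal the one two positions earlier. */
    for (i = 0; i < 10; i++)
        if (buf[i] != ((i & 1) ? 0x2222 : 0x1111))
            return 2;

    /* Nothing beyond the requested length may be modified. */
    if (buf[10] != 0xAAAA || buf[11] != 0xAAAA)
        return 3;

    return 0;
}
```

Calling demo_overlap_copy() from a small main() and comparing the return
value between -O2 and -O3 builds is one way to check a given compiler; this
sketch is not claimed to reproduce the failure seen in the kernel build.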

For aarch64, it seems code is generated for 16-byte accesses as well as 2-byte
ones, and I haven't verified whether it elides the 16-byte code based on size
etc., but the code is generated nonetheless. For ARC, the 8-byte loop is
certainly executed, causing the corruption described above.

The issue was originally seen with mainline gcc 10.2 (again both ARC and
aarch64) at -O3 and I can confirm it exists in gcc 9.3 as well.

The attached preprocessed source file is from an ARC Linux build (but it builds
for aarch64 too, since none of the arch-specific functions are used here).