List:       binutils
Subject:    Re: Risc - V failures inside ld-undefined
From:       Palmer Dabbelt <palmer () dabbelt ! com>
Date:       2022-09-30 21:40:42
Message-ID: mhng-b17d9bae-597f-4e27-bd45-2be9e4338034 () palmer-ri-x1c9

On Fri, 30 Sep 2022 05:29:36 PDT (-0700), binutils@sourceware.org wrote:
> Hi all,
>
> I've been running the ld testsuite for Risc-V targets using a cross
> compiler and I'm seeing some failures in "undefined".
> PASS: undefined
> FAIL: undefined function
> FAIL: undefined line
>
> As I'm using a custom gcc, I just want confirmation from the Risc-V
> maintainers that these two tests are either failing or being skipped
> on their side.

They're not failing for me in a local run, unless I'm somehow 
misunderstanding the test results.  My "make check-ld" shows

		=== ld Summary ===

# of expected passes		797
# of expected failures		8
# of untested testcases		26
# of unsupported tests		175
./ld-new 2.39.50.20220930

> I have a patch fixing the testsuite for cross-compilers, which might
> explain why these tests have always been skipped until now and why
> this error has been masked.

OK, probably best to send the patch, then?  There have certainly been 
cases before where we've missed failures because the tests were being 
skipped due to environment issues.  If it's possible to get the test 
running for more people then it'll be more likely to get fixed: with it 
showing up as an unexpected failure, it'll at least be harder to forget 
about.

> Note that I've investigated a bit and it seems to be related to
> R_RISCV_ALIGN and the associated linker relaxation.
> Instead of showing
>   | ld: tmpdir/undefined.o: in function `function':
>   | .../undefined.c:9: undefined reference to `this_function_is_not_defined'
> the error is:
>   | riscv64-elf-ld: tmpdir/undefined.o: in function `.Ltext0':
>   | undefined.c:(.text+0x0): undefined reference to `this_function_is_not_defined'
>
> The idea is that "function" has an alignment of 2 and thus gas will
> add NOP instructions and a R_RISCV_ALIGN relocation before it (within
> tc-riscv.c:riscv_frag_align_code).
>   | $ riscv64-elf-objdump -D tmpdir/undefined.o
>   | 0000000000000000 <.Ltext0>:
>   |   0:   0001                    nop
>   |
>   | 0000000000000002 <function>:
>   |   2:   00000317                auipc   t1,0x0
>   |   6:   00030067                jr      t1 # 2 <function>
>
> When the linker relaxes these NOPs (in
> elfnn-riscv.c:riscv_relax_delete_bytes), it correctly changes the
> relocation offset but not the DWARF information. Thus, when the error
> path looks up the nearest line using _bfd_dwarf2_find_nearest_line,
> it ends up using a modified offset with non-modified DWARF information
> and targets the wrong symbol.
> I guess riscv_relax_delete_bytes can be improved to adjust the DWARF
> part too. But I'm not familiar with this part of bfd and I'm not sure
> it's even possible.
> Otherwise, I guess XFAILing these tests should be fine, as they are
> similar to other XFAILs related to DWARF-2 (on arm-elf), even if not
> having "function" shown in the error might be problematic.

That seems like a plausible issue, and it's certainly not surprising 
that there's a bug in debug info handling related to relaxation, both 
because we've had those before and because it's the combination of some 
complex and hard-to-test topics.  Probably best to just start with 
getting this running for everyone so we can chew on it a bit.  If it is 
an XFAIL then we should at least have some sort of error here rather 
than just producing bad binaries.
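
For anyone following along, here is a toy model of the mismatch 
described in the quoted text.  It is only a sketch, not BFD code: the 
symbol table, nearest_symbol, and delete_bytes are made-up stand-ins, 
and the offsets are taken from the objdump output quoted earlier.  It 
assumes the lookup simply picks the closest symbol at or below the 
queried offset.

```python
from bisect import bisect_right

# Hypothetical view of undefined.o as recorded in the (unmodified) debug
# info: a 2-byte NOP at offset 0, then `function` starting at offset 2.
SYMBOLS = [(0, ".Ltext0"), (2, "function")]


def nearest_symbol(symbols, offset):
    """Return the name of the last symbol starting at or before offset."""
    starts = [start for start, _ in symbols]
    index = bisect_right(starts, offset) - 1
    return symbols[index][1] if index >= 0 else None


def delete_bytes(reloc_offsets, at, count):
    """Shift relocation offsets that lie past a deleted byte range.

    Only the relocations are adjusted, mirroring the suspected behaviour:
    the symbol/debug view above is left untouched.
    """
    return [o - count if o >= at + count else o for o in reloc_offsets]


# The call relocation sits at the start of `function`, i.e. offset 2.
relocs_before = [2]
# Relaxation removes the 2-byte alignment NOP at offset 0.
relocs_after = delete_bytes(relocs_before, at=0, count=2)

print(nearest_symbol(SYMBOLS, relocs_before[0]))  # function
print(nearest_symbol(SYMBOLS, relocs_after[0]))   # .Ltext0
```

With the relocation moved to offset 0 but the lookup data still 
describing the pre-relaxation layout, the query lands on `.Ltext0`, 
which matches the `.text+0x0` in the reported error.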

> Anyway, the fact that I'm using a custom gcc shouldn't matter here but
> I want to confirm that first.

That also seems reasonable, but there's quite a bit of coupling between 
the compiler, assembler, and linker when it comes to DWARF on RISC-V, as 
we don't support all the flavors of DWARF-related directives.  We should 
have errors for all those, but it's entirely possible something has been 
missed.  That's even more likely if your custom compiler is doing 
something different than we normally test against, but IMO that's still 
a bug that we should fix (at least with an error message).

>
> Thanks,
> Clément