
List:       linux-kernel
Subject:    Re: More annoying code generation by clang
From:       Ingo Molnar <mingo@kernel.org>
Date:       2024-04-06 10:56:27
Message-ID: ZhEqW748nht2M4Si@gmail.com


[ I've Cc:-ed a few more people who might be interested in this. ]
[ Copy of Linus's email below. ]

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> So this doesn't really matter in any real life situation, but it
> really grated on me.
> 
> Clang has this nasty habit of taking our nice asm constraints, and
> turning them into worst-case garbage. It's been reported a couple of
> times that when we use "g" to tell the compiler that pretty much any
> source to the asm works, clang takes that to mean "I will just use
> 'memory'" even when that makes no sense whatsoever.
> 
> See for example
> 
>     https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/
> 
> where I was ranting about clang just doing pointlessly stupid things.
> 
> However, I found a case where yes, clang does pointlessly stupid
> things, but it's at least _partly_ our fault, and gcc can't generate
> optimal code either.
> 
> We have this fairly critical code in __fget_files_rcu() to look up a
> 'struct file *' from an fd, and it does this:
> 
>                 /* Mask is a 0 for invalid fd's, ~0 for valid ones */
>                 nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> 
> and clang makes a *horrid* mess of it, generating this code:
> 
>         movl    %edi, %r14d
>         movq    32(%rbx), %rdx
>         movl    (%rdx), %eax
>         movq    %rax, 8(%rsp)
>         cmpq    8(%rsp), %r14
>         sbbq    %rcx, %rcx
> 
> which is just crazy. Notice how it does that "move rax to stack, then
> do the compare against the stack", instead of just using %rax.
> 
> In fact, that function shouldn't have a stack frame at all, and the
> only reason it is generated is because of this whole oddity.
> 
> All clang's fault, right?
> 
> Yeah, mostly. But it turns out that what really messes with clang's
> little head is that the x86 array_index_mask_nospec() function is
> being a bit annoying.
> 
> This is what we do:
> 
>   static __always_inline unsigned long
> array_index_mask_nospec(unsigned long index,
>                 unsigned long size)
>   {
>         unsigned long mask;
> 
>         asm volatile ("cmp %1,%2; sbb %0,%0;"
>                         :"=r" (mask)
>                         :"g"(size),"r" (index)
>                         :"cc");
>         return mask;
>   }
> 
> and look at the use again:
> 
>         nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> 
> here all the values are actually 'unsigned int'. So what happens is
> that clang can't just use the fdt->max_fds value *directly* from
> memory, because it needs to be expanded from 32-bit to 64-bit because
> we've made our array_index_mask_nospec() function only work on 64-bit
> 'unsigned long' values.
> 
> So it turns out that by massaging this a bit, and making it just be a
> macro - so that the asm can decide that "I can do this in 32-bit" - I
> can get clang to generate much better code.
> 
> Clang still absolutely hates the "g" constraint, so to get clang to
> really get this right I have to use "ir" instead of "g". Which is
> wrong. Because gcc does this right, and could use the memory op
> directly. But even gcc cannot do that with our *current* function,
> because of that "the memory value is 32-bit, we require a 64-bit
> value" requirement.
> 
> Anyway, I can get gcc to generate the right code:
> 
>         movq    32(%r13), %rdx
>         cmp (%rdx),%ebx
>         sbb %esi,%esi
> 
> which is basically the right code for the six crazy instructions clang
> generates. And if I make the "g" be "ir", I can get clang to generate
> 
>         movq    32(%rdi), %rcx
>         movl    (%rcx), %eax
>         cmpl    %eax, %esi
>         sbbl    %esi, %esi
> 
> which is the same thing, but with that (pointless) load to a register.
> 
> And now clang doesn't generate that stack frame at all.
> 
> Anyway, this was a long email to explain the odd attached patch.
> 
> Comments? Note that this patch is *entirely* untested, I have done
> this purely by looking at the code generation in fs/file.c.
> 
>                 Linus

>  arch/x86/include/asm/barrier.h | 23 +++++++++--------------
>  1 file changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 66e57c010392..6159d2cbbfde 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -33,20 +33,15 @@
>   * Returns:
>   *     0 - (index < size)
>   */
> -static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
> -		unsigned long size)
> -{
> -	unsigned long mask;
> -
> -	asm volatile ("cmp %1,%2; sbb %0,%0;"
> -			:"=r" (mask)
> -			:"g"(size),"r" (index)
> -			:"cc");
> -	return mask;
> -}
> -
> -/* Override the default implementation from linux/nospec.h. */
> -#define array_index_mask_nospec array_index_mask_nospec
> +#define array_index_mask_nospec(idx,sz) ({	\
> +	typeof((idx)+(sz)) __idx = (idx);	\
> +	typeof(__idx) __sz = (sz);		\
> +	typeof(__idx) __mask;			\
> +	asm volatile ("cmp %1,%2; sbb %0,%0"	\
> +			:"=r" (__mask)		\
> +			:"ir"(__sz),"r" (__idx)	\
> +			:"cc");			\
> +	__mask; })
>  
>  /* Prevent speculative execution past this barrier. */
>  #define barrier_nospec() asm volatile("lfence":::"memory")

