[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-edac
Subject:    Re: [PATCH v5] x86/mce: Avoid infinite loop for copy from user recovery
From:       Borislav Petkov <bp () alien8 ! de>
Date:       2021-01-28 17:57:35
Message-ID: 20210128175735.GB2120 () zn ! tnic
[Download RAW message or body]

On Tue, Jan 26, 2021 at 02:36:05PM -0800, Luck, Tony wrote:
> In some cases Linux might context switch to something else. Perhaps
> this task even gets picked up by another CPU to run the task work
> queued functions.  But I imagine that the context switch should act
> as a barrier ... shouldn't it?

I'm given to understand that the #MC from user is likely to schedule and
a context switch has a barrier character.

> After a few cycles of the test injection to user mode, I saw an
> overflow in the machine check bank. As if it hadn't been cleared
> from the previous iteration ...

This sounds weird. As if something else is happening which we haven't
thought of yet...

> When the tests were failing, code was on top of v5.11-rc3. Latest
> experiments moved to -rc5.  There's just a tracing fix from
> PeterZ between rc3 and rc5 to mce/core.c:
> 
> 737495361d44 ("x86/mce: Remove explicit/superfluous tracing")
> 
> which doesn't appear to be a candidate for the problems I saw.

Doesn't look like it.

> This is the bit that changed during my detour using atomic_t mce_count.
> I added the local variable to capture value from atomic_inc_return(), then
> used it later, instead of a bunch of atomic_read() calls.
> 
> I kept it this way because "if (count == 1)" is marginally easier to read
> than "if (current->mce_count++ == 0)"

Right.

So still no explanation why it would fail before. ;-\

Crazy idea: if you still can reproduce on -rc3, you could bisect: i.e.,
if you apply the patch on -rc3 and it explodes and if you apply the same
patch on -rc5 and it works, then that could be a start... Yeah, don't
have a better idea here. :-\

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic