'Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-kernel
Subject:    Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling
From:       Steven Rostedt <rostedt () goodmis ! org>
Date:       2024-02-21 18:19:01
Message-ID: 20240221131901.69c80c47 () gandalf ! local ! home
[Download RAW message or body]

On Mon, 19 Feb 2024 08:48:20 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > I will look again -- it is quite possible that I was confused by earlier
> > in-fleet setups that had Tasks RCU enabled even when preemption was
> > disabled.  (We don't do that anymore, and, had I been paying sufficient
> > attention, would not have been doing it to start with.  Back in the day,
> > enabling rcutorture, even as a module, had the side effect of enabling
> > Tasks RCU.  How else to test it, right?  Well...)  
> 
> OK, I got my head straight on this one...
> 
> And the problem is in fact that Tasks RCU isn't normally present
> in non-preemptible kernels.  This is because normal RCU will wait
> for preemption-disabled regions of code, and in PREMPT_NONE and
> PREEMPT_VOLUNTARY kernels, that includes pretty much any region of code
> lacking an explicit schedule() or similar.  And as I understand it,
> tracing trampolines rely on this implicit lack of preemption.
> 
> So, with lazy preemption, we could preempt in the middle of a
> trampoline, and synchronize_rcu() won't save us.
> 
> Steve and Mathieu will correct me if I am wrong.
> 
> If I do understand this correctly, one workaround is to remove the
> "if PREEMPTIBLE" on all occurrences of "select TASKS_RCU".  That way,
> all kernels would use synchronize_rcu_tasks(), which would wait for
> a voluntary context switch.
> 
> This workaround does increase the overhead and tracepoint-removal
> latency on non-preemptible kernels, so it might be time to revisit the
> synchronization of trampolines.  Unfortunately, the things I have come
> up with thus far have disadvantages:
> 
> o	Keep a set of permanent trampolines that enter and exit
> 	some sort of explicit RCU read-side critical section.
> 	If the address for this trampoline to call is in a register,
> 	then these permanent trampolines remain constant so that
> 	no synchronization of them is required.  The selected
> 	flavor of RCU can then be used to deal with the non-permanent
> 	trampolines.
> 
> 	The disadvantage here is a significant increase in the complexity
> 	and overhead of trampoline code and the code that invokes the
> 	trampolines.  This overhead limits where tracing may be used
> 	in the kernel, which is of course undesirable.

I wonder if we can just see if the instruction pointer at preemption is at
something that was allocated? That is, if it __is_kernel(addr) returns
false, then we need to do more work. Of course that means modules will also
trigger this. We could check __is_module_text() but that does a bit more
work and may cause too much overhead. But who knows, if the module check is
only done if the __is_kernel() check fails, maybe it's not that bad.

-- Steve

> 
> o	Check for being preempted within a trampoline, and track this
> 	within the tasks structure.  The disadvantage here is that this
> 	requires keeping track of all of the trampolines and adding a
> 	check for being in one on a scheduler fast path.
> 
> o	Have a variant of Tasks RCU which checks the stack of preempted
> 	tasks, waiting until all have been seen without being preempted
> 	in a trampoline.  This still requires keeping track of all the
> 	trampolines in an easy-to-search manner, but gets the overhead
> 	of searching off of the scheduler fastpaths.
> 
> 	It is also necessary to check running tasks, which might have
> 	been interrupted from within a trampoline.
> 
> 	I would have a hard time convincing myself that these return
> 	addresses were unconditionally reliable.  But maybe they are?
> 
> o	Your idea here!
> 
> Again, the short-term workaround is to remove the "if PREEMPTIBLE" from
> all of the "select TASKS_RCU" clauses.
> 
> > > > My next step is to try this on bare metal on a system configured as
> > > > is the fleet.  But good progress for a week!!!  
> > > 
> > > Yeah this is great. Fingers crossed for the wider set of tests.  
> > 
> > I got what might be a one-off when hitting rcutorture and KASAN harder.
> > I am running 320*TRACE01 to see if it reproduces.  
> 
> [ . . . ]
> 
> > So, first see if it is reproducible, second enable more diagnostics,
> > third make more grace-period sequence numbers available to rcutorture,
> > fourth recheck the diagnostics code, and then see where we go from there.
> > It might be that lazy preemption needs adjustment, or it might be that
> > it just tickled latent diagnostic issues in rcutorture.
> > 
> > (I rarely hit this WARN_ON() except in early development, when the
> > problem is usually glaringly obvious, hence all the uncertainty.)  
> 
> And it is eminently reproducible.  Digging into it...

[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic