List:       openbsd-bugs
Subject:    Re: 7.0 GENERIC.MP#256: Unpredictable and unrecoverable freezing
From:       Jonathan Gray <jsg () jsg ! id ! au>
Date:       2022-01-16 13:57:56
Message-ID: YeQkZDjEjyRZ+VAD () largo ! jsg ! id ! au

On Sat, Jan 15, 2022 at 01:36:55PM -0500, Agnosto Dvonik wrote:
> Synopsis:       Unpredictable and unrecoverable freezing
> Category:       kernel panic
> Environment:
>         System      : OpenBSD 7.0
>         Details     : OpenBSD 7.0-current (GENERIC.MP) #256: Fri Jan 14
> 22:30:45 MST 2022
> deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>         Architecture: OpenBSD.amd64
>         Machine     : amd64
> Description:
>     On kernels #254, #255, and #256 the system freezes when certain (mostly
> larger) programs are run, such as firefox-96.0 or mpv-0.34.0p1. What
> confuses me most is that top(1) reports normal CPU usage, and df(1)
> reports normal disk usage in /tmp and in my mfs partition in
> ~/.cache.
>     The freeze does not go away after a few minutes or even a few hours,
> forcing me to manually power down the machine. I have also noticed that the
> fans in the system whirr a bit more when this happens than in
> normal usage.
>     In some instances of this happening, a kernel panic message has been
> produced:
> 
>     splassert: assertwaitok: want 0 have 4
>     panic: kernel diagnostic assertion "p -> p_wchan == NULL" failed: file
> "/usr/src/sys/kern/kern_sched.c", line 355
> 
>     When this message occurs, the kernel seems to have frozen as well, and
> does not exit into ddb(4) as expected, and no crash results are sent to
> /var/crash.
> How-To-Repeat:
>     # I don't know exactly how to get the error, but usually running
> firefox(1) or mpv(1) for anywhere around ~2-10 minutes can trigger it.
>     # However, a normal X session cannot do the same.

Antoine reported seeing freezes on a Comet Lake machine, and this
diff seems to have resolved them for him.

It moves irq work from interrupt context back to process context, as
our 5.10 drm did (via a task), by changing init_irq_work() from
timeout_set() to timeout_set_proc().

irq work is supposed to run in interrupt context, but some path used by
inteldrm on gen 9 graphics sleeps, which is not allowed there.
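
To illustrate why the context matters, here is a rough userland model,
not kernel code. Everything prefixed sim_ and the sleepy_work() handler
are made up for this sketch; it only assumes that a timeout set with
timeout_set() runs its handler from the softclock interrupt, while one
set with timeout_set_proc() is handed to a thread and runs in process
context, where sleeping is allowed.

```c
/* Userland model only. The sim_* names are hypothetical, not the kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sim_ctx { SIM_CTX_INTERRUPT, SIM_CTX_PROCESS };

static enum sim_ctx sim_current_ctx = SIM_CTX_PROCESS;

struct sim_timeout {
	void (*fn)(void *);
	void *arg;
	bool needs_proc;		/* set by sim_timeout_set_proc() */
	struct sim_timeout *next;
};

/* Timeouts waiting to be run in process context. */
static struct sim_timeout *sim_proc_queue;

/* Models timeout_set(): handler runs in interrupt context. */
static void
sim_timeout_set(struct sim_timeout *to, void (*fn)(void *), void *arg)
{
	to->fn = fn;
	to->arg = arg;
	to->needs_proc = false;
	to->next = NULL;
}

/* Models timeout_set_proc(): handler is deferred to process context. */
static void
sim_timeout_set_proc(struct sim_timeout *to, void (*fn)(void *), void *arg)
{
	sim_timeout_set(to, fn, arg);
	to->needs_proc = true;
}

/* Models the softclock interrupt firing an expired timeout. */
static void
sim_softclock(struct sim_timeout *to)
{
	if (to->needs_proc) {
		/* Do not run it here, queue it for the thread. */
		to->next = sim_proc_queue;
		sim_proc_queue = to;
		return;
	}
	sim_current_ctx = SIM_CTX_INTERRUPT;
	to->fn(to->arg);
	sim_current_ctx = SIM_CTX_PROCESS;
}

/* Models the thread that drains deferred timeouts (order not modelled). */
static void
sim_softclock_thread(void)
{
	while (sim_proc_queue != NULL) {
		struct sim_timeout *to = sim_proc_queue;

		sim_proc_queue = to->next;
		assert(to->fn != NULL);
		to->fn(to->arg);	/* runs with SIM_CTX_PROCESS */
	}
}

/* Models a sleeping call: false where a real kernel would assert. */
static bool
sim_sleep(void)
{
	return sim_current_ctx == SIM_CTX_PROCESS;
}

/* Hypothetical irq work handler that sleeps; records whether that was ok. */
static void
sleepy_work(void *arg)
{
	bool *ok = arg;

	*ok = sim_sleep();
}
```

In this model, a sleepy_work() timeout set with sim_timeout_set() and
fired by sim_softclock() reports that the sleep was not allowed. The
same handler set with sim_timeout_set_proc() does nothing until
sim_softclock_thread() runs, and then sleeps without complaint.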

Index: sys/dev/pci/drm/include/linux/irq_work.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/include/linux/irq_work.h,v
retrieving revision 1.6
diff -u -p -r1.6 irq_work.h
--- sys/dev/pci/drm/include/linux/irq_work.h	14 Jan 2022 06:53:14 -0000	1.6
+++ sys/dev/pci/drm/include/linux/irq_work.h	15 Jan 2022 23:06:06 -0000
@@ -32,7 +32,7 @@ typedef void (*irq_work_func_t)(struct i
 static inline void
 init_irq_work(struct irq_work *work, irq_work_func_t func)
 {
-	timeout_set(&work->to, (void (*)(void *))func, work);
+	timeout_set_proc(&work->to, (void (*)(void *))func, work);
 }
 
 static inline bool
