'Re: Kernel panic upon resume of Linux/KVM VM (OpenBSD 6.6)'


List:       openbsd-bugs
Subject:    Re: Kernel panic upon resume of Linux/KVM VM (OpenBSD 6.6)
From:       Mike Larkin <mlarkin () nested ! page>
Date:       2019-10-22 23:52:08
Message-ID: 20191022235208.GC9217 () azathoth ! net

On Tue, Oct 22, 2019 at 04:25:19PM -0700, guenther@openbsd.org wrote:
> On Tue, 22 Oct 2019, Andreas Rottmann wrote:
> > >Synopsis:	panic: pvclock0: unstable result on stable clock
> > >Category:	virtualization
> > >Environment:
> > 	System      : OpenBSD 6.6
> > 	Details     : OpenBSD 6.6 (GENERIC.MP) #372: Sat Oct 12 10:56:27 MDT 2019
> > 			 deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > 	Architecture: OpenBSD.amd64
> > 	Machine     : amd64
> > >Description:
> > 
> > I've just experienced a kernel panic when resuming my laptop from 
> > suspend-to-RAM while my OpenBSD 6.6 VM was running; the first few lines 
> > of the crash read like this:
> > 
> > panic: pvclock0: unstable result on stable clock
> > Stopped at      db_enter+0x10:  popq    %rbp
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > db_enter() at db_enter+0x10
> > panic() at panic+0x128
> > pvclock_get_timecount(ffffffff81f14360) at pvclock_get_timecount+0xc2
> > 
> > The full ddb session, including backtraces for both cores, and the `ps`
> > output is attached as `ddb.txt`.
> 
> So the code that immediately triggers the panic is this:
>         /* This bit must be set as we attached based on the stable flag */
>         if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0)
>                 panic("%s: unstable result on stable clock", DEVNAME(sc));
> 
> That is, the pvclock driver currently assumes that if it advertises a 
> stable clock when the OpenBSD guest is booted, then it'll remain stable 
> forever.  That apparently is not a safe assumption across a suspend/resume 
> cycle in the Linux/KVM host.
> 

If you're correct above, it probably isn't a safe assumption in a live
migration scenario either.

-ml

> To fix this, the driver would have to get the system to stop using it as 
> the active timecounter whenever it's marked unstable.  Perhaps it could 
> just adjust its quality (sc->sc_tc->tc_quality) downward while that's the 
> case?  I'm not sure if that would be enough, but you could try 
> implementing that.
> 
> Lacking that, I guess you'll want to have KVM stop the guest before you 
> suspend the host, and then on resume wait a bit until the clock 
> settles--not sure how long that takes or how you would know--before 
> restarting the guest.
> 
> 
> Philip Guenther
> 
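A minimal user-space sketch of the tc_quality idea suggested above, assuming a driver that re-checks the host's flags on every read. Instead of calling panic() when the stable bit disappears, it demotes the timecounter's quality, then restores the attach-time value once the bit comes back. Everything here is hypothetical and cut down for illustration: the *_mock structs stand in for the kernel's struct timecounter and the driver softc, pvclock_check_stable() and PVCLOCK_UNSTABLE_QUALITY are invented names, and PVCLOCK_FLAG_TSC_STABLE is assumed to be bit 0 as in the KVM pvclock ABI. As Philip notes, lowering the quality may not by itself make the kernel switch to another timecounter. This sketch only models the flag bookkeeping.

```c
#include <stdint.h>

/* Assumption: bit 0 of the pvclock flags byte is the "TSC stable" bit,
 * as in the KVM pvclock ABI; the name mirrors the OpenBSD driver. */
#define PVCLOCK_FLAG_TSC_STABLE 0x01

/* Hypothetical quality used while the host reports an unstable clock.
 * A negative quality is meant to mark the counter as undesirable. */
#define PVCLOCK_UNSTABLE_QUALITY (-2000)

/* Hypothetical stand-in for the kernel's struct timecounter;
 * only the fields this sketch touches. */
struct timecounter_mock {
	const char *tc_name;
	int tc_quality;
};

/* Hypothetical stand-in for the driver softc; sc_tc is a pointer,
 * matching the sc->sc_tc->tc_quality access mentioned in the thread. */
struct pvclock_softc_mock {
	struct timecounter_mock *sc_tc;
	int sc_stable_quality;	/* quality advertised at attach time */
};

/*
 * Called with the flags the host reported for the current read.
 * Rather than panicking when the stable bit is clear, demote the
 * timecounter; restore the original quality once the bit is set again.
 */
static void
pvclock_check_stable(struct pvclock_softc_mock *sc, uint8_t flags)
{
	if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0)
		sc->sc_tc->tc_quality = PVCLOCK_UNSTABLE_QUALITY;
	else
		sc->sc_tc->tc_quality = sc->sc_stable_quality;
}
```

For example, starting from a counter attached with an arbitrary quality of 1500, a read with flags 0x00 (such as one taken just after the host resumes) drops tc_quality to PVCLOCK_UNSTABLE_QUALITY, and a later read with the stable bit set puts it back to 1500. Whether the real kernel should also force a timecounter switch at that point is the open question from the thread.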
