List:       netbsd-port-xen
Subject:    occasional/rash of "vcpu1: CPU failed to start" and hung domU
From:       "Greg A. Woods" <woods () planix ! ca>
Date:       2020-05-25 21:36:05
Message-ID: m1jdKla-0036tPC () more ! local

I've been working on updating some of my home servers again recently and
today I encountered a "rash" of domU hangs due to "vcpu1: CPU failed to
start" problems on one of the domUs on a Xen server.  The whole dmesg
from such a failure is below.

This was very odd as the same kernel had just booted A-OK a couple of
times.

However persistence paid off and eventually after several tries it
booted OK again.

There seem to be two problems:

1. perhaps the delay while waiting for a CPU to become ready isn't
always long enough, but then why does the CPU sometimes not start at
all?

2. perhaps the kernel shouldn't hang when a CPU fails to start?  I see
there is some #ifdef'd debug code that will call Debugger() if it is
enabled, and an extra printf() that suggests resuming is possible, but
it's not clear to me whether the kernel can limp along without a
secondary CPU that won't start, and if so what must be done to ignore
the stuck CPU.
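
To make the two questions concrete, here is a minimal userland sketch in
C of a bounded wait for a secondary CPU plus a "skip it and carry on"
policy.  It is NOT the NetBSD code: struct fake_cpu, wait_for_cpu() and
start_secondaries() are hypothetical names invented for illustration,
and the "CPU" is simulated by a counter.  Whether a real kernel can
safely continue also depends on what has already been registered for
the stuck CPU, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-CPU record; not NetBSD's struct cpu_info. */
struct fake_cpu {
	int  id;
	int  ready_after;	/* simulated: polls before "ready"; < 0 means never */
	bool running;
};

/*
 * Poll at most max_polls times for the CPU to report ready.
 * Returns true if it came up within the budget, false on timeout.
 */
static bool
wait_for_cpu(struct fake_cpu *c, int max_polls)
{
	assert(c != NULL);
	for (int i = 0; i < max_polls; i++) {
		if (c->ready_after >= 0 && i >= c->ready_after) {
			c->running = true;
			return true;
		}
		/* a real kernel would delay briefly here between polls */
	}
	return false;
}

/*
 * Try to start every secondary CPU.  A CPU that times out is reported
 * and skipped instead of stopping the boot.  Returns the number of
 * CPUs that came up.
 */
static int
start_secondaries(struct fake_cpu *cpus, size_t n, int max_polls)
{
	int up = 0;

	for (size_t i = 0; i < n; i++) {
		if (wait_for_cpu(&cpus[i], max_polls))
			up++;
		else
			printf("vcpu%d: CPU failed to start, skipping\n",
			    cpus[i].id);
	}
	return up;
}
```

In this model a CPU with ready_after = 20 times out with a budget of 10
polls but comes up with a budget of 100, which is the "delay not long
enough" case; a CPU with ready_after = -1 never comes up whatever the
budget, and the others are still started.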

The server is running Xen-4.11.1nb2 (with a similar vintage dom0 kernel).

--
					Greg A. Woods <gwoods@acm.org>

Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>


[   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
[   1.0000000]     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
[   1.0000000]     2018, 2019 The NetBSD Foundation, Inc.  All rights reserved.
[   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[   1.0000000]     The Regents of the University of California.  All rights reserved.

[   1.0000000] NetBSD 8.99.32 (XEN3_DOMU) #11: Sat May 16 18:39:56 PDT 2020
[   1.0000000]  woods@future:/build/woods/future/current-amd64-amd64-obj/more/work/woods/m-NetBSD-current/sys/arch/amd64/compile/XEU
[   1.0000000] total memory = 8000 MB
[   1.0000000] avail memory = 7736 MB
[   1.0000000] rnd: callout attached as an entropy source (collecting)
[   1.0000000] rnd: initialised (4096) with counter
[   1.0000000] rnd: printf attached as an entropy source (collecting without estimation)
[   1.0000000] rnd: autoconf attached as an entropy source (collecting)
[   1.0000000] timecounter: Timecounters tick every 1.000 msec
[   1.0000000] rnd: WARNING! initial entropy low (5).
[   1.0000000] rnd: WARNING! initial entropy low (0).
[   1.0000000] rnd: system-power attached as an entropy source (collecting)
[   1.0000000] running cgd selftest aes-xts-256 aes-xts-512 done
[   1.0000000] mainbus0 (root)
[   1.0000000] hypervisor0 at mainbus0: Xen version 4.11.1nb2
[   1.0000000] hypervisor0: features:  mmu_pt_update_preserve_ad highmem_assist gnttab_map_avail_bits
[   1.0000000] VIRQ_DEBUG interrupt using event channel 3
[   1.0000000] vcpu0 at hypervisor0
[   1.0000000] vcpu0: Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz, id 0x10676
[   1.0000000] vcpu0: package 0, core 2, smt 0
[   1.0000000] vcpu1 at hypervisor0
[   1.0000000] vcpu1: Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz, id 0x10676
[   1.0000000] vcpu1: package 0, core 3, smt 0
[   1.0000000] vcpu2 at hypervisor0
[   1.0000000] vcpu2: Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz, id 0x10676
[   1.0000000] vcpu2: package 1, core 0, smt 0
[   1.0000000] vcpu3 at hypervisor0
[   1.0000000] vcpu3: Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz, id 0x10676
[   1.0000000] vcpu3: package 1, core 1, smt 0
[   1.0000000] vcpu4 at hypervisor0
[   1.0000000] vcpu4: Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz, id 0x10676
[   1.0000000] vcpu4: package 1, core 2, smt 0
[   1.0000000] vcpu5 at hypervisor0
[   1.0000000] vcpu5: Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz, id 0x10676
[   1.0000000] vcpu5: package 1, core 3, smt 0
[   1.0000000] vcpu6 at hypervisor0
[   1.0000000] vcpu6: Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz, id 0x10676
[   1.0000000] vcpu6: package 0, core 1, smt 0
[   1.0000000] vcpu7 at hypervisor0
[   1.0000000] vcpu7: Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz, id 0x10676
[   1.0000000] vcpu7: package 0, core 0, smt 0
[   1.0000000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
[   1.0000000] xencons0 at hypervisor0: Xen Virtual Console Driver
[   1.0000000] xencons0: console major 143, unit 0
[   1.0000000] xencons0: using event channel 2
[   1.0000000] rnd: WARNING! initial entropy low (1).
[   1.0000000] rnd: WARNING! initial entropy low (0).
[   1.0000000] rnd: WARNING! initial entropy low (0).
[   1.0000000] rnd: WARNING! initial entropy low (1).
[   1.0000000] rnd: WARNING! initial entropy low (0).
[   1.0000000] rnd: WARNING! initial entropy low (1).
[   1.0000000] rnd: WARNING! initial entropy low (0).
[   1.0000000] rnd: WARNING! initial entropy low (1).
[   1.0000000] rnd: WARNING! initial entropy low (0).
[   1.0000000] rnd: WARNING! initial entropy low (1).
[   1.0000000] rnd: WARNING! initial entropy low (0).
[   1.0000000] rnd: WARNING! initial entropy low (1).
[   1.0000000] rnd: WARNING! initial entropy low (0).
[   1.0000000] rnd: WARNING! initial entropy low (1).
[   1.0000000] rnd: WARNING! initial entropy low (0).
[   1.0000000] rnd: WARNING! initial entropy low (1).
[   1.0000000] timecounter: Timecounter "clockinterrupt" frequency 1000 Hz quality 0
[   1.0000030] timecounter: Timecounter "xen_system_time" frequency 1000000000 Hz quality 10000
[   1.0000030] Xen vcpu0 clock: using event channel 5
[   1.0000030] rnd: cpu0 attached as an entropy source (collecting)
[   1.0000030] rnd: cpu1 attached as an entropy source (collecting)
[   1.0000030] rnd: cpu2 attached as an entropy source (collecting)
[   1.0000030] rnd: cpu3 attached as an entropy source (collecting)
[   1.0000030] rnd: cpu4 attached as an entropy source (collecting)
[   1.0000030] rnd: cpu5 attached as an entropy source (collecting)
[   1.0000030] rnd: cpu6 attached as an entropy source (collecting)
[   1.0000030] rnd: cpu7 attached as an entropy source (collecting)
[   1.0000030] Xen vcpu1 clock: using event channel 7
[   2.2020928] vcpu1: CPU failed to start
[   2.2020928] Xen vcpu2 clock: using event channel 9
[   2.2030934] Xen vcpu3 clock: using event channel 11
[   2.2030934] Xen vcpu4 clock: using event channel 13
[   2.2030934] Xen vcpu5 clock: using event channel 15
[   2.2030934] Xen vcpu6 clock: using event channel 17
[   2.2040941] Xen vcpu7 clock: using event channel 19

