
List:       debian-user
Subject:    Re: Segfaults after upgrade to Debian 11.7 on virtualized systems with AMD Ryzen CPU
From:       Andreas Haumer <andreas () xss ! co ! at>
Date:       2023-05-01 13:10:13
Message-ID: f8d126f4-99b1-7cf3-c93f-67a6a9fe4e74 () xss ! co ! at


Hi!

Thank you all for your replies!

On 01.05.23 at 00:39, NetValue Operations Centre wrote:
> I've tried downgrading libc (and related packages) to 2.31-13+deb11u5, but no
> success - still getting segmentation faults. Booting back to the 5.10.0-21 kernel
> seems the only solution at the moment.

I have now found that the 5.10.0-22 kernel boots fine if I manually set
the CPU model to "EPYC-Rome" in my VM configuration (I use "virt-manager").

In that case, "virsh dumpxml" tells me about the VM's CPU:

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='xsaves'/>
    <feature policy='disable' name='svm'/>
    <feature policy='require' name='topoext'/>
    <feature policy='disable' name='npt'/>
    <feature policy='disable' name='nrip-save'/>
  </cpu>


On the other hand, if I set the CPU model to "copy from host",
booting the 5.10.0-22 kernel results in the reported segfaults.

In that case, "virsh dumpxml" tells me the following:

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <vendor>AMD</vendor>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='tsc-deadline'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='erms'/>
    <feature policy='require' name='invpcid'/>
    <feature policy='require' name='pku'/>
    <feature policy='require' name='vaes'/>
    <feature policy='require' name='vpclmulqdq'/>
    <feature policy='require' name='fsrm'/>
    <feature policy='require' name='spec-ctrl'/>
    <feature policy='require' name='stibp'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='xsaves'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='amd-ssbd'/>
    <feature policy='require' name='virt-ssbd'/>
    <feature policy='disable' name='lbrv'/>
    <feature policy='disable' name='tsc-scale'/>
    <feature policy='disable' name='vmcb-clean'/>
    <feature policy='disable' name='pause-filter'/>
    <feature policy='disable' name='pfthreshold'/>
    <feature policy='require' name='rdctl-no'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='require' name='mds-no'/>
    <feature policy='require' name='pschange-mc-no'/>
    <feature policy='disable' name='svm'/>
    <feature policy='require' name='topoext'/>
    <feature policy='disable' name='npt'/>
    <feature policy='disable' name='nrip-save'/>
  </cpu>
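
To make the difference between the two CPU definitions easier to see, here
is a minimal Python sketch that lists every feature which appears only in
the failing definition, or appears with a different policy. The function
names (cpu_features, extra_features) and the two file arguments are
hypothetical; it assumes each file holds either a bare <cpu> element as
shown above or the full output of "virsh dumpxml".

```python
import sys
import xml.etree.ElementTree as ET


def cpu_features(xml_text):
    """Return {feature name: policy} from a <cpu> element or a full domain XML."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <cpu> fragment or a <domain> that contains one.
    cpu = root if root.tag == "cpu" else root.find("cpu")
    if cpu is None:
        raise ValueError("no <cpu> element found")
    return {f.get("name"): f.get("policy") for f in cpu.findall("feature")}


def extra_features(working, failing):
    """Features that only the failing config has, or has with another policy."""
    return {name: policy
            for name, policy in failing.items()
            if working.get(name) != policy}


if __name__ == "__main__":
    # Usage (hypothetical file names): script.py working.xml failing.xml
    with open(sys.argv[1]) as fh:
        working = cpu_features(fh.read())
    with open(sys.argv[2]) as fh:
        failing = cpu_features(fh.read())
    for name, policy in sorted(extra_features(working, failing).items()):
        print(f"{policy:8} {name}")
```

Run against the two definitions above, this should print only the features
that "copy from host" adds on top of the plain "EPYC-Rome" model, which
narrows down the candidates for whatever triggers the segfaults.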

On the host, "lscpu" tells me:

root@pauli:~# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          32
On-line CPU(s) list:             0-31
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      25
Model:                           33
Model name:                      AMD Ryzen 9 5950X 16-Core Processor
Stepping:                        2
Frequency boost:                 enabled
CPU MHz:                         2200.000
CPU max MHz:                     5980,4678
CPU min MHz:                     2200,0000
BogoMIPS:                        8000.67
Virtualization:                  AMD-V
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        8 MiB
L3 cache:                        64 MiB
NUMA node0 CPU(s):               0-31
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
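
For comparing what the host offers with what a guest actually sees, the
"Flags:" line can be turned into a set. The following is only a sketch:
the helper names (lscpu_flags, only_on_host) are hypothetical, and it
assumes "lscpu" output has been saved to a text file on the host and in
the guest, with the flags on a single line as lscpu prints them.

```python
import sys


def lscpu_flags(text):
    """Return the set of CPU flags from the 'Flags:' line of lscpu output."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Flags":
            return set(value.split())
    return set()  # no Flags line found


def only_on_host(host_text, guest_text):
    """Flags the host reports that the guest does not see."""
    return lscpu_flags(host_text) - lscpu_flags(guest_text)


if __name__ == "__main__":
    # Usage (hypothetical file names): script.py host-lscpu.txt guest-lscpu.txt
    with open(sys.argv[1]) as fh:
        host = fh.read()
    with open(sys.argv[2]) as fh:
        guest = fh.read()
    print(" ".join(sorted(only_on_host(host, guest))))
```

Note that the kernel's flag names do not always match libvirt's feature
names (for example "nrip_save" here versus "nrip-save" in the XML), so the
two lists cannot be compared by name without some manual mapping.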

I still cannot tell exactly which system component is to blame, but there
does seem to be an issue with the current Debian 11.7 kernel (5.10.0-22).

Time to create a Debian bug report?

Regards

- andreas

-- 
Andreas Haumer
*x Software + Systeme              | mailto:andreas@xss.co.at
Karmarschgasse 51/2/20             | https://www.xss.co.at/
A-1100 Vienna, Austria             | Tel: +43-1-6060114-0

