
List:       oss-security
Subject:    [oss-security] Xen Security Advisory 348 v3 (CVE-2020-29566) - undue recursion in x86 HVM context switch code
From:       Xen.org security team <security@xen.org>
Date:       2020-12-15 12:20:21
Message-ID: E1kp9JV-00074c-Ji@xenbits.xenproject.org

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2020-29566 / XSA-348
                               version 3

            undue recursion in x86 HVM context switch code

UPDATES IN VERSION 3
====================

Public release.

ISSUE DESCRIPTION
=================

When they require assistance from the device model, x86 HVM guests
must be temporarily de-scheduled.  The device model will signal Xen
when it has completed its operation, via an event channel, so that the
relevant vCPU is rescheduled.
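
For context, the device model's completion signal travels over an
event channel.  A minimal, hypothetical sketch of that notification
using the libxenevtchn API (the port is assumed to be already bound
to the relevant ioreq server; error handling is elided):

    #include <xenevtchn.h>    /* from the Xen tools' libxenevtchn */

    /* A well-behaved device model notifies only once the emulated
     * operation has genuinely finished. */
    static int signal_completion(xenevtchn_handle *xce,
                                 evtchn_port_t port)
    {
        return xenevtchn_notify(xce, port);
    }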

If the device model were to signal Xen without having actually
completed the operation, the de-schedule / re-schedule cycle would
repeat.  If, in addition, Xen is resignalled very quickly, the
re-schedule may occur before the de-schedule is fully complete,
triggering a shortcut.  This potentially repeating process uses
ordinary recursive function calls, and so could result in a stack
overflow.
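
To illustrate the failure mode, here is a much-simplified,
hypothetical sketch (not the actual Xen code; the real call chain is
quoted in the patch descriptions below).  Built with optimisation
disabled, e.g. "cc -O0 xsa348-demo.c", the self-call is not turned
into a loop and the process eventually faults from stack exhaustion:

    /* xsa348-demo.c - editorial sketch of the XSA-348 recursion */
    #include <stdio.h>

    static unsigned long depth;

    /* Stand-in for the completion check: a buggy or malicious device
     * model keeps signalling without ever finishing the operation. */
    static int io_actually_complete(void) { return 0; }

    static void resume_vcpu(void)
    {
        if ( !io_actually_complete() )
        {
            /* The analogue of being re-signalled before the
             * de-schedule completes: re-enter the resume path via an
             * ordinary CALL, leaving one more frame on the stack per
             * round trip. */
            depth++;
            resume_vcpu();
        }
        /* entering the guest would go here; never reached here */
    }

    int main(void)
    {
        resume_vcpu();                   /* overflows the stack */
        printf("depth %lu\n", depth);    /* never reached */
        return 0;
    }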

IMPACT
======

A malicious or buggy stubdomain serving an HVM guest can cause Xen to
crash, resulting in a Denial of Service (DoS) to the entire host.

VULNERABLE SYSTEMS
==================

All Xen versions are vulnerable.

Only x86 systems are affected.  Arm systems are not affected.

Only x86 stubdomains serving HVM guests can exploit the vulnerability.

MITIGATION
==========

Running only PV or PVH guests will avoid the vulnerability.

(Switching from a device model stub domain to a dom0 device model does
NOT mitigate this vulnerability.  Rather, it merely recategorises the
vulnerability: an attack would then come from hostile management code
and be regarded "as designed", i.e. the issue would be reclassified as
"not a bug".  The security of a Xen system using stub domains is still
better than with a qemu-dm running as a dom0 process.  Users and
vendors of stub qemu-dm systems should not change their configuration
to use a dom0 qemu process.)
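
As an illustration of the first point, a guest capable of direct PVH
boot could be switched via its xl configuration; a sketch only, with
hypothetical paths:

    # xl guest configuration sketch: PVH instead of HVM
    type    = "pvh"
    name    = "guest1"
    memory  = 2048
    vcpus   = 2
    kernel  = "/path/to/pvh-capable-kernel"   # hypothetical path
    ramdisk = "/path/to/initrd"               # hypothetical path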

CREDITS
=======

This issue was discovered by Julien Grall of Amazon.

RESOLUTION
==========

Applying the appropriate (set of) attached patch(es) resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa348-?.patch           xen-unstable - Xen 4.14.x
xsa348-4.13-?.patch      Xen 4.13.x
xsa348-4.12.patch        Xen 4.12.x
xsa348-4.11.patch        Xen 4.11.x
xsa348-4.10.patch        Xen 4.10.x
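
For example, to apply the Xen 4.13 patches on top of the tip of the
stable-4.13 branch (a sketch; the repository URL follows the xenbits
convention and the patch location is a placeholder):

    $ git clone git://xenbits.xen.org/xen.git
    $ cd xen
    $ git checkout stable-4.13
    $ patch -p1 < /path/to/xsa348-4.13-1.patch
    $ patch -p1 < /path/to/xsa348-4.13-2.patch
    $ patch -p1 < /path/to/xsa348-4.13-3.patch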

$ sha256sum xsa348*
f9606145cdbd3caacf6be7e5bcb62fc7d2c0b76572c1be26db608c5eac57ead0  xsa348.meta
b619dac8453daa9f85526dec67ed67d999d182ccbc39b91be122b3365a0b5cb9  xsa348-1.patch
01b11ea3be160704c992187ad727ac1f03841cc452bbe2c142b53fddfa2da844  xsa348-2.patch
2c54474da9680625717e5a61b2a3a5ac23acad6f7bc0fcb306fe181fd0a38f1d  xsa348-3.patch
e2f4cbec1a763f045e827ececf13d06dedcc7cc49b42136160c8d986778529ae  xsa348-4.10.patch
15d4f5fb894a45027f4a17a557d4fdb0a390575ab2c2d3aa2b265d3c6239c765  xsa348-4.11.patch
58b1a771dc720b1efb205a9d1baf46aea0205d4c65310e693dd2cfe7834cd8b9  xsa348-4.12.patch
1d181edd11f2437ff9298f9b5e81d75f5e5db8a79a8ce2c5aed0d75882473a0b  xsa348-4.13-1.patch
b68d3dfa2003a7444c165ab3639886b9b502c06cdfd4f43bea747d8fb14dc7cd  xsa348-4.13-2.patch
67ecb0819041bf0b20a1af42970af72a15842571beb13cd0d740b0600e1aa2fd  xsa348-4.13-3.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations should contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAl/YqEoMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZHysH/RUkeyzKbsafoC4gOpdTKsbCOkR6U609yR5Gpv0G
JjoeMculUV+4q4aEJVm+FoXpK2H526akTA9iZnfhxZH224/nJ/MuK8IYdCCUxAPH
GTBa64RMTcl9lwHUZUOOWNFbEwTy7CiLBh+ccAi+o8BJGBDcXYFOtD5CerD08wFI
HJ/OKa4a36q6YDbG5ESvPK+9KL7e/VM+4BUCtvrlQFMV/4zSiBh9rKLlJEa975zB
NC4dZ6ZsM/uRV8s39WQ1ihz2ylAB0Ol/uemYCMWKZRscXxolKJdoWN5F5kpygj3n
ETmwpMQSwDcG+yhIBMbJ3CnCguQzEIVyWs8Z7wPcFMZk9QQ=
=UJMI
-----END PGP SIGNATURE-----

["xsa348.meta" (application/octet-stream)]
["xsa348-1.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86: replace reset_stack_and_jump_nolp()

Move the necessary check into check_for_livepatch_work(), rather than
mostly duplicating reset_stack_and_jump() for this purpose. This is to
prevent an inflation of reset_stack_and_jump() flavors.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
Of course instead of adding the check right into
check_for_livepatch_work(), a wrapper could be introduced.

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -192,7 +192,7 @@ static void noreturn continue_idle_domai
 {
     /* Idle vcpus might be attached to non-idle units! */
     if ( !is_idle_domain(v->sched_unit->domain) )
-        reset_stack_and_jump_nolp(guest_idle_loop);
+        reset_stack_and_jump(guest_idle_loop);
 
     reset_stack_and_jump(idle_loop);
 }
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1036,7 +1036,7 @@ static void noreturn svm_do_resume(struc
 
     hvm_do_resume(v);
 
-    reset_stack_and_jump_nolp(svm_asm_do_resume);
+    reset_stack_and_jump(svm_asm_do_resume);
 }
 
 void svm_vmenter_helper(const struct cpu_user_regs *regs)
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1909,7 +1909,7 @@ void vmx_do_resume(struct vcpu *v)
     if ( host_cr4 != read_cr4() )
         __vmwrite(HOST_CR4, read_cr4());
 
-    reset_stack_and_jump_nolp(vmx_asm_do_vmentry);
+    reset_stack_and_jump(vmx_asm_do_vmentry);
 }
 
 static inline unsigned long vmr(unsigned long field)
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -113,7 +113,7 @@ static int parse_pcid(const char *s)
 static void noreturn continue_nonidle_domain(struct vcpu *v)
 {
     check_wakeup_from_wait();
-    reset_stack_and_jump_nolp(ret_from_intr);
+    reset_stack_and_jump(ret_from_intr);
 }
 
 static int setup_compat_l4(struct vcpu *v)
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -676,7 +676,7 @@ static void __init noreturn reinit_bsp_s
         asm volatile ("setssbsy" ::: "memory");
     }
 
-    reset_stack_and_jump_nolp(init_done);
+    reset_stack_and_jump(init_done);
 }
 
 /*
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -1635,6 +1635,11 @@ void check_for_livepatch_work(void)
     s_time_t timeout;
     unsigned long flags;
 
+    /* Only do any work when invoked in truly idle state. */
+    if ( system_state != SYS_STATE_active ||
+         !is_idle_domain(current->sched_unit->domain) )
+        return;
+
     /* Fast path: no work to do. */
     if ( !per_cpu(work_to_do, cpu ) )
         return;
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -155,13 +155,13 @@ unsigned long get_stack_dump_bottom (uns
 # define SHADOW_STACK_WORK ""
 #endif
 
-#define switch_stack_and_jump(fn, instr)                                \
+#define reset_stack_and_jump(fn)                                        \
     ({                                                                  \
         unsigned int tmp;                                               \
         __asm__ __volatile__ (                                          \
             SHADOW_STACK_WORK                                           \
             "mov %[stk], %%rsp;"                                        \
-            instr                                                       \
+            CHECK_FOR_LIVEPATCH_WORK                                    \
             "jmp %c[fun];"                                              \
             : [val] "=&r" (tmp),                                        \
               [ssp] "=&r" (tmp)                                         \
@@ -176,12 +176,6 @@ unsigned long get_stack_dump_bottom (uns
         unreachable();                                                  \
     })
 
-#define reset_stack_and_jump(fn)                                        \
-    switch_stack_and_jump(fn, CHECK_FOR_LIVEPATCH_WORK)
-
-#define reset_stack_and_jump_nolp(fn)                                   \
-    switch_stack_and_jump(fn, "")
-
 /*
  * Which VCPU's state is currently running on each CPU?
  * This is not necesasrily the same as 'current' as a CPU may be

["xsa348-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86: fold guest_idle_loop() into idle_loop()

The latter can easily be made to cover both cases. This is in
preparation for using idle_loop directly to populate idle_csw.tail.

Take the liberty and also adjust indentation / spacing in the involved code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -133,14 +133,22 @@ void play_dead(void)
 static void idle_loop(void)
 {
     unsigned int cpu = smp_processor_id();
+    /*
+     * Idle vcpus might be attached to non-idle units! We don't do any
+     * standard idle work like tasklets or livepatching in this case.
+     */
+    bool guest = !is_idle_domain(current->sched_unit->domain);
 
     for ( ; ; )
     {
         if ( cpu_is_offline(cpu) )
+        {
+            ASSERT(!guest);
             play_dead();
+        }
 
         /* Are we here for running vcpu context tasklets, or for idling? */
-        if ( unlikely(tasklet_work_to_do(cpu)) )
+        if ( !guest && unlikely(tasklet_work_to_do(cpu)) )
         {
             do_tasklet();
             /* Livepatch work is always kicked off via a tasklet. */
@@ -151,28 +159,14 @@ static void idle_loop(void)
          * and then, after it is done, whether softirqs became pending
          * while we were scrubbing.
          */
-        else if ( !softirq_pending(cpu) && !scrub_free_pages()  &&
-                    !softirq_pending(cpu) )
-            pm_idle();
-        do_softirq();
-    }
-}
-
-/*
- * Idle loop for siblings in active schedule units.
- * We don't do any standard idle work like tasklets or livepatching.
- */
-static void guest_idle_loop(void)
-{
-    unsigned int cpu = smp_processor_id();
-
-    for ( ; ; )
-    {
-        ASSERT(!cpu_is_offline(cpu));
-
-        if ( !softirq_pending(cpu) && !scrub_free_pages() &&
-             !softirq_pending(cpu))
-            sched_guest_idle(pm_idle, cpu);
+        else if ( !softirq_pending(cpu) && !scrub_free_pages() &&
+                  !softirq_pending(cpu) )
+        {
+            if ( guest )
+                sched_guest_idle(pm_idle, cpu);
+            else
+                pm_idle();
+        }
         do_softirq();
     }
 }
@@ -190,10 +184,6 @@ void startup_cpu_idle_loop(void)
 
 static void noreturn continue_idle_domain(struct vcpu *v)
 {
-    /* Idle vcpus might be attached to non-idle units! */
-    if ( !is_idle_domain(v->sched_unit->domain) )
-        reset_stack_and_jump(guest_idle_loop);
-
     reset_stack_and_jump(idle_loop);
 }
 

["xsa348-3.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86: avoid calling {svm,vmx}_do_resume()

These functions can be reached via the path hvm_do_resume() ->
handle_hvm_io_completion() -> hvm_wait_for_io() ->
wait_on_xen_event_channel() -> do_softirq() -> schedule() ->
sched_context_switch() -> continue_running() and hence may
recursively invoke themselves. If this ends up happening repeatedly,
a stack overflow would result.

Prevent this by also resetting the stack at the
->arch.ctxt_switch->tail() invocations (in both places for consistency)
and thus jumping to the functions instead of calling them.

This is XSA-348 / CVE-2020-29566.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
v2: Fix LIVEPATCH builds crashing.
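
As a sketch of the mechanics (drawn from the hunks below, not itself
part of the patch): a CALL to the tail function pushes a new stack
frame on every de-schedule / re-schedule cycle, whereas resetting the
stack pointer and jumping reuses the same frame, bounding stack depth
however often the device model re-signals:

    /* before: one additional stack frame per cycle */
    nextd->arch.ctxt_switch->tail(next);    /* CALL */
    BUG();

    /* after: reset %rsp to the top of the per-CPU stack, then take
     * an indirect jump -- stack depth stays constant */
    reset_stack_and_jump_ind(nextd->arch.ctxt_switch->tail);
    /* expands to roughly: mov <stack top>, %rsp; jmp *<reg> */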

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -130,7 +130,7 @@ void play_dead(void)
         dead_idle();
 }
 
-static void idle_loop(void)
+static void noreturn idle_loop(void)
 {
     unsigned int cpu = smp_processor_id();
     /*
@@ -182,11 +182,6 @@ void startup_cpu_idle_loop(void)
     reset_stack_and_jump(idle_loop);
 }
 
-static void noreturn continue_idle_domain(struct vcpu *v)
-{
-    reset_stack_and_jump(idle_loop);
-}
-
 void init_hypercall_page(struct domain *d, void *ptr)
 {
     memset(ptr, 0xcc, PAGE_SIZE);
@@ -710,7 +705,7 @@ int arch_domain_create(struct domain *d,
         static const struct arch_csw idle_csw = {
             .from = paravirt_ctxt_switch_from,
             .to   = paravirt_ctxt_switch_to,
-            .tail = continue_idle_domain,
+            .tail = idle_loop,
         };
 
         d->arch.ctxt_switch = &idle_csw;
@@ -2047,20 +2042,12 @@ void context_switch(struct vcpu *prev, s
     /* Ensure that the vcpu has an up-to-date time base. */
     update_vcpu_system_time(next);
 
-    /*
-     * Schedule tail *should* be a terminal function pointer, but leave a
-     * bug frame around just in case it returns, to save going back into the
-     * context switching code and leaving a far more subtle crash to diagnose.
-     */
-    nextd->arch.ctxt_switch->tail(next);
-    BUG();
+    reset_stack_and_jump_ind(nextd->arch.ctxt_switch->tail);
 }
 
 void continue_running(struct vcpu *same)
 {
-    /* See the comment above. */
-    same->domain->arch.ctxt_switch->tail(same);
-    BUG();
+    reset_stack_and_jump_ind(same->domain->arch.ctxt_switch->tail);
 }
 
 int __sync_local_execstate(void)
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -991,8 +991,9 @@ static void svm_ctxt_switch_to(struct vc
         wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
 }
 
-static void noreturn svm_do_resume(struct vcpu *v)
+static void noreturn svm_do_resume(void)
 {
+    struct vcpu *v = current;
     struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
     bool debug_state = (v->domain->debugger_attached ||
                         v->domain->arch.monitor.software_breakpoint_enabled ||
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1850,8 +1850,9 @@ void vmx_vmentry_failure(void)
     domain_crash(curr->domain);
 }
 
-void vmx_do_resume(struct vcpu *v)
+void vmx_do_resume(void)
 {
+    struct vcpu *v = current;
     bool_t debug_state;
     unsigned long host_cr4;
 
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -110,7 +110,7 @@ static int parse_pcid(const char *s)
     return rc;
 }
 
-static void noreturn continue_nonidle_domain(struct vcpu *v)
+static void noreturn continue_nonidle_domain(void)
 {
     check_wakeup_from_wait();
     reset_stack_and_jump(ret_from_intr);
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -155,18 +155,18 @@ unsigned long get_stack_dump_bottom (uns
 # define SHADOW_STACK_WORK ""
 #endif
 
-#define reset_stack_and_jump(fn)                                        \
+#define switch_stack_and_jump(fn, instr, constr)                        \
     ({                                                                  \
         unsigned int tmp;                                               \
         __asm__ __volatile__ (                                          \
             SHADOW_STACK_WORK                                           \
             "mov %[stk], %%rsp;"                                        \
             CHECK_FOR_LIVEPATCH_WORK                                    \
-            "jmp %c[fun];"                                              \
+            instr "[fun]"                                               \
             : [val] "=&r" (tmp),                                        \
               [ssp] "=&r" (tmp)                                         \
             : [stk] "r" (guest_cpu_user_regs()),                        \
-              [fun] "i" (fn),                                           \
+              [fun] constr (fn),                                        \
               [skstk_base] "i"                                          \
               ((PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8),               \
               [stack_mask] "i" (STACK_SIZE - 1),                        \
@@ -176,6 +176,13 @@ unsigned long get_stack_dump_bottom (uns
         unreachable();                                                  \
     })
 
+#define reset_stack_and_jump(fn)                                        \
+    switch_stack_and_jump(fn, "jmp %c", "i")
+
+/* The constraint may only specify non-call-clobbered registers. */
+#define reset_stack_and_jump_ind(fn)                                    \
+    switch_stack_and_jump(fn, "INDIRECT_JMP %", "b")
+
 /*
  * Which VCPU's state is currently running on each CPU?
  * This is not necesasrily the same as 'current' as a CPU may be
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -337,7 +337,7 @@ struct arch_domain
     const struct arch_csw {
         void (*from)(struct vcpu *);
         void (*to)(struct vcpu *);
-        void (*tail)(struct vcpu *);
+        void noreturn (*tail)(void);
     } *ctxt_switch;
 
 #ifdef CONFIG_HVM
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -95,7 +95,7 @@ typedef enum {
 void vmx_asm_vmexit_handler(struct cpu_user_regs);
 void vmx_asm_do_vmentry(void);
 void vmx_intr_assist(void);
-void noreturn vmx_do_resume(struct vcpu *);
+void noreturn vmx_do_resume(void);
 void vmx_vlapic_msr_changed(struct vcpu *v);
 struct hvm_emulate_ctxt;
 void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt);

["xsa348-4.10.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86: avoid calling {svm,vmx}_do_resume()

These functions can be reached via the path hvm_do_resume() ->
handle_hvm_io_completion() -> hvm_wait_for_io() ->
wait_on_xen_event_channel() -> do_softirq() -> schedule() ->
sched_context_switch() -> continue_running() and hence may
recursively invoke themselves. If this ends up happening repeatedly,
a stack overflow would result.

Prevent this by also resetting the stack at the
->arch.ctxt_switch->tail() invocations (in both places for consistency)
and thus jumping to the functions instead of calling them.

This is XSA-348 / CVE-2020-29566.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

--- sle15.orig/xen/arch/x86/domain.c	2020-10-15 17:35:17.000000000 +0200
+++ sle15/xen/arch/x86/domain.c	2020-11-10 17:56:59.000000000 +0100
@@ -121,7 +121,7 @@ static void play_dead(void)
     (*dead_idle)();
 }
 
-static void idle_loop(void)
+static void noreturn idle_loop(void)
 {
     unsigned int cpu = smp_processor_id();
 
@@ -161,11 +161,6 @@ void startup_cpu_idle_loop(void)
     reset_stack_and_jump(idle_loop);
 }
 
-static void noreturn continue_idle_domain(struct vcpu *v)
-{
-    reset_stack_and_jump(idle_loop);
-}
-
 void dump_pageframe_info(struct domain *d)
 {
     struct page_info *page;
@@ -560,7 +555,7 @@ int arch_domain_create(struct domain *d,
         static const struct arch_csw idle_csw = {
             .from = paravirt_ctxt_switch_from,
             .to   = paravirt_ctxt_switch_to,
-            .tail = continue_idle_domain,
+            .tail = idle_loop,
         };
 
         d->arch.ctxt_switch = &idle_csw;
@@ -1774,20 +1769,12 @@ void context_switch(struct vcpu *prev, s
     /* Ensure that the vcpu has an up-to-date time base. */
     update_vcpu_system_time(next);
 
-    /*
-     * Schedule tail *should* be a terminal function pointer, but leave a
-     * bug frame around just in case it returns, to save going back into the
-     * context switching code and leaving a far more subtle crash to diagnose.
-     */
-    nextd->arch.ctxt_switch->tail(next);
-    BUG();
+    reset_stack_and_jump_ind(nextd->arch.ctxt_switch->tail);
 }
 
 void continue_running(struct vcpu *same)
 {
-    /* See the comment above. */
-    same->domain->arch.ctxt_switch->tail(same);
-    BUG();
+    reset_stack_and_jump_ind(same->domain->arch.ctxt_switch->tail);
 }
 
 int __sync_local_execstate(void)
--- sle15.orig/xen/arch/x86/hvm/svm/svm.c	2019-02-15 23:40:31.000000000 +0100
+++ sle15/xen/arch/x86/hvm/svm/svm.c	2020-11-10 17:56:59.000000000 +0100
@@ -1086,8 +1086,9 @@ static void svm_ctxt_switch_to(struct vc
         wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
 }
 
-static void noreturn svm_do_resume(struct vcpu *v)
+static void noreturn svm_do_resume(void)
 {
+    struct vcpu *v = current;
     struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
     bool_t debug_state = v->domain->debugger_attached;
     bool_t vcpu_guestmode = 0;
--- sle15.orig/xen/arch/x86/hvm/vmx/vmcs.c	2020-05-18 00:00:00.000000000 +0200
+++ sle15/xen/arch/x86/hvm/vmx/vmcs.c	2020-11-10 17:56:59.000000000 +0100
@@ -1785,8 +1785,9 @@ void vmx_vmentry_failure(void)
     domain_crash_synchronous();
 }
 
-void vmx_do_resume(struct vcpu *v)
+void vmx_do_resume(void)
 {
+    struct vcpu *v = current;
     bool_t debug_state;
 
     if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
--- sle15.orig/xen/arch/x86/pv/domain.c	2019-07-05 18:27:31.000000000 +0200
+++ sle15/xen/arch/x86/pv/domain.c	2020-11-10 17:56:59.000000000 +0100
@@ -64,7 +64,7 @@ custom_runtime_param("pcid", parse_pcid)
 #undef page_to_mfn
 #define page_to_mfn(pg) _mfn(__page_to_mfn(pg))
 
-static void noreturn continue_nonidle_domain(struct vcpu *v)
+static void noreturn continue_nonidle_domain(void)
 {
     check_wakeup_from_wait();
     reset_stack_and_jump(ret_from_intr);
--- sle15.orig/xen/include/asm-x86/current.h	2019-07-05 18:27:31.000000000 +0200
+++ sle15/xen/include/asm-x86/current.h	2020-11-10 17:56:59.000000000 +0100
@@ -124,16 +124,23 @@ unsigned long get_stack_dump_bottom (uns
 # define CHECK_FOR_LIVEPATCH_WORK ""
 #endif
 
-#define reset_stack_and_jump(__fn)                                      \
+#define switch_stack_and_jump(fn, instr, constr)                        \
     ({                                                                  \
         __asm__ __volatile__ (                                          \
             "mov %0,%%"__OP"sp;"                                        \
-            CHECK_FOR_LIVEPATCH_WORK                                      \
-             "jmp %c1"                                                  \
-            : : "r" (guest_cpu_user_regs()), "i" (__fn) : "memory" );   \
+            CHECK_FOR_LIVEPATCH_WORK                                    \
+            instr "1"                                                   \
+            : : "r" (guest_cpu_user_regs()), constr (fn) : "memory" );  \
         unreachable();                                                  \
     })
 
+#define reset_stack_and_jump(fn)                                        \
+    switch_stack_and_jump(fn, "jmp %c", "i")
+
+/* The constraint may only specify non-call-clobbered registers. */
+#define reset_stack_and_jump_ind(fn)                                    \
+    switch_stack_and_jump(fn, "INDIRECT_JMP %", "b")
+
 /*
  * Which VCPU's state is currently running on each CPU?
  * This is not necesasrily the same as 'current' as a CPU may be
--- sle15.orig/xen/include/asm-x86/domain.h	2019-11-26 14:55:15.817547760 +0100
+++ sle15/xen/include/asm-x86/domain.h	2020-11-10 17:56:59.000000000 +0100
@@ -328,7 +328,7 @@ struct arch_domain
     const struct arch_csw {
         void (*from)(struct vcpu *);
         void (*to)(struct vcpu *);
-        void (*tail)(struct vcpu *);
+        void noreturn (*tail)(void);
     } *ctxt_switch;
 
     /* nestedhvm: translate l2 guest physical to host physical */
--- sle15.orig/xen/include/asm-x86/hvm/vmx/vmx.h	2020-05-18 00:00:00.000000000 +0200
+++ sle15/xen/include/asm-x86/hvm/vmx/vmx.h	2020-11-10 17:56:59.000000000 +0100
@@ -95,7 +95,7 @@ typedef enum {
 void vmx_asm_vmexit_handler(struct cpu_user_regs);
 void vmx_asm_do_vmentry(void);
 void vmx_intr_assist(void);
-void noreturn vmx_do_resume(struct vcpu *);
+void noreturn vmx_do_resume(void);
 void vmx_vlapic_msr_changed(struct vcpu *v);
 void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt);
 void vmx_realmode(struct cpu_user_regs *regs);

["xsa348-4.11.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86: avoid calling {svm,vmx}_do_resume()

These functions can be reached via the path hvm_do_resume() ->
handle_hvm_io_completion() -> hvm_wait_for_io() ->
wait_on_xen_event_channel() -> do_softirq() -> schedule() ->
sched_context_switch() -> continue_running() and hence may
recursively invoke themselves. If this ends up happening repeatedly,
a stack overflow would result.

Prevent this by also resetting the stack at the
->arch.ctxt_switch->tail() invocations (in both places for consistency)
and thus jumping to the functions instead of calling them.

This is XSA-348 / CVE-2020-29566.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

--- sle12sp4.orig/xen/arch/x86/domain.c	2020-10-15 17:35:17.000000000 +0200
+++ sle12sp4/xen/arch/x86/domain.c	2020-11-10 17:56:59.000000000 +0100
@@ -121,7 +121,7 @@ static void play_dead(void)
     (*dead_idle)();
 }
 
-static void idle_loop(void)
+static void noreturn idle_loop(void)
 {
     unsigned int cpu = smp_processor_id();
 
@@ -161,11 +161,6 @@ void startup_cpu_idle_loop(void)
     reset_stack_and_jump(idle_loop);
 }
 
-static void noreturn continue_idle_domain(struct vcpu *v)
-{
-    reset_stack_and_jump(idle_loop);
-}
-
 void dump_pageframe_info(struct domain *d)
 {
     struct page_info *page;
@@ -456,7 +451,7 @@ int arch_domain_create(struct domain *d,
         static const struct arch_csw idle_csw = {
             .from = paravirt_ctxt_switch_from,
             .to   = paravirt_ctxt_switch_to,
-            .tail = continue_idle_domain,
+            .tail = idle_loop,
         };
 
         d->arch.ctxt_switch = &idle_csw;
@@ -1770,20 +1765,12 @@ void context_switch(struct vcpu *prev, s
     /* Ensure that the vcpu has an up-to-date time base. */
     update_vcpu_system_time(next);
 
-    /*
-     * Schedule tail *should* be a terminal function pointer, but leave a
-     * bug frame around just in case it returns, to save going back into the
-     * context switching code and leaving a far more subtle crash to diagnose.
-     */
-    nextd->arch.ctxt_switch->tail(next);
-    BUG();
+    reset_stack_and_jump_ind(nextd->arch.ctxt_switch->tail);
 }
 
 void continue_running(struct vcpu *same)
 {
-    /* See the comment above. */
-    same->domain->arch.ctxt_switch->tail(same);
-    BUG();
+    reset_stack_and_jump_ind(same->domain->arch.ctxt_switch->tail);
 }
 
 int __sync_local_execstate(void)
--- sle12sp4.orig/xen/arch/x86/hvm/svm/svm.c	2020-06-18 15:13:13.001760095 +0200
+++ sle12sp4/xen/arch/x86/hvm/svm/svm.c	2020-11-10 17:56:59.000000000 +0100
@@ -1111,8 +1111,9 @@ static void svm_ctxt_switch_to(struct vc
         wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
 }
 
-static void noreturn svm_do_resume(struct vcpu *v)
+static void noreturn svm_do_resume(void)
 {
+    struct vcpu *v = current;
     struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
     bool debug_state = (v->domain->debugger_attached ||
                         v->domain->arch.monitor.software_breakpoint_enabled ||
--- sle12sp4.orig/xen/arch/x86/hvm/vmx/vmcs.c	2019-12-03 17:46:26.000000000 +0100
+++ sle12sp4/xen/arch/x86/hvm/vmx/vmcs.c	2020-11-10 17:56:59.000000000 +0100
@@ -1782,8 +1782,9 @@ void vmx_vmentry_failure(void)
     domain_crash_synchronous();
 }
 
-void vmx_do_resume(struct vcpu *v)
+void vmx_do_resume(void)
 {
+    struct vcpu *v = current;
     bool_t debug_state;
     unsigned long host_cr4;
 
--- sle12sp4.orig/xen/arch/x86/pv/domain.c	2019-06-25 23:47:11.000000000 +0200
+++ sle12sp4/xen/arch/x86/pv/domain.c	2020-11-10 17:56:59.000000000 +0100
@@ -58,7 +58,7 @@ static int parse_pcid(const char *s)
 }
 custom_runtime_param("pcid", parse_pcid);
 
-static void noreturn continue_nonidle_domain(struct vcpu *v)
+static void noreturn continue_nonidle_domain(void)
 {
     check_wakeup_from_wait();
     reset_stack_and_jump(ret_from_intr);
--- sle12sp4.orig/xen/include/asm-x86/current.h	2019-06-25 23:47:11.000000000 +0200
+++ sle12sp4/xen/include/asm-x86/current.h	2020-11-10 17:56:59.000000000 +0100
@@ -124,16 +124,23 @@ unsigned long get_stack_dump_bottom (uns
 # define CHECK_FOR_LIVEPATCH_WORK ""
 #endif
 
-#define reset_stack_and_jump(__fn)                                      \
+#define switch_stack_and_jump(fn, instr, constr)                        \
     ({                                                                  \
         __asm__ __volatile__ (                                          \
             "mov %0,%%"__OP"sp;"                                        \
-            CHECK_FOR_LIVEPATCH_WORK                                      \
-             "jmp %c1"                                                  \
-            : : "r" (guest_cpu_user_regs()), "i" (__fn) : "memory" );   \
+            CHECK_FOR_LIVEPATCH_WORK                                    \
+            instr "1"                                                   \
+            : : "r" (guest_cpu_user_regs()), constr (fn) : "memory" );  \
         unreachable();                                                  \
     })
 
+#define reset_stack_and_jump(fn)                                        \
+    switch_stack_and_jump(fn, "jmp %c", "i")
+
+/* The constraint may only specify non-call-clobbered registers. */
+#define reset_stack_and_jump_ind(fn)                                    \
+    switch_stack_and_jump(fn, "INDIRECT_JMP %", "b")
+
 /*
  * Which VCPU's state is currently running on each CPU?
  * This is not necesasrily the same as 'current' as a CPU may be
--- sle12sp4.orig/xen/include/asm-x86/domain.h	2019-12-03 17:46:26.000000000 +0100
+++ sle12sp4/xen/include/asm-x86/domain.h	2020-11-10 17:56:59.000000000 +0100
@@ -328,7 +328,7 @@ struct arch_domain
     const struct arch_csw {
         void (*from)(struct vcpu *);
         void (*to)(struct vcpu *);
-        void (*tail)(struct vcpu *);
+        void noreturn (*tail)(void);
     } *ctxt_switch;
 
     /* nestedhvm: translate l2 guest physical to host physical */
--- sle12sp4.orig/xen/include/asm-x86/hvm/vmx/vmx.h	2019-12-03 17:46:26.000000000 +0100
+++ sle12sp4/xen/include/asm-x86/hvm/vmx/vmx.h	2020-11-10 17:56:59.000000000 +0100
@@ -95,7 +95,7 @@ typedef enum {
 void vmx_asm_vmexit_handler(struct cpu_user_regs);
 void vmx_asm_do_vmentry(void);
 void vmx_intr_assist(void);
-void noreturn vmx_do_resume(struct vcpu *);
+void noreturn vmx_do_resume(void);
 void vmx_vlapic_msr_changed(struct vcpu *v);
 void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt);
 void vmx_realmode(struct cpu_user_regs *regs);

["xsa348-4.12.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86: avoid calling {svm,vmx}_do_resume()

These functions can be reached via the path hvm_do_resume() ->
handle_hvm_io_completion() -> hvm_wait_for_io() ->
wait_on_xen_event_channel() -> do_softirq() -> schedule() ->
sched_context_switch() -> continue_running() and hence may
recursively invoke themselves. If this ends up happening repeatedly,
a stack overflow would result.

Prevent this by also resetting the stack at the
->arch.ctxt_switch->tail() invocations (in both places for consistency)
and thus jumping to the functions instead of calling them.

This is XSA-348 / CVE-2020-29566.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

--- sle15sp1.orig/xen/arch/x86/domain.c	2020-11-02 15:54:40.000000000 +0100
+++ sle15sp1/xen/arch/x86/domain.c	2020-11-10 17:56:59.000000000 +0100
@@ -123,7 +123,7 @@ static void play_dead(void)
     (*dead_idle)();
 }
 
-static void idle_loop(void)
+static void noreturn idle_loop(void)
 {
     unsigned int cpu = smp_processor_id();
 
@@ -163,11 +163,6 @@ void startup_cpu_idle_loop(void)
     reset_stack_and_jump(idle_loop);
 }
 
-static void noreturn continue_idle_domain(struct vcpu *v)
-{
-    reset_stack_and_jump(idle_loop);
-}
-
 void dump_pageframe_info(struct domain *d)
 {
     struct page_info *page;
@@ -484,7 +479,7 @@ int arch_domain_create(struct domain *d,
         static const struct arch_csw idle_csw = {
             .from = paravirt_ctxt_switch_from,
             .to   = paravirt_ctxt_switch_to,
-            .tail = continue_idle_domain,
+            .tail = idle_loop,
         };
 
         d->arch.ctxt_switch = &idle_csw;
@@ -1784,20 +1779,12 @@ void context_switch(struct vcpu *prev, s
     /* Ensure that the vcpu has an up-to-date time base. */
     update_vcpu_system_time(next);
 
-    /*
-     * Schedule tail *should* be a terminal function pointer, but leave a
-     * bug frame around just in case it returns, to save going back into the
-     * context switching code and leaving a far more subtle crash to diagnose.
-     */
-    nextd->arch.ctxt_switch->tail(next);
-    BUG();
+    reset_stack_and_jump_ind(nextd->arch.ctxt_switch->tail);
 }
 
 void continue_running(struct vcpu *same)
 {
-    /* See the comment above. */
-    same->domain->arch.ctxt_switch->tail(same);
-    BUG();
+    reset_stack_and_jump_ind(same->domain->arch.ctxt_switch->tail);
 }
 
 int __sync_local_execstate(void)
--- sle15sp1.orig/xen/arch/x86/hvm/svm/svm.c	2020-11-02 15:54:40.000000000 +0100
+++ sle15sp1/xen/arch/x86/hvm/svm/svm.c	2020-11-10 17:56:59.000000000 +0100
@@ -1055,8 +1055,9 @@ static void svm_ctxt_switch_to(struct vc
         wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
 }
 
-static void noreturn svm_do_resume(struct vcpu *v)
+static void noreturn svm_do_resume(void)
 {
+    struct vcpu *v = current;
     struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
     bool debug_state = (v->domain->debugger_attached ||
                         v->domain->arch.monitor.software_breakpoint_enabled ||
--- sle15sp1.orig/xen/arch/x86/hvm/vmx/vmcs.c	2020-01-06 19:03:09.000000000 +0100
+++ sle15sp1/xen/arch/x86/hvm/vmx/vmcs.c	2020-11-10 17:56:59.000000000 +0100
@@ -1830,8 +1830,9 @@ void vmx_vmentry_failure(void)
     domain_crash(curr->domain);
 }
 
-void vmx_do_resume(struct vcpu *v)
+void vmx_do_resume(void)
 {
+    struct vcpu *v = current;
     bool_t debug_state;
     unsigned long host_cr4;
 
--- sle15sp1.orig/xen/arch/x86/pv/domain.c	2020-11-02 15:54:40.000000000 +0100
+++ sle15sp1/xen/arch/x86/pv/domain.c	2020-11-10 17:56:59.000000000 +0100
@@ -58,7 +58,7 @@ static int parse_pcid(const char *s)
 }
 custom_runtime_param("pcid", parse_pcid);
 
-static void noreturn continue_nonidle_domain(struct vcpu *v)
+static void noreturn continue_nonidle_domain(void)
 {
     check_wakeup_from_wait();
     reset_stack_and_jump(ret_from_intr);
--- sle15sp1.orig/xen/include/asm-x86/current.h	2019-08-09 17:51:20.000000000 +0200
+++ sle15sp1/xen/include/asm-x86/current.h	2020-11-10 17:56:59.000000000 +0100
@@ -124,16 +124,23 @@ unsigned long get_stack_dump_bottom (uns
 # define CHECK_FOR_LIVEPATCH_WORK ""
 #endif
 
-#define reset_stack_and_jump(__fn)                                      \
+#define switch_stack_and_jump(fn, instr, constr)                        \
     ({                                                                  \
         __asm__ __volatile__ (                                          \
             "mov %0,%%"__OP"sp;"                                        \
-            CHECK_FOR_LIVEPATCH_WORK                                      \
-             "jmp %c1"                                                  \
-            : : "r" (guest_cpu_user_regs()), "i" (__fn) : "memory" );   \
+            CHECK_FOR_LIVEPATCH_WORK                                    \
+            instr "1"                                                   \
+            : : "r" (guest_cpu_user_regs()), constr (fn) : "memory" );  \
         unreachable();                                                  \
     })
 
+#define reset_stack_and_jump(fn)                                        \
+    switch_stack_and_jump(fn, "jmp %c", "i")
+
+/* The constraint may only specify non-call-clobbered registers. */
+#define reset_stack_and_jump_ind(fn)                                    \
+    switch_stack_and_jump(fn, "INDIRECT_JMP %", "b")
+
 /*
  * Which VCPU's state is currently running on each CPU?
  * This is not necesasrily the same as 'current' as a CPU may be
--- sle15sp1.orig/xen/include/asm-x86/domain.h	2020-11-02 15:54:40.000000000 +0100
+++ sle15sp1/xen/include/asm-x86/domain.h	2020-11-10 17:56:59.000000000 +0100
@@ -318,7 +318,7 @@ struct arch_domain
     const struct arch_csw {
         void (*from)(struct vcpu *);
         void (*to)(struct vcpu *);
-        void (*tail)(struct vcpu *);
+        void noreturn (*tail)(void);
     } *ctxt_switch;
 
 #ifdef CONFIG_HVM
--- sle15sp1.orig/xen/include/asm-x86/hvm/vmx/vmx.h	2020-01-06 19:03:09.000000000 +0100
+++ sle15sp1/xen/include/asm-x86/hvm/vmx/vmx.h	2020-11-10 17:56:59.000000000 +0100
@@ -95,7 +95,7 @@ typedef enum {
 void vmx_asm_vmexit_handler(struct cpu_user_regs);
 void vmx_asm_do_vmentry(void);
 void vmx_intr_assist(void);
-void noreturn vmx_do_resume(struct vcpu *);
+void noreturn vmx_do_resume(void);
 void vmx_vlapic_msr_changed(struct vcpu *v);
 void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt);
 void vmx_realmode(struct cpu_user_regs *regs);

["xsa348-4.13-1.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86: replace reset_stack_and_jump_nolp()

Move the necessary check into check_for_livepatch_work(), rather than
mostly duplicating reset_stack_and_jump() for this purpose. This is to
prevent an inflation of reset_stack_and_jump() flavors.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

--- sle15sp2.orig/xen/arch/x86/domain.c	2020-10-30 17:22:39.000000000 +0100
+++ sle15sp2/xen/arch/x86/domain.c	2020-11-10 17:51:10.894525721 +0100
@@ -192,7 +192,7 @@ static void noreturn continue_idle_domai
 {
     /* Idle vcpus might be attached to non-idle units! */
     if ( !is_idle_domain(v->sched_unit->domain) )
-        reset_stack_and_jump_nolp(guest_idle_loop);
+        reset_stack_and_jump(guest_idle_loop);
 
     reset_stack_and_jump(idle_loop);
 }
--- sle15sp2.orig/xen/arch/x86/hvm/svm/svm.c	2020-10-30 17:22:39.000000000 +0100
+++ sle15sp2/xen/arch/x86/hvm/svm/svm.c	2020-11-10 17:51:10.898525723 +0100
@@ -1032,7 +1032,7 @@ static void noreturn svm_do_resume(struc
 
     hvm_do_resume(v);
 
-    reset_stack_and_jump_nolp(svm_asm_do_resume);
+    reset_stack_and_jump(svm_asm_do_resume);
 }
 
 void svm_vmenter_helper(const struct cpu_user_regs *regs)
--- sle15sp2.orig/xen/arch/x86/hvm/vmx/vmcs.c	2020-05-18 18:53:09.000000000 +0200
+++ sle15sp2/xen/arch/x86/hvm/vmx/vmcs.c	2020-11-10 17:51:10.898525723 +0100
@@ -1889,7 +1889,7 @@ void vmx_do_resume(struct vcpu *v)
     if ( host_cr4 != read_cr4() )
         __vmwrite(HOST_CR4, read_cr4());
 
-    reset_stack_and_jump_nolp(vmx_asm_do_vmentry);
+    reset_stack_and_jump(vmx_asm_do_vmentry);
 }
 
 static inline unsigned long vmr(unsigned long field)
--- sle15sp2.orig/xen/arch/x86/pv/domain.c	2020-10-30 17:22:39.000000000 +0100
+++ sle15sp2/xen/arch/x86/pv/domain.c	2020-11-10 17:51:10.898525723 +0100
@@ -61,7 +61,7 @@ custom_runtime_param("pcid", parse_pcid)
 static void noreturn continue_nonidle_domain(struct vcpu *v)
 {
     check_wakeup_from_wait();
-    reset_stack_and_jump_nolp(ret_from_intr);
+    reset_stack_and_jump(ret_from_intr);
 }
 
 static int setup_compat_l4(struct vcpu *v)
--- sle15sp2.orig/xen/arch/x86/setup.c	2020-05-18 18:53:09.000000000 +0200
+++ sle15sp2/xen/arch/x86/setup.c	2020-11-10 17:51:10.898525723 +0100
@@ -631,7 +631,7 @@ static void __init noreturn reinit_bsp_s
     stack_base[0] = stack;
     memguard_guard_stack(stack);
 
-    reset_stack_and_jump_nolp(init_done);
+    reset_stack_and_jump(init_done);
 }
 
 /*
--- sle15sp2.orig/xen/common/livepatch.c	2020-05-18 18:53:09.000000000 +0200
+++ sle15sp2/xen/common/livepatch.c	2020-11-10 17:51:10.898525723 +0100
@@ -1300,6 +1300,11 @@ void check_for_livepatch_work(void)
     s_time_t timeout;
     unsigned long flags;
 
+    /* Only do any work when invoked in truly idle state. */
+    if ( system_state != SYS_STATE_active ||
+         !is_idle_domain(current->sched_unit->domain) )
+        return;
+
     /* Fast path: no work to do. */
     if ( !per_cpu(work_to_do, cpu ) )
         return;
--- sle15sp2.orig/xen/include/asm-x86/current.h	2019-12-18 16:18:59.000000000 +0100
+++ sle15sp2/xen/include/asm-x86/current.h	2020-11-10 17:51:10.902525725 +0100
@@ -129,22 +129,16 @@ unsigned long get_stack_dump_bottom (uns
 # define CHECK_FOR_LIVEPATCH_WORK ""
 #endif
 
-#define switch_stack_and_jump(fn, instr)                                \
+#define reset_stack_and_jump(fn)                                        \
     ({                                                                  \
         __asm__ __volatile__ (                                          \
             "mov %0,%%"__OP"sp;"                                        \
-            instr                                                       \
+            CHECK_FOR_LIVEPATCH_WORK                                    \
              "jmp %c1"                                                  \
             : : "r" (guest_cpu_user_regs()), "i" (fn) : "memory" );     \
         unreachable();                                                  \
     })
 
-#define reset_stack_and_jump(fn)                                        \
-    switch_stack_and_jump(fn, CHECK_FOR_LIVEPATCH_WORK)
-
-#define reset_stack_and_jump_nolp(fn)                                   \
-    switch_stack_and_jump(fn, "")
-
 /*
  * Which VCPU's state is currently running on each CPU?
  * This is not necesasrily the same as 'current' as a CPU may be

["xsa348-4.13-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86: fold guest_idle_loop() into idle_loop()

The latter can easily be made to cover both cases. This is in
preparation for using idle_loop directly to populate idle_csw.tail.

Take the liberty and also adjust indentation / spacing in the involved code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

--- sle15sp2.orig/xen/arch/x86/domain.c	2020-11-10 17:51:10.894525721 +0100
+++ sle15sp2/xen/arch/x86/domain.c	2020-11-10 17:51:46.354546349 +0100
@@ -133,14 +133,22 @@ void play_dead(void)
 static void idle_loop(void)
 {
     unsigned int cpu = smp_processor_id();
+    /*
+     * Idle vcpus might be attached to non-idle units! We don't do any
+     * standard idle work like tasklets or livepatching in this case.
+     */
+    bool guest = !is_idle_domain(current->sched_unit->domain);
 
     for ( ; ; )
     {
         if ( cpu_is_offline(cpu) )
+        {
+            ASSERT(!guest);
             play_dead();
+        }
 
         /* Are we here for running vcpu context tasklets, or for idling? */
-        if ( unlikely(tasklet_work_to_do(cpu)) )
+        if ( !guest && unlikely(tasklet_work_to_do(cpu)) )
         {
             do_tasklet();
             /* Livepatch work is always kicked off via a tasklet. */
@@ -151,28 +159,14 @@ static void idle_loop(void)
          * and then, after it is done, whether softirqs became pending
          * while we were scrubbing.
          */
-        else if ( !softirq_pending(cpu) && !scrub_free_pages()  &&
-                    !softirq_pending(cpu) )
-            pm_idle();
-        do_softirq();
-    }
-}
-
-/*
- * Idle loop for siblings in active schedule units.
- * We don't do any standard idle work like tasklets or livepatching.
- */
-static void guest_idle_loop(void)
-{
-    unsigned int cpu = smp_processor_id();
-
-    for ( ; ; )
-    {
-        ASSERT(!cpu_is_offline(cpu));
-
-        if ( !softirq_pending(cpu) && !scrub_free_pages() &&
-             !softirq_pending(cpu))
-            sched_guest_idle(pm_idle, cpu);
+        else if ( !softirq_pending(cpu) && !scrub_free_pages() &&
+                  !softirq_pending(cpu) )
+        {
+            if ( guest )
+                sched_guest_idle(pm_idle, cpu);
+            else
+                pm_idle();
+        }
         do_softirq();
     }
 }
@@ -190,10 +184,6 @@ void startup_cpu_idle_loop(void)
 
 static void noreturn continue_idle_domain(struct vcpu *v)
 {
-    /* Idle vcpus might be attached to non-idle units! */
-    if ( !is_idle_domain(v->sched_unit->domain) )
-        reset_stack_and_jump(guest_idle_loop);
-
     reset_stack_and_jump(idle_loop);
 }
 

["xsa348-4.13-3.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86: avoid calling {svm,vmx}_do_resume()

These functions can be reached via the path hvm_do_resume() ->
handle_hvm_io_completion() -> hvm_wait_for_io() ->
wait_on_xen_event_channel() -> do_softirq() -> schedule() ->
sched_context_switch() -> continue_running() and hence may
recursively invoke themselves. If this ends up happening repeatedly,
a stack overflow would result.

Prevent this by also resetting the stack at the
->arch.ctxt_switch->tail() invocations (in both places for consistency)
and thus jumping to the functions instead of calling them.

This is XSA-348 / CVE-2020-29566.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

--- sle15sp2.orig/xen/arch/x86/domain.c	2020-11-10 17:51:46.354546349 +0100
+++ sle15sp2/xen/arch/x86/domain.c	2020-11-10 17:56:58.758730088 +0100
@@ -130,7 +130,7 @@ void play_dead(void)
         dead_idle();
 }
 
-static void idle_loop(void)
+static void noreturn idle_loop(void)
 {
     unsigned int cpu = smp_processor_id();
     /*
@@ -182,11 +182,6 @@ void startup_cpu_idle_loop(void)
     reset_stack_and_jump(idle_loop);
 }
 
-static void noreturn continue_idle_domain(struct vcpu *v)
-{
-    reset_stack_and_jump(idle_loop);
-}
-
 void init_hypercall_page(struct domain *d, void *ptr)
 {
     memset(ptr, 0xcc, PAGE_SIZE);
@@ -535,7 +530,7 @@ int arch_domain_create(struct domain *d,
         static const struct arch_csw idle_csw = {
             .from = paravirt_ctxt_switch_from,
             .to   = paravirt_ctxt_switch_to,
-            .tail = continue_idle_domain,
+            .tail = idle_loop,
         };
 
         d->arch.ctxt_switch = &idle_csw;
@@ -1833,20 +1828,12 @@ void context_switch(struct vcpu *prev, s
     /* Ensure that the vcpu has an up-to-date time base. */
     update_vcpu_system_time(next);
 
-    /*
-     * Schedule tail *should* be a terminal function pointer, but leave a
-     * bug frame around just in case it returns, to save going back into the
-     * context switching code and leaving a far more subtle crash to diagnose.
-     */
-    nextd->arch.ctxt_switch->tail(next);
-    BUG();
+    reset_stack_and_jump_ind(nextd->arch.ctxt_switch->tail);
 }
 
 void continue_running(struct vcpu *same)
 {
-    /* See the comment above. */
-    same->domain->arch.ctxt_switch->tail(same);
-    BUG();
+    reset_stack_and_jump_ind(same->domain->arch.ctxt_switch->tail);
 }
 
 int __sync_local_execstate(void)
--- sle15sp2.orig/xen/arch/x86/hvm/svm/svm.c	2020-11-10 17:51:10.898525723 +0100
+++ sle15sp2/xen/arch/x86/hvm/svm/svm.c	2020-11-10 17:56:58.762730090 +0100
@@ -987,8 +987,9 @@ static void svm_ctxt_switch_to(struct vc
         wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
 }
 
-static void noreturn svm_do_resume(struct vcpu *v)
+static void noreturn svm_do_resume(void)
 {
+    struct vcpu *v = current;
     struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
     bool debug_state = (v->domain->debugger_attached ||
                         v->domain->arch.monitor.software_breakpoint_enabled ||
--- sle15sp2.orig/xen/arch/x86/hvm/vmx/vmcs.c	2020-11-10 17:51:10.898525723 +0100
+++ sle15sp2/xen/arch/x86/hvm/vmx/vmcs.c	2020-11-10 17:56:58.762730090 +0100
@@ -1830,8 +1830,9 @@ void vmx_vmentry_failure(void)
     domain_crash(curr->domain);
 }
 
-void vmx_do_resume(struct vcpu *v)
+void vmx_do_resume(void)
 {
+    struct vcpu *v = current;
     bool_t debug_state;
     unsigned long host_cr4;
 
--- sle15sp2.orig/xen/arch/x86/pv/domain.c	2020-11-10 17:51:10.898525723 +0100
+++ sle15sp2/xen/arch/x86/pv/domain.c	2020-11-10 17:56:58.762730090 +0100
@@ -58,7 +58,7 @@ static int parse_pcid(const char *s)
 }
 custom_runtime_param("pcid", parse_pcid);
 
-static void noreturn continue_nonidle_domain(struct vcpu *v)
+static void noreturn continue_nonidle_domain(void)
 {
     check_wakeup_from_wait();
     reset_stack_and_jump(ret_from_intr);
--- sle15sp2.orig/xen/include/asm-x86/current.h	2020-11-10 17:51:10.902525725 +0100
+++ sle15sp2/xen/include/asm-x86/current.h	2020-11-10 17:56:58.762730090 +0100
@@ -129,16 +129,23 @@ unsigned long get_stack_dump_bottom (uns
 # define CHECK_FOR_LIVEPATCH_WORK ""
 #endif
 
-#define reset_stack_and_jump(fn)                                        \
+#define switch_stack_and_jump(fn, instr, constr)                        \
     ({                                                                  \
         __asm__ __volatile__ (                                          \
             "mov %0,%%"__OP"sp;"                                        \
             CHECK_FOR_LIVEPATCH_WORK                                    \
-             "jmp %c1"                                                  \
-            : : "r" (guest_cpu_user_regs()), "i" (fn) : "memory" );     \
+            instr "1"                                                   \
+            : : "r" (guest_cpu_user_regs()), constr (fn) : "memory" );  \
         unreachable();                                                  \
     })
 
+#define reset_stack_and_jump(fn)                                        \
+    switch_stack_and_jump(fn, "jmp %c", "i")
+
+/* The constraint may only specify non-call-clobbered registers. */
+#define reset_stack_and_jump_ind(fn)                                    \
+    switch_stack_and_jump(fn, "INDIRECT_JMP %", "b")
+
 /*
  * Which VCPU's state is currently running on each CPU?
  * This is not necesasrily the same as 'current' as a CPU may be
--- sle15sp2.orig/xen/include/asm-x86/domain.h	2020-10-30 17:22:39.000000000 +0100
+++ sle15sp2/xen/include/asm-x86/domain.h	2020-11-10 17:56:58.762730090 +0100
@@ -313,7 +313,7 @@ struct arch_domain
     const struct arch_csw {
         void (*from)(struct vcpu *);
         void (*to)(struct vcpu *);
-        void (*tail)(struct vcpu *);
+        void noreturn (*tail)(void);
     } *ctxt_switch;
 
 #ifdef CONFIG_HVM
--- sle15sp2.orig/xen/include/asm-x86/hvm/vmx/vmx.h	2019-12-18 16:18:59.000000000 +0100
+++ sle15sp2/xen/include/asm-x86/hvm/vmx/vmx.h	2020-11-10 17:56:58.762730090 +0100
@@ -95,7 +95,7 @@ typedef enum {
 void vmx_asm_vmexit_handler(struct cpu_user_regs);
 void vmx_asm_do_vmentry(void);
 void vmx_intr_assist(void);
-void noreturn vmx_do_resume(struct vcpu *);
+void noreturn vmx_do_resume(void);
 void vmx_vlapic_msr_changed(struct vcpu *v);
 void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt);
 void vmx_realmode(struct cpu_user_regs *regs);

