List:       oss-security
Subject:    [oss-security] Xen Security Advisory 280 v2 - Fix for XSA-240 conflicts with shadow paging
From:       Xen.org security team <security@xen.org>
Date:       2018-11-20 13:30:49
Message-ID: E1gP677-00025q-7n@xenbits.xenproject.org

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

                    Xen Security Advisory XSA-280
                              version 2

              Fix for XSA-240 conflicts with shadow paging

UPDATES IN VERSION 2
====================

Public release.

ISSUE DESCRIPTION
=================

The fix for XSA-240 introduced a new field into the control structure
associated with each page of RAM.  This field was added to a union,
another member of which is used when Xen uses shadow paging for the
guest.  During migration, or with the L1TF (XSA-273) mitigation for
PV guests in effect, the two uses conflict.
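
The conflict is of the following general shape.  This is a minimal,
self-contained sketch only: the field names follow the real code, but
the surrounding layout of struct page_info is simplified and the exact
offsets within the union are not represented.

    /* Illustrative only: two union members sharing storage. */
    #include <stdint.h>
    #include <stdio.h>

    struct page_ctl {
        union {
            uint32_t shadow_flags;    /* used by shadow paging          */
            int16_t  linear_pt_count; /* added by the XSA-240 fix, used
                                         for PV linear page tables      */
        };
    };

    int main(void)
    {
        struct page_ctl pg = { .linear_pt_count = 1 };

        /* Shadow code initialising its flags (e.g. when migration puts
         * the guest into shadow/log-dirty mode) wipes out the PV linear
         * page table reference count sharing the same storage. */
        pg.shadow_flags = 0;
        printf("linear_pt_count = %d\n", pg.linear_pt_count); /* prints 0 */
        return 0;
    }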

IMPACT
======

A malicious or buggy x86 PV guest may cause Xen to crash, resulting in
a DoS (Denial of Service) affecting the entire host.  Privilege
escalation as well as information leaks cannot be ruled out.

VULNERABLE SYSTEMS
==================

All Xen versions from at least 3.2 onwards are vulnerable.  Earlier
versions have not been checked.

Only x86 systems are affected.  ARM systems are not affected.

Only Xen versions with the XSA-240 fixes applied are vulnerable.

Only Xen versions which permit linear page table use by PV guests are
vulnerable.

Only x86 PV guests can leverage this vulnerability.  x86 HVM guests
cannot leverage this vulnerability.

MITIGATION
==========

Not permitting linear page table use by PV guests avoids the
vulnerability.  This can be done either at build time, by turning off
the PV_LINEAR_PT configure option, or at runtime, by specifying
"pv-linear-pt=0" on the hypervisor command line.

On systems where the guest kernel is controlled by the host rather than
the guest administrator, running only kernels which have themselves been
hardened against L1TF _and_ avoiding live migration or snapshotting of
PV guests will generally prevent this issue from being triggered.
However, untrusted guest administrators can still trigger it unless
further steps are taken to prevent them from loading code into the
kernel (e.g. by disabling loadable modules) or from using other
mechanisms which allow them to run code at kernel privilege.
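
As one illustration (not an exhaustive hardening recipe), a Linux PV
guest kernel can refuse any further module loading after boot via the
one-way modules_disabled sysctl:

    # disallow further module loading until reboot (cannot be re-enabled)
    sysctl -w kernel.modules_disabled=1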

Running only HVM guests will avoid this vulnerability.

CREDITS
=======

This issue was discovered by the security team of Prgmr.com.

RESOLUTION
==========

Applying the appropriate pair of attached patches resolves this issue.

xsa280-?.patch                                xen-unstable
xsa280-1.patch + xsa280-4.11-2.patch          Xen 4.11.x
xsa280-1.patch + xsa280-4.10-2.patch          Xen 4.10.x
xsa280-4.9-1.patch + xsa280-4.10-2.patch      Xen 4.9.x, Xen 4.8.x
xsa280-4.9-1.patch + xsa280-4.7-2.patch       Xen 4.7.x

$ sha256sum xsa280*
ff0b376b9e2ec16f7c15b144d4d38375d6f6b4019aa9c17f6b80f9dfe40319ef  xsa280.meta
41b2b91dbabbf2048c790c5934ab696ef53932ff98d1069eb7c7ae52e61cd44b  xsa280-1.patch
d46e46a6e706e0d3416d40ed12227223f7e8f825dfc63ed203c1df115976e8a1  xsa280-2.patch
163eaf2e16d5cc314a81fa1254eb2809674001b2329c41556a078b7f94e72ced  xsa280-4.7-2.patch
22e9d29f316356341db40c743ca59f9bb9d783a58fb6429d5badf57a77b5f34a  xsa280-4.9-1.patch
ff0a839dbd9347ec88aaeb7ef1145d0cd9029a19c6a478088c63c0959ba0e740  xsa280-4.10-2.patch
87940f3b84d0adfd89e1b2bc1a872ae2948e1621e4994e7879b77e327b0136b5  xsa280-4.11-2.patch
$
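
As a usage illustration (the source tree path below is an assumption),
each patch is applied with patch -p1 from the top level of the
corresponding Xen source tree after verifying the sums above, e.g. for
Xen 4.11.x:

    $ sha256sum -c <<'EOF'
    41b2b91dbabbf2048c790c5934ab696ef53932ff98d1069eb7c7ae52e61cd44b  xsa280-1.patch
    87940f3b84d0adfd89e1b2bc1a872ae2948e1621e4994e7879b77e327b0136b5  xsa280-4.11-2.patch
    EOF
    $ cd xen-4.11.x          # top of the Xen source tree
    $ patch -p1 < ../xsa280-1.patch
    $ patch -p1 < ../xsa280-4.11-2.patch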

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) EXCEPT the linear page table
disabling one is permitted during the embargo, even on public-facing
systems with untrusted guest users and administrators.

However, deployment of the linear page table disabling mitigation is NOT
PERMITTED (except where all the affected systems and VMs are
administered and used only by organisations which are members of the
Xen Project Security Issues Predisclosure List).  Specifically,
deployment on public cloud systems is NOT permitted.

This is because altering the set of features usable in a guest in
connection with a security issue would be a user-visible change which
could lead to the rediscovery of the vulnerability.

Also: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAlv0DEsMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZnkQH/iyCga79/YRwqCHB5nrTlQhY0g6E5zA2debKtfxS
MPosJQZy7/PzkvbBPnHBYEve8UyvQuVQXs+WOhCL7625HbadgrUOD3LJzbhmduI0
AT5lbLTmM5ac9iBeLQeqkERDJOi8RSx4AtH5NhVvnSWFD/KXQvB1zow1bOIS5drz
5YMr4nA1xX0mmzx//bWRHiUbi72dvrWAeFEPj5wcxNlsGnTqTSyTvMehlJevMfC2
Rthft7e7WZQWy5z5TdbErJbDNuS9beiEvTkuO6oC3QVo5CIXDsuwCk20Q5T5Z9gg
SkoyXO1OO+MIeBpBzrIRvJrrtFpfR7s8weKcrKM8GukyMsM=
=drCg
-----END PGP SIGNATURE-----

["xsa280.meta" (application/octet-stream)]
["xsa280-1.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: move OOS flag bit positions

In preparation of reducing struct page_info's shadow_flags field to 16
bits, lower the bit positions used for SHF_out_of_sync and
SHF_oos_may_write.

Instead of also adjusting the open coded use in _get_page_type(),
introduce shadow_prepare_page_type_change() to contain knowledge of the
bit positions to shadow code.

This is part of XSA-280.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
---
v2: Rename function and pass full type.

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2712,17 +2712,8 @@ static int _get_page_type(struct page_in
         {
             struct domain *d = page_get_owner(page);
 
-            /*
-             * Normally we should never let a page go from type count 0
-             * to type count 1 when it is shadowed. One exception:
-             * out-of-sync shadowed pages are allowed to become
-             * writeable.
-             */
-            if ( d && shadow_mode_enabled(d)
-                 && (page->count_info & PGC_page_table)
-                 && !((page->shadow_flags & (1u<<29))
-                      && type == PGT_writable_page) )
-               shadow_remove_all_shadows(d, page_to_mfn(page));
+            if ( d && shadow_mode_enabled(d) )
+               shadow_prepare_page_type_change(d, page, type);
 
             ASSERT(!(x & PGT_pae_xen_l2));
             if ( (x & PGT_type_mask) != type )
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -749,6 +749,9 @@ int sh_unsync(struct vcpu *v, mfn_t gmfn
          || !v->domain->arch.paging.shadow.oos_active )
         return 0;
 
+    BUILD_BUG_ON(!(typeof(pg->shadow_flags))SHF_out_of_sync);
+    BUILD_BUG_ON(!(typeof(pg->shadow_flags))SHF_oos_may_write);
+
     pg->shadow_flags |= SHF_out_of_sync|SHF_oos_may_write;
     oos_hash_add(v, gmfn);
     perfc_incr(shadow_unsync);
@@ -2413,6 +2416,26 @@ void sh_remove_shadows(struct domain *d,
     paging_unlock(d);
 }
 
+void shadow_prepare_page_type_change(struct domain *d, struct page_info *page,
+                                     unsigned long new_type)
+{
+    if ( !(page->count_info & PGC_page_table) )
+        return;
+
+#if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
+    /*
+     * Normally we should never let a page go from type count 0 to type
+     * count 1 when it is shadowed. One exception: out-of-sync shadowed
+     * pages are allowed to become writeable.
+     */
+    if ( (page->shadow_flags & SHF_oos_may_write) &&
+         new_type == PGT_writable_page )
+        return;
+#endif
+
+    shadow_remove_all_shadows(d, page_to_mfn(page));
+}
+
 static void
 sh_remove_all_shadows_and_parents(struct domain *d, mfn_t gmfn)
 /* Even harsher: this is a HVM page that we thing is no longer a pagetable.
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -285,8 +285,8 @@ static inline void sh_terminate_list(str
  * codepath is called during that time and is sensitive to oos issues, it may
  * need to use the second flag.
  */
-#define SHF_out_of_sync (1u<<30)
-#define SHF_oos_may_write (1u<<29)
+#define SHF_out_of_sync (1u << (SH_type_max_shadow + 1))
+#define SHF_oos_may_write (1u << (SH_type_max_shadow + 2))
 
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) */
 
--- a/xen/include/asm-x86/shadow.h
+++ b/xen/include/asm-x86/shadow.h
@@ -81,6 +81,10 @@ void shadow_final_teardown(struct domain
 
 void sh_remove_shadows(struct domain *d, mfn_t gmfn, int fast, int all);
 
+/* Adjust shadows ready for a guest page to change its type. */
+void shadow_prepare_page_type_change(struct domain *d, struct page_info *page,
+                                     unsigned long new_type);
+
 /* Discard _all_ mappings from the domain's shadows. */
 void shadow_blow_tables_per_domain(struct domain *d);
 
@@ -105,6 +109,10 @@ int shadow_set_allocation(struct domain
 static inline void sh_remove_shadows(struct domain *d, mfn_t gmfn,
                                      int fast, int all) {}
 
+static inline void shadow_prepare_page_type_change(struct domain *d,
+                                                   struct page_info *page,
+                                                   unsigned long new_type) {}
+
 static inline void shadow_blow_tables_per_domain(struct domain *d) {}
 
 static inline int shadow_domctl(struct domain *d,

["xsa280-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: shrink struct page_info's shadow_flags to 16 bits

This is to avoid it overlapping the linear_pt_count field needed for PV
domains. Introduce a separate, HVM-only pagetable_dying field to replace
the sole one left in the upper 16 bits.

Note that the accesses to ->shadow_flags in shadow_{pro,de}mote() get
switched to non-atomic, non-bitops operations, as {test,set,clear}_bit()
are not allowed on uint16_t fields and hence their use would have
required ugly casts. This is fine because all updates of the field ought
to occur with the paging lock held, and other updates of it use |= and
&= as well (i.e. using atomic operations here didn't really guard
against potentially racing updates elsewhere).

This is part of XSA-280.

Reported-by: Prgmr.com Security <security@prgmr.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
---
v2: Use non-atomic, non-bitops accesses to ->shadow_flags in
    shadow_{pro,de}mote().

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -787,10 +787,14 @@ void shadow_promote(struct domain *d, mf
 
     /* Is the page already shadowed? */
     if ( !test_and_set_bit(_PGC_page_table, &page->count_info) )
+    {
         page->shadow_flags = 0;
+        if ( is_hvm_domain(d) )
+            page->pagetable_dying = false;
+    }
 
-    ASSERT(!test_bit(type, &page->shadow_flags));
-    set_bit(type, &page->shadow_flags);
+    ASSERT(!(page->shadow_flags & (1u << type)));
+    page->shadow_flags |= 1u << type;
     TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_PROMOTE);
 }
 
@@ -799,9 +803,9 @@ void shadow_demote(struct domain *d, mfn
     struct page_info *page = mfn_to_page(gmfn);
 
     ASSERT(test_bit(_PGC_page_table, &page->count_info));
-    ASSERT(test_bit(type, &page->shadow_flags));
+    ASSERT(page->shadow_flags & (1u << type));
 
-    clear_bit(type, &page->shadow_flags);
+    page->shadow_flags &= ~(1u << type);
 
     if ( (page->shadow_flags & SHF_page_type_mask) == 0 )
     {
@@ -2405,7 +2409,7 @@ void sh_remove_shadows(struct domain *d,
     if ( !fast && all && (pg->count_info & PGC_page_table) )
     {
         printk(XENLOG_G_ERR "can't find all shadows of mfn %"PRI_mfn
-               " (shadow_flags=%08x)\n", mfn_x(gmfn), pg->shadow_flags);
+               " (shadow_flags=%04x)\n", mfn_x(gmfn), pg->shadow_flags);
         domain_crash(d);
     }
 
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3298,8 +3298,8 @@ static int sh_page_fault(struct vcpu *v,
 
     /* Unshadow if we are writing to a toplevel pagetable that is
      * flagged as a dying process, and that is not currently used. */
-    if ( sh_mfn_is_a_page_table(gmfn)
-         && (mfn_to_page(gmfn)->shadow_flags & SHF_pagetable_dying) )
+    if ( sh_mfn_is_a_page_table(gmfn) && is_hvm_domain(d) &&
+         mfn_to_page(gmfn)->pagetable_dying )
     {
         int used = 0;
         struct vcpu *tmp;
@@ -4261,9 +4261,9 @@ int sh_rm_write_access_from_sl1p(struct
     ASSERT(mfn_valid(smfn));
 
     /* Remember if we've been told that this process is being torn down */
-    if ( curr->domain == d )
+    if ( curr->domain == d && is_hvm_domain(d) )
         curr->arch.paging.shadow.pagetable_dying
-            = !!(mfn_to_page(gmfn)->shadow_flags & SHF_pagetable_dying);
+            = mfn_to_page(gmfn)->pagetable_dying;
 
     sp = mfn_to_page(smfn);
 
@@ -4580,10 +4580,10 @@ static void sh_pagetable_dying(paddr_t g
                    : shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l2_pae_shadow);
         }
 
-        if ( mfn_valid(smfn) )
+        if ( mfn_valid(smfn) && is_hvm_domain(d) )
         {
             gmfn = _mfn(mfn_to_page(smfn)->v.sh.back);
-            mfn_to_page(gmfn)->shadow_flags |= SHF_pagetable_dying;
+            mfn_to_page(gmfn)->pagetable_dying = true;
             shadow_unhook_mappings(d, smfn, 1/* user pages only */);
             flush = 1;
         }
@@ -4621,9 +4621,9 @@ static void sh_pagetable_dying(paddr_t g
     smfn = shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l4_64_shadow);
 #endif
 
-    if ( mfn_valid(smfn) )
+    if ( mfn_valid(smfn) && is_hvm_domain(d) )
     {
-        mfn_to_page(gmfn)->shadow_flags |= SHF_pagetable_dying;
+        mfn_to_page(gmfn)->pagetable_dying = true;
         shadow_unhook_mappings(d, smfn, 1/* user pages only */);
         /* Now flush the TLB: we removed toplevel mappings. */
         flush_tlb_mask(d->dirty_cpumask);
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -290,8 +290,6 @@ static inline void sh_terminate_list(str
 
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) */
 
-#define SHF_pagetable_dying (1u<<31)
-
 static inline int sh_page_has_multiple_shadows(struct page_info *pg)
 {
     u32 shadows;
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -259,8 +259,15 @@ struct page_info
          * Guest pages with a shadow.  This does not conflict with
          * tlbflush_timestamp since page table pages are explicitly not
          * tracked for TLB-flush avoidance when a guest runs in shadow mode.
+         *
+         * pagetable_dying is used for HVM domains only. The layout here has
+         * to avoid re-use of the space used by linear_pt_count, which (only)
+         * PV guests use.
          */
-        u32 shadow_flags;
+        struct {
+            uint16_t shadow_flags;
+            bool pagetable_dying;
+        };
 
         /* When in use as a shadow, next shadow in this hash chain. */
         __pdx_t next_shadow;

["xsa280-4.7-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: shrink struct page_info's shadow_flags to 16 bits

This is to avoid it overlapping the linear_pt_count field needed for PV
domains. Introduce a separate, HVM-only pagetable_dying field to replace
the sole one left in the upper 16 bits.

Note that the accesses to ->shadow_flags in shadow_{pro,de}mote() get
switched to non-atomic, non-bitops operations, as {test,set,clear}_bit()
are not allowed on uint16_t fields and hence their use would have
required ugly casts. This is fine because all updates of the field ought
to occur with the paging lock held, and other updates of it use |= and
&= as well (i.e. using atomic operations here didn't really guard
against potentially racing updates elsewhere).

This is part of XSA-280.

Reported-by: Prgmr.com Security <security@prgmr.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1027,10 +1027,14 @@ void shadow_promote(struct domain *d, mf
 
     /* Is the page already shadowed? */
     if ( !test_and_set_bit(_PGC_page_table, &page->count_info) )
+    {
         page->shadow_flags = 0;
+        if ( is_hvm_domain(d) )
+            page->pagetable_dying = 0;
+    }
 
-    ASSERT(!test_bit(type, &page->shadow_flags));
-    set_bit(type, &page->shadow_flags);
+    ASSERT(!(page->shadow_flags & (1u << type)));
+    page->shadow_flags |= 1u << type;
     TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_PROMOTE);
 }
 
@@ -1039,9 +1043,9 @@ void shadow_demote(struct domain *d, mfn
     struct page_info *page = mfn_to_page(gmfn);
 
     ASSERT(test_bit(_PGC_page_table, &page->count_info));
-    ASSERT(test_bit(type, &page->shadow_flags));
+    ASSERT(page->shadow_flags & (1u << type));
 
-    clear_bit(type, &page->shadow_flags);
+    page->shadow_flags &= ~(1u << type);
 
     if ( (page->shadow_flags & SHF_page_type_mask) == 0 )
     {
@@ -2879,7 +2883,7 @@ void sh_remove_shadows(struct domain *d,
     if ( !fast && all && (pg->count_info & PGC_page_table) )
     {
         SHADOW_ERROR("can't find all shadows of mfn %05lx "
-                     "(shadow_flags=%08x)\n",
+                     "(shadow_flags=%04x)\n",
                       mfn_x(gmfn), pg->shadow_flags);
         domain_crash(d);
     }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3314,8 +3314,8 @@ static int sh_page_fault(struct vcpu *v,
 
     /* Unshadow if we are writing to a toplevel pagetable that is
      * flagged as a dying process, and that is not currently used. */
-    if ( sh_mfn_is_a_page_table(gmfn)
-         && (mfn_to_page(gmfn)->shadow_flags & SHF_pagetable_dying) )
+    if ( sh_mfn_is_a_page_table(gmfn) && is_hvm_domain(d) &&
+         mfn_to_page(gmfn)->pagetable_dying )
     {
         int used = 0;
         struct vcpu *tmp;
@@ -4256,9 +4256,9 @@ int sh_rm_write_access_from_sl1p(struct
     ASSERT(mfn_valid(smfn));
 
     /* Remember if we've been told that this process is being torn down */
-    if ( curr->domain == d )
+    if ( curr->domain == d && is_hvm_domain(d) )
         curr->arch.paging.shadow.pagetable_dying
-            = !!(mfn_to_page(gmfn)->shadow_flags & SHF_pagetable_dying);
+            = mfn_to_page(gmfn)->pagetable_dying;
 
     sp = mfn_to_page(smfn);
 
@@ -4575,10 +4575,10 @@ static void sh_pagetable_dying(struct vc
             smfn = shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l2_pae_shadow);
         }
 
-        if ( mfn_valid(smfn) )
+        if ( mfn_valid(smfn) && is_hvm_domain(d) )
         {
             gmfn = _mfn(mfn_to_page(smfn)->v.sh.back);
-            mfn_to_page(gmfn)->shadow_flags |= SHF_pagetable_dying;
+            mfn_to_page(gmfn)->pagetable_dying = 1;
             shadow_unhook_mappings(d, smfn, 1/* user pages only */);
             flush = 1;
         }
@@ -4615,9 +4615,9 @@ static void sh_pagetable_dying(struct vc
     smfn = shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l4_64_shadow);
 #endif
 
-    if ( mfn_valid(smfn) )
+    if ( mfn_valid(smfn) && is_hvm_domain(d) )
     {
-        mfn_to_page(gmfn)->shadow_flags |= SHF_pagetable_dying;
+        mfn_to_page(gmfn)->pagetable_dying = 1;
         shadow_unhook_mappings(d, smfn, 1/* user pages only */);
         /* Now flush the TLB: we removed toplevel mappings. */
         flush_tlb_mask(d->domain_dirty_cpumask);
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -292,8 +292,6 @@ static inline void sh_terminate_list(str
 
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) */
 
-#define SHF_pagetable_dying (1u<<31)
-
 static inline int sh_page_has_multiple_shadows(struct page_info *pg)
 {
     u32 shadows;
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -172,8 +172,15 @@ struct page_info
          * Guest pages with a shadow.  This does not conflict with
          * tlbflush_timestamp since page table pages are explicitly not
          * tracked for TLB-flush avoidance when a guest runs in shadow mode.
+         *
+         * pagetable_dying is used for HVM domains only. The layout here has
+         * to avoid re-use of the space used by linear_pt_count, which (only)
+         * PV guests use.
          */
-        u32 shadow_flags;
+        struct {
+            uint16_t shadow_flags;
+            bool_t pagetable_dying;
+        };
 
         /* When in use as a shadow, next shadow in this hash chain. */
         __pdx_t next_shadow;

["xsa280-4.9-1.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: move OOS flag bit positions

In preparation of reducing struct page_info's shadow_flags field to 16
bits, lower the bit positions used for SHF_out_of_sync and
SHF_oos_may_write.

Instead of also adjusting the open coded use in _get_page_type(),
introduce shadow_prepare_page_type_change() to contain knowledge of the
bit positions to shadow code.

This is part of XSA-280.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2799,15 +2799,8 @@ static int __get_page_type(struct page_i
         {
             struct domain *d = page_get_owner(page);
 
-            /* Normally we should never let a page go from type count 0
-             * to type count 1 when it is shadowed. One exception:
-             * out-of-sync shadowed pages are allowed to become
-             * writeable. */
-            if ( d && shadow_mode_enabled(d)
-                 && (page->count_info & PGC_page_table)
-                 && !((page->shadow_flags & (1u<<29))
-                      && type == PGT_writable_page) )
-               shadow_remove_all_shadows(d, _mfn(page_to_mfn(page)));
+            if ( d && shadow_mode_enabled(d) )
+               shadow_prepare_page_type_change(d, page, type);
 
             ASSERT(!(x & PGT_pae_xen_l2));
             if ( (x & PGT_type_mask) != type )
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -919,6 +919,9 @@ int sh_unsync(struct vcpu *v, mfn_t gmfn
          || !v->domain->arch.paging.shadow.oos_active )
         return 0;
 
+    BUILD_BUG_ON(!(typeof(pg->shadow_flags))SHF_out_of_sync);
+    BUILD_BUG_ON(!(typeof(pg->shadow_flags))SHF_oos_may_write);
+
     pg->shadow_flags |= SHF_out_of_sync|SHF_oos_may_write;
     oos_hash_add(v, gmfn);
     perfc_incr(shadow_unsync);
@@ -2810,6 +2813,26 @@ void sh_remove_shadows(struct domain *d,
     paging_unlock(d);
 }
 
+void shadow_prepare_page_type_change(struct domain *d, struct page_info *page,
+                                     unsigned long new_type)
+{
+    if ( !(page->count_info & PGC_page_table) )
+        return;
+
+#if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
+    /*
+     * Normally we should never let a page go from type count 0 to type
+     * count 1 when it is shadowed. One exception: out-of-sync shadowed
+     * pages are allowed to become writeable.
+     */
+    if ( (page->shadow_flags & SHF_oos_may_write) &&
+         new_type == PGT_writable_page )
+        return;
+#endif
+
+    shadow_remove_all_shadows(d, page_to_mfn(page));
+}
+
 static void
 sh_remove_all_shadows_and_parents(struct domain *d, mfn_t gmfn)
 /* Even harsher: this is a HVM page that we thing is no longer a pagetable.
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -287,8 +287,8 @@ static inline void sh_terminate_list(str
  * codepath is called during that time and is sensitive to oos issues, it may
  * need to use the second flag.
  */
-#define SHF_out_of_sync (1u<<30)
-#define SHF_oos_may_write (1u<<29)
+#define SHF_out_of_sync (1u << (SH_type_max_shadow + 1))
+#define SHF_oos_may_write (1u << (SH_type_max_shadow + 2))
 
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) */
 
--- a/xen/include/asm-x86/shadow.h
+++ b/xen/include/asm-x86/shadow.h
@@ -81,6 +81,10 @@ void shadow_final_teardown(struct domain
 
 void sh_remove_shadows(struct domain *d, mfn_t gmfn, int fast, int all);
 
+/* Adjust shadows ready for a guest page to change its type. */
+void shadow_prepare_page_type_change(struct domain *d, struct page_info *page,
+                                     unsigned long new_type);
+
 /* Discard _all_ mappings from the domain's shadows. */
 void shadow_blow_tables_per_domain(struct domain *d);
 
@@ -105,6 +109,10 @@ int shadow_set_allocation(struct domain
 static inline void sh_remove_shadows(struct domain *d, mfn_t gmfn,
                                      bool_t fast, bool_t all) {}
 
+static inline void shadow_prepare_page_type_change(struct domain *d,
+                                                   struct page_info *page,
+                                                   unsigned long new_type) {}
+
 static inline void shadow_blow_tables_per_domain(struct domain *d) {}
 
 static inline int shadow_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,

["xsa280-4.10-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: shrink struct page_info's shadow_flags to 16 bits

This is to avoid it overlapping the linear_pt_count field needed for PV
domains. Introduce a separate, HVM-only pagetable_dying field to replace
the sole one left in the upper 16 bits.

Note that the accesses to ->shadow_flags in shadow_{pro,de}mote() get
switched to non-atomic, non-bitops operations, as {test,set,clear}_bit()
are not allowed on uint16_t fields and hence their use would have
required ugly casts. This is fine because all updates of the field ought
to occur with the paging lock held, and other updates of it use |= and
&= as well (i.e. using atomic operations here didn't really guard
against potentially racing updates elsewhere).

This is part of XSA-280.

Reported-by: Prgmr.com Security <security@prgmr.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -957,10 +957,14 @@ void shadow_promote(struct domain *d, mf
 
     /* Is the page already shadowed? */
     if ( !test_and_set_bit(_PGC_page_table, &page->count_info) )
+    {
         page->shadow_flags = 0;
+        if ( is_hvm_domain(d) )
+            page->pagetable_dying = false;
+    }
 
-    ASSERT(!test_bit(type, &page->shadow_flags));
-    set_bit(type, &page->shadow_flags);
+    ASSERT(!(page->shadow_flags & (1u << type)));
+    page->shadow_flags |= 1u << type;
     TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_PROMOTE);
 }
 
@@ -969,9 +973,9 @@ void shadow_demote(struct domain *d, mfn
     struct page_info *page = mfn_to_page(gmfn);
 
     ASSERT(test_bit(_PGC_page_table, &page->count_info));
-    ASSERT(test_bit(type, &page->shadow_flags));
+    ASSERT(page->shadow_flags & (1u << type));
 
-    clear_bit(type, &page->shadow_flags);
+    page->shadow_flags &= ~(1u << type);
 
     if ( (page->shadow_flags & SHF_page_type_mask) == 0 )
     {
@@ -2801,7 +2805,7 @@ void sh_remove_shadows(struct domain *d,
     if ( !fast && all && (pg->count_info & PGC_page_table) )
     {
         SHADOW_ERROR("can't find all shadows of mfn %"PRI_mfn" "
-                     "(shadow_flags=%08x)\n",
+                     "(shadow_flags=%04x)\n",
                       mfn_x(gmfn), pg->shadow_flags);
         domain_crash(d);
     }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3328,8 +3328,8 @@ static int sh_page_fault(struct vcpu *v,
 
     /* Unshadow if we are writing to a toplevel pagetable that is
      * flagged as a dying process, and that is not currently used. */
-    if ( sh_mfn_is_a_page_table(gmfn)
-         && (mfn_to_page(gmfn)->shadow_flags & SHF_pagetable_dying) )
+    if ( sh_mfn_is_a_page_table(gmfn) && is_hvm_domain(d) &&
+         mfn_to_page(gmfn)->pagetable_dying )
     {
         int used = 0;
         struct vcpu *tmp;
@@ -4301,9 +4301,9 @@ int sh_rm_write_access_from_sl1p(struct
     ASSERT(mfn_valid(smfn));
 
     /* Remember if we've been told that this process is being torn down */
-    if ( curr->domain == d )
+    if ( curr->domain == d && is_hvm_domain(d) )
         curr->arch.paging.shadow.pagetable_dying
-            = !!(mfn_to_page(gmfn)->shadow_flags & SHF_pagetable_dying);
+            = mfn_to_page(gmfn)->pagetable_dying;
 
     sp = mfn_to_page(smfn);
 
@@ -4619,10 +4619,10 @@ static void sh_pagetable_dying(struct vc
                    : shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l2_pae_shadow);
         }
 
-        if ( mfn_valid(smfn) )
+        if ( mfn_valid(smfn) && is_hvm_domain(d) )
         {
             gmfn = _mfn(mfn_to_page(smfn)->v.sh.back);
-            mfn_to_page(gmfn)->shadow_flags |= SHF_pagetable_dying;
+            mfn_to_page(gmfn)->pagetable_dying = true;
             shadow_unhook_mappings(d, smfn, 1/* user pages only */);
             flush = 1;
         }
@@ -4659,9 +4659,9 @@ static void sh_pagetable_dying(struct vc
     smfn = shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l4_64_shadow);
 #endif
 
-    if ( mfn_valid(smfn) )
+    if ( mfn_valid(smfn) && is_hvm_domain(d) )
     {
-        mfn_to_page(gmfn)->shadow_flags |= SHF_pagetable_dying;
+        mfn_to_page(gmfn)->pagetable_dying = true;
         shadow_unhook_mappings(d, smfn, 1/* user pages only */);
         /* Now flush the TLB: we removed toplevel mappings. */
         flush_tlb_mask(d->domain_dirty_cpumask);
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -292,8 +292,6 @@ static inline void sh_terminate_list(str
 
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) */
 
-#define SHF_pagetable_dying (1u<<31)
-
 static inline int sh_page_has_multiple_shadows(struct page_info *pg)
 {
     u32 shadows;
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -188,8 +188,15 @@ struct page_info
          * Guest pages with a shadow.  This does not conflict with
          * tlbflush_timestamp since page table pages are explicitly not
          * tracked for TLB-flush avoidance when a guest runs in shadow mode.
+         *
+         * pagetable_dying is used for HVM domains only. The layout here has
+         * to avoid re-use of the space used by linear_pt_count, which (only)
+         * PV guests use.
          */
-        u32 shadow_flags;
+        struct {
+            uint16_t shadow_flags;
+            bool pagetable_dying;
+        };
 
         /* When in use as a shadow, next shadow in this hash chain. */
         __pdx_t next_shadow;

["xsa280-4.11-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: shrink struct page_info's shadow_flags to 16 bits

This is to avoid it overlapping the linear_pt_count field needed for PV
domains. Introduce a separate, HVM-only pagetable_dying field to replace
the sole one left in the upper 16 bits.

Note that the accesses to ->shadow_flags in shadow_{pro,de}mote() get
switched to non-atomic, non-bitops operations, as {test,set,clear}_bit()
are not allowed on uint16_t fields and hence their use would have
required ugly casts. This is fine because all updates of the field ought
to occur with the paging lock held, and other updates of it use |= and
&= as well (i.e. using atomic operations here didn't really guard
against potentially racing updates elsewhere).

This is part of XSA-280.

Reported-by: Prgmr.com Security <security@prgmr.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1028,10 +1028,14 @@ void shadow_promote(struct domain *d, mf
 
     /* Is the page already shadowed? */
     if ( !test_and_set_bit(_PGC_page_table, &page->count_info) )
+    {
         page->shadow_flags = 0;
+        if ( is_hvm_domain(d) )
+            page->pagetable_dying = false;
+    }
 
-    ASSERT(!test_bit(type, &page->shadow_flags));
-    set_bit(type, &page->shadow_flags);
+    ASSERT(!(page->shadow_flags & (1u << type)));
+    page->shadow_flags |= 1u << type;
     TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_PROMOTE);
 }
 
@@ -1040,9 +1044,9 @@ void shadow_demote(struct domain *d, mfn
     struct page_info *page = mfn_to_page(gmfn);
 
     ASSERT(test_bit(_PGC_page_table, &page->count_info));
-    ASSERT(test_bit(type, &page->shadow_flags));
+    ASSERT(page->shadow_flags & (1u << type));
 
-    clear_bit(type, &page->shadow_flags);
+    page->shadow_flags &= ~(1u << type);
 
     if ( (page->shadow_flags & SHF_page_type_mask) == 0 )
     {
@@ -2921,7 +2925,7 @@ void sh_remove_shadows(struct domain *d,
     if ( !fast && all && (pg->count_info & PGC_page_table) )
     {
         SHADOW_ERROR("can't find all shadows of mfn %"PRI_mfn" "
-                     "(shadow_flags=%08x)\n",
+                     "(shadow_flags=%04x)\n",
                       mfn_x(gmfn), pg->shadow_flags);
         domain_crash(d);
     }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3299,8 +3299,8 @@ static int sh_page_fault(struct vcpu *v,
 
     /* Unshadow if we are writing to a toplevel pagetable that is
      * flagged as a dying process, and that is not currently used. */
-    if ( sh_mfn_is_a_page_table(gmfn)
-         && (mfn_to_page(gmfn)->shadow_flags & SHF_pagetable_dying) )
+    if ( sh_mfn_is_a_page_table(gmfn) && is_hvm_domain(d) &&
+         mfn_to_page(gmfn)->pagetable_dying )
     {
         int used = 0;
         struct vcpu *tmp;
@@ -4254,9 +4254,9 @@ int sh_rm_write_access_from_sl1p(struct
     ASSERT(mfn_valid(smfn));
 
     /* Remember if we've been told that this process is being torn down */
-    if ( curr->domain == d )
+    if ( curr->domain == d && is_hvm_domain(d) )
         curr->arch.paging.shadow.pagetable_dying
-            = !!(mfn_to_page(gmfn)->shadow_flags & SHF_pagetable_dying);
+            = mfn_to_page(gmfn)->pagetable_dying;
 
     sp = mfn_to_page(smfn);
 
@@ -4572,10 +4572,10 @@ static void sh_pagetable_dying(struct vc
                    : shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l2_pae_shadow);
         }
 
-        if ( mfn_valid(smfn) )
+        if ( mfn_valid(smfn) && is_hvm_domain(d) )
         {
             gmfn = _mfn(mfn_to_page(smfn)->v.sh.back);
-            mfn_to_page(gmfn)->shadow_flags |= SHF_pagetable_dying;
+            mfn_to_page(gmfn)->pagetable_dying = true;
             shadow_unhook_mappings(d, smfn, 1/* user pages only */);
             flush = 1;
         }
@@ -4612,9 +4612,9 @@ static void sh_pagetable_dying(struct vc
     smfn = shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l4_64_shadow);
 #endif
 
-    if ( mfn_valid(smfn) )
+    if ( mfn_valid(smfn) && is_hvm_domain(d) )
     {
-        mfn_to_page(gmfn)->shadow_flags |= SHF_pagetable_dying;
+        mfn_to_page(gmfn)->pagetable_dying = true;
         shadow_unhook_mappings(d, smfn, 1/* user pages only */);
         /* Now flush the TLB: we removed toplevel mappings. */
         flush_tlb_mask(d->dirty_cpumask);
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -292,8 +292,6 @@ static inline void sh_terminate_list(str
 
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) */
 
-#define SHF_pagetable_dying (1u<<31)
-
 static inline int sh_page_has_multiple_shadows(struct page_info *pg)
 {
     u32 shadows;
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -259,8 +259,15 @@ struct page_info
          * Guest pages with a shadow.  This does not conflict with
          * tlbflush_timestamp since page table pages are explicitly not
          * tracked for TLB-flush avoidance when a guest runs in shadow mode.
+         *
+         * pagetable_dying is used for HVM domains only. The layout here has
+         * to avoid re-use of the space used by linear_pt_count, which (only)
+         * PV guests use.
          */
-        u32 shadow_flags;
+        struct {
+            uint16_t shadow_flags;
+            bool pagetable_dying;
+        };
 
         /* When in use as a shadow, next shadow in this hash chain. */
         __pdx_t next_shadow;

