'[oss-security] Xen Security Advisory 288 v3 (CVE-2019-17343) - x86: Inconsistent PV IOMMU discipline'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       oss-security
Subject:    [oss-security] Xen Security Advisory 288 v3 (CVE-2019-17343) - x86: Inconsistent PV IOMMU discipline
From:       Xen.org security team <security () xen ! org>
Date:       2019-10-25 11:11:15
Message-ID: E1iNxUx-0003zV-7X () xenbits ! xenproject ! org
[Download RAW message or body]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2019-17343 / XSA-288
                              version 3

                 x86: Inconsistent PV IOMMU discipline

UPDATES IN VERSION 3
====================

CVE assigned.

ISSUE DESCRIPTION
=================

In order for a PV domain to set up DMA from a passed-through device to
one of its pages, the page must be mapped in the IOMMU.  On the other
hand, before a PV page may be used as a "special" page type (such as a
pagetable or descriptor table), it _must not_ be writable in the IOMMU
(otherwise a malicious guest could DMA arbitrary page tables into the
memory, bypassing Xen's safety checks); and Xen's current rule is to
have such pages not in the IOMMU at all.

Until now, in order to accomplish this, the code has borrowed HVM
domain's "physmap" concept: When a page is assigned to a guest,
guess_physmap_add_entry() is called, which for PV guests, will create
a writable IOMMU mapping; and when a page is removed,
guest_physmap_remove_entry() is called, which will remove the mapping.

Additionally, when a page gains the PGT_writable page type, the page
will be added into the IOMMU; and when the page changes away from a
PGT_writable type, the page will be removed from the IOMMU.

Unfortunately, borrowing the "physmap" concept from HVM domains is
problematic.  HVM domains have a lock on their p2m tables, ensuring
synchronization between modifications to the p2m; and all hypercall
parameters must first be translated through the p2m before being used.
Trying to mix this locked-and-gated approach with PV's lock-free
approach leads to several races and inconsistencies.

IMPACT
======

An untrusted PV domain with access to a physical device can DMA into
its own pagetables, leading to privilege escalation.

VULNERABLE SYSTEMS
==================

Only x86 systems are vulnerable.  ARM systems are not vulnerable.

Only systems where PV guests are given direct access to physical
devices (PCI pass-through) are vulnerable.  Systems with only HVM
guests, or systems which do not use PCI pass-through, are not
vulnerable.

MITIGATION
==========

Only assigning devices to HVM guests will avoid these vulnerabilities.

CREDITS
=======

This issue was discovered by Paul Durrant of Citrix.

RESOLUTION
==========

Applying the appropriate attached patch resolves this issue.

xsa288.patch           xen-unstable
xsa288-4.11.patch      Xen 4.11.x, Xen 4.10.x
xsa288-4.9.patch       Xen 4.9.x
xsa288-4.8.patch       Xen 4.8.x
xsa288-4.7.patch       Xen 4.7.x

$ sha256sum xsa288*
7254f0ce791b5543aec68643ec47e2bcf7823650949c7eb32db5122591f12e8c  xsa288.meta
e1159cb5c1c5a01b28753739b6a78b555ebe4b920cae766db47e0f2a1a21c188  xsa288.patch
e9986ceda84e7391c27d80fd541a0e5edf1eadef302a560b4e445ca9bad4c56e  xsa288-4.7.patch
14856543ccaa5b3db2a209d25637ed025f2eb940294d0cd07e03f56630a9e5af  xsa288-4.8.patch
df5e4a367f58491d54c778e2997142792c881d4f7b5a2a1d3339d2a3f1abafe5  xsa288-4.9.patch
58ba46b4814695dc34beaa5fb644931253bd0b0c6a8dc843c735beec152ae722  xsa288-4.11.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.


(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAl2y19AMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZmCoH/3PTKQLGVnhe5iGtXVgfbb1h+vz/8t/BATDFgreL
LXSvxxK42FZ+inbr/qz/NUPS21yISOUu9agqWFzTq5qYpU1E4+FybwdjvIHBE6tG
16gFjHYfawvA3QAPndaZR8vdWVqOEu/YdhOSa7m9vRiUnxh2B44nX0oT/bXuGdKv
pyKrQk91hpeWPXxWzJ2k1hy1+/I+eEDxLvauvVaIulO/0bQyMTWcCDRCYdzShJEp
njdVj3+4ZvvNbtc4zrWmVtfyZfMLWFdYwCTcTQ7Gy0b9wVmGhD1UhZsgXd4i8H2Z
62HfUOesi7yO2OtI1T08GaRFoo9ArcUbyEKvxTGW5Iyh6NE=
=EvlR
-----END PGP SIGNATURE-----

["xsa288.meta" (application/octet-stream)]
["xsa288.patch" (application/octet-stream)]

From 3aa0f99fa27540ea5e3ce0ae354c7c890348c67a Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dunlap@citrix.com>
Date: Wed, 23 Jan 2019 11:57:46 +0000
Subject: [PATCH] xen: Make coherent PV IOMMU discipline

In order for a PV domain to set up DMA from a passed-through device to
one of its pages, the page must be mapped in the IOMMU.  On the other
hand, before a PV page may be used as a "special" page type (such as a
pagetable or descriptor table), it _must not_ be writable in the IOMMU
(otherwise a malicious guest could DMA arbitrary page tables into the
memory, bypassing Xen's safety checks); and Xen's current rule is to
have such pages not in the IOMMU at all.

At the moment, in order to accomplish this, the code borrows HVM
domain's "physmap" concept: When a page is assigned to a guest,
guess_physmap_add_entry() is called, which for PV guests, will create
a writable IOMMU mapping; and when a page is removed,
guest_physmap_remove_entry() is called, which will remove the mapping.

Additionally, when a page gains the PGT_writable page type, the page
will be added into the IOMMU; and when the page changes away from a
PGT_writable type, the page will be removed from the IOMMU.

Unfortunately, borrowing the "physmap" concept from HVM domains is
problematic.  HVM domains have a lock on their p2m tables, ensuring
synchronization between modifications to the p2m; and all hypercall
parameters must first be translated through the p2m before being used.

Trying to mix this locked-and-gated approach with PV's lock-free
approach leads to several races and inconsistencies:

* A race between a page being assigned and it being put into the
  physmap; for example:
  - P1: call populate_physmap() { A = allocate_domheap_pages() }
  - P2: Guess page A's mfn, and call decrease_reservation(A).  A is owned by the domain,
        and so Xen will clear the PGC_allocated bit and free the page
  - P1: finishes populate_physmap() { guest_physmap_add_entry() }

  Now the domain has a writable IOMMU mapping to a page it no longer owns.

* Pages start out as type PGT_none, but with a writable IOMMU mapping.
  If a guest uses a page as a page table without ever having created a
  writable mapping, the IOMMU mapping will not be removed; the guest
  will have a writable IOMMU mapping to a page it is currently using
  as a page table.

* A newly-allocated page can be DMA'd into with no special actions on
  the part of the guest; However, if a page is promoted to a
  non-writable type, the page must be mapped with a writable type before
  DMA'ing to it again, or the transaction will fail.

To fix this, do away with the "PV physmap" concept entirely, and
replace it with the following IOMMU discipline for PV guests:
 - (type == PGT_writable) <=> in iommu (even if type_count == 0)
 - Upon a final put_page(), check to see if type is PGT_writable; if so,
   iommu_unmap.

In order to achieve that:

- Remove PV IOMMU related code from guest_physmap_*

- Repurpose cleanup_page_cacheattr() into a general
  cleanup_page_mappings() function, which will both fix up Xen
  mappings for pages with special cache attributes, and also check for
  a PGT_writable type and remove pages if appropriate.

- For compatibility with current guests, grab-and-release a
  PGT_writable_page type for PV guests in guest_physmap_add_entry().
  This will cause most "normal" guest pages to start out life with
  PGT_writable_page type (and thus an IOMMU mapping), but no type
  count (so that they can be used as special cases at will).

Also, note that there is one exception to to the "PGT_writable => in
iommu" rule: xenheap pages shared with guests may be given a
PGT_writable type with one type reference.  This reference prevents
the type from changing, which in turn prevents page from gaining an
IOMMU mapping in get_page_type().  It's not clear whether this was
intentional or not, but it's not something to change in a security
update.

This is XSA-288.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c     | 95 +++++++++++++++++++++++++++++++++++++++----
 xen/arch/x86/mm/p2m.c | 35 +++++++++++++---
 2 files changed, 116 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4d3b17f3a8..71b3984dd1 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -81,6 +81,22 @@
  * OS's, which will generally use the WP bit to simplify copy-on-write
  * implementation (in that case, OS wants a fault when it writes to
  * an application-supplied buffer).
+ *
+ * PV domUs and IOMMUs:
+ * --------------------
+ * For a guest to be able to DMA into a page, that page must be in the
+ * domain's IOMMU.  However, we *must not* allow DMA into 'special'
+ * pages (such as page table pages, descriptor tables, &c); and we
+ * must also ensure that mappings are removed from the IOMMU when the
+ * page is freed.  Finally, it is inherently racy to make any changes
+ * based on a page with a non-zero type count.
+ *
+ * To that end, we put the page in the IOMMU only when a page gains
+ * the PGT_writeable type; and we remove the page when it loses the
+ * PGT_writeable type (not when the type count goes to zero).  This
+ * effectively protects the IOMMU status update with the type count we
+ * have just acquired.  We must also check for PGT_writable type when
+ * doing the final put_page(), and remove it from the iommu if so.
  */
 
 #include <xen/init.h>
@@ -2320,19 +2336,79 @@ static int mod_l4_entry(l4_pgentry_t *pl4e,
 }
 #endif /* CONFIG_PV */
 
-static int cleanup_page_cacheattr(struct page_info *page)
+/*
+ * In the course of a page's use, it may have caused other secondary
+ * mappings to have changed:
+ * - Xen's mappings may have been changed to accomodate the requested
+ *   cache attibutes
+ * - A page may have been put into the IOMMU of a PV guest when it
+ *   gained a writable mapping.
+ *
+ * Now that the page is being freed, clean up these mappings if
+ * appropriate.  NB that at this point the page is still "allocated",
+ * but not "live" (i.e., its refcount is 0), so it's safe to read the
+ * count_info, owner, and type_info without synchronization.
+ */
+static int cleanup_page_mappings(struct page_info *page)
 {
     unsigned int cacheattr =
         (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base;
+    int rc = 0;
+    unsigned long mfn = mfn_x(page_to_mfn(page));
 
-    if ( likely(cacheattr == 0) )
-        return 0;
+    /*
+     * If we've modified xen mappings as a result of guest cache
+     * attributes, restore them to the "normal" state.
+     */
+    if ( unlikely(cacheattr) )
+    {
+        page->count_info &= ~PGC_cacheattr_mask;
 
-    page->count_info &= ~PGC_cacheattr_mask;
+        BUG_ON(is_xen_heap_page(page));
 
-    BUG_ON(is_xen_heap_page(page));
+        rc = update_xen_mappings(mfn, 0);
+    }
 
-    return update_xen_mappings(mfn_x(page_to_mfn(page)), 0);
+    /*
+     * If this may be in a PV domain's IOMMU, remove it.
+     *
+     * NB that writable xenheap pages have their type set and cleared by
+     * implementation-specific code, rather than by get_page_type().  As such:
+     * - They aren't expected to have an IOMMU mapping, and
+     * - We don't necessarily expect the type count to be zero when the final
+     * put_page happens.
+     *
+     * Go ahead and attemp to call iommu_unmap() on xenheap pages anyway, just
+     * in case; but only ASSERT() that the type count is zero and remove the
+     * PGT_writable type for non-xenheap pages.
+     */
+    if ( (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
+    {
+        struct domain *d = page_get_owner(page);
+
+        if ( d && is_pv_domain(d) && unlikely(need_iommu_pt_sync(d)) )
+        {
+            int rc2 = iommu_legacy_unmap(d, _dfn(mfn), PAGE_ORDER_4K);
+
+            if ( !rc )
+                rc = rc2;
+        }
+
+        if ( likely(!is_xen_heap_page(page)) )
+        {
+            ASSERT((page->u.inuse.type_info &
+                    (PGT_type_mask | PGT_count_mask)) == PGT_writable_page);
+            /*
+             * Clear the type to record the fact that all writable mappings
+             * have been removed.  But if either operation failed, leave
+             * type_info alone.
+             */
+            if ( likely(!rc) )
+                page->u.inuse.type_info &= ~(PGT_type_mask | PGT_count_mask);
+        }
+    }
+
+    return rc;
 }
 
 void put_page(struct page_info *page)
@@ -2348,7 +2424,7 @@ void put_page(struct page_info *page)
 
     if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        if ( cleanup_page_cacheattr(page) == 0 )
+        if ( !cleanup_page_mappings(page) )
             free_domheap_page(page);
         else
             gdprintk(XENLOG_WARNING,
@@ -4043,9 +4119,10 @@ int steal_page(
      * NB this is safe even if the page ends up being given back to
      * the domain, because the count is zero: subsequent mappings will
      * cause the cache attributes to be re-instated inside
-     * get_page_from_l1e().
+     * get_page_from_l1e(), or the page to be added back to the IOMMU
+     * upon the type changing to PGT_writeable, as appropriate.
      */
-    if ( (rc = cleanup_page_cacheattr(page)) )
+    if ( (rc = cleanup_page_mappings(page)) )
     {
         /*
          * Couldn't fixup Xen's mappings; put things the way we found
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d14ce57dd5..e6eb842172 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -778,9 +778,9 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn_l, unsigned long mfn,
     p2m_type_t t;
     p2m_access_t a;
 
+    /* IOMMU for PV guests is handled in get_page_type() and put_page(). */
     if ( !paging_mode_translate(p2m->domain) )
-        return need_iommu_pt_sync(p2m->domain) ?
-            iommu_legacy_unmap(p2m->domain, _dfn(mfn), page_order) : 0;
+        return 0;
 
     ASSERT(gfn_locked_by_me(p2m, gfn));
     P2M_DEBUG("removing gfn=%#lx mfn=%#lx\n", gfn_l, mfn);
@@ -825,10 +825,35 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
     int pod_count = 0;
     int rc = 0;
 
+    /* IOMMU for PV guests is handled in get_page_type() and put_page(). */
     if ( !paging_mode_translate(d) )
-        return (need_iommu_pt_sync(d) && t == p2m_ram_rw) ?
-            iommu_legacy_map(d, _dfn(mfn_x(mfn)), mfn, page_order,
-                             IOMMUF_readable | IOMMUF_writable) : 0;
+    {
+        struct page_info *page = mfn_to_page(mfn);
+
+        /*
+         * Our interface for PV guests wrt IOMMU entries hasn't been very
+         * clear; but historically, pages have started out with IOMMU mappings,
+         * and only lose them when changed to a different page type.
+         *
+         * Retain this property by grabbing a writable type ref and then
+         * dropping it immediately.  The result will be pages that have a
+         * writable type (and an IOMMU entry), but a count of 0 (such that
+         * any guest-requested type changes succeed and remove the IOMMU
+         * entry).
+         */
+        if ( !need_iommu_pt_sync(d) || t != p2m_ram_rw )
+            return 0;
+
+        for ( i = 0; i < (1UL << page_order); ++i, ++page )
+        {
+            if ( get_page_and_type(page, d, PGT_writable_page) )
+                put_page_and_type(page);
+            else
+                return -EINVAL;
+        }
+
+        return 0;
+    }
 
     /* foreign pages are added thru p2m_add_foreign */
     if ( p2m_is_foreign(t) )
-- 
2.20.1


["xsa288-4.7.patch" (application/octet-stream)]

From dc93855a8b39fbc26f33bffa134fc94d36726a9f Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dunlap@citrix.com>
Date: Wed, 23 Jan 2019 11:57:46 +0000
Subject: [PATCH] xen: Make coherent PV IOMMU discipline

In order for a PV domain to set up DMA from a passed-through device to
one of its pages, the page must be mapped in the IOMMU.  On the other
hand, before a PV page may be used as a "special" page type (such as a
pagetable or descriptor table), it _must not_ be writable in the IOMMU
(otherwise a malicious guest could DMA arbitrary page tables into the
memory, bypassing Xen's safety checks); and Xen's current rule is to
have such pages not in the IOMMU at all.

At the moment, in order to accomplish this, the code borrows HVM
domain's "physmap" concept: When a page is assigned to a guest,
guess_physmap_add_entry() is called, which for PV guests, will create
a writable IOMMU mapping; and when a page is removed,
guest_physmap_remove_entry() is called, which will remove the mapping.

Additionally, when a page gains the PGT_writable page type, the page
will be added into the IOMMU; and when the page changes away from a
PGT_writable type, the page will be removed from the IOMMU.

Unfortunately, borrowing the "physmap" concept from HVM domains is
problematic.  HVM domains have a lock on their p2m tables, ensuring
synchronization between modifications to the p2m; and all hypercall
parameters must first be translated through the p2m before being used.

Trying to mix this locked-and-gated approach with PV's lock-free
approach leads to several races and inconsistencies:

* A race between a page being assigned and it being put into the
  physmap; for example:
  - P1: call populate_physmap() { A = allocate_domheap_pages() }
  - P2: Guess page A's mfn, and call decrease_reservation(A).  A is owned by the domain,
        and so Xen will clear the PGC_allocated bit and free the page
  - P1: finishes populate_physmap() { guest_physmap_add_entry() }

  Now the domain has a writable IOMMU mapping to a page it no longer owns.

* Pages start out as type PGT_none, but with a writable IOMMU mapping.
  If a guest uses a page as a page table without ever having created a
  writable mapping, the IOMMU mapping will not be removed; the guest
  will have a writable IOMMU mapping to a page it is currently using
  as a page table.

* A newly-allocated page can be DMA'd into with no special actions on
  the part of the guest; However, if a page is promoted to a
  non-writable type, the page must be mapped with a writable type before
  DMA'ing to it again, or the transaction will fail.

To fix this, do away with the "PV physmap" concept entirely, and
replace it with the following IOMMU discipline for PV guests:
 - (type == PGT_writable) <=> in iommu (even if type_count == 0)
 - Upon a final put_page(), check to see if type is PGT_writable; if so,
   iommu_unmap.

In order to achieve that:

- Remove PV IOMMU related code from guest_physmap_*

- Repurpose cleanup_page_cacheattr() into a general
  cleanup_page_mappings() function, which will both fix up Xen
  mappings for pages with special cache attributes, and also check for
  a PGT_writable type and remove pages if appropriate.

- For compatibility with current guests, grab-and-release a
  PGT_writable_page type for PV guests in guest_physmap_add_entry().
  This will cause most "normal" guest pages to start out life with
  PGT_writable_page type (and thus an IOMMU mapping), but no type
  count (so that they can be used as special cases at will).

Also, note that there is one exception to to the "PGT_writable => in
iommu" rule: xenheap pages shared with guests may be given a
PGT_writable type with one type reference.  This reference prevents
the type from changing, which in turn prevents page from gaining an
IOMMU mapping in get_page_type().  It's not clear whether this was
intentional or not, but it's not something to change in a security
update.

This is XSA-288.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c     | 95 +++++++++++++++++++++++++++++++++++++++----
 xen/arch/x86/mm/p2m.c | 41 +++++++++++--------
 2 files changed, 110 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 47f4aa4f6a..144479820b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -81,6 +81,22 @@
  * OS's, which will generally use the WP bit to simplify copy-on-write
  * implementation (in that case, OS wants a fault when it writes to
  * an application-supplied buffer).
+ *
+ * PV domUs and IOMMUs:
+ * --------------------
+ * For a guest to be able to DMA into a page, that page must be in the
+ * domain's IOMMU.  However, we *must not* allow DMA into 'special'
+ * pages (such as page table pages, descriptor tables, &c); and we
+ * must also ensure that mappings are removed from the IOMMU when the
+ * page is freed.  Finally, it is inherently racy to make any changes
+ * based on a page with a non-zero type count.
+ *
+ * To that end, we put the page in the IOMMU only when a page gains
+ * the PGT_writeable type; and we remove the page when it loses the
+ * PGT_writeable type (not when the type count goes to zero).  This
+ * effectively protects the IOMMU status update with the type count we
+ * have just acquired.  We must also check for PGT_writable type when
+ * doing the final put_page(), and remove it from the iommu if so.
  */
 
 #include <xen/kconfig.h>
@@ -2379,19 +2395,79 @@ static int mod_l4_entry(l4_pgentry_t *pl4e,
     return rc;
 }
 
-static int cleanup_page_cacheattr(struct page_info *page)
+/*
+ * In the course of a page's use, it may have caused other secondary
+ * mappings to have changed:
+ * - Xen's mappings may have been changed to accomodate the requested
+ *   cache attibutes
+ * - A page may have been put into the IOMMU of a PV guest when it
+ *   gained a writable mapping.
+ *
+ * Now that the page is being freed, clean up these mappings if
+ * appropriate.  NB that at this point the page is still "allocated",
+ * but not "live" (i.e., its refcount is 0), so it's safe to read the
+ * count_info, owner, and type_info without synchronization.
+ */
+static int cleanup_page_mappings(struct page_info *page)
 {
     unsigned int cacheattr =
         (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base;
+    int rc = 0;
+    unsigned long mfn = page_to_mfn(page);
 
-    if ( likely(cacheattr == 0) )
-        return 0;
+    /*
+     * If we've modified xen mappings as a result of guest cache
+     * attributes, restore them to the "normal" state.
+     */
+    if ( unlikely(cacheattr) )
+    {
+        page->count_info &= ~PGC_cacheattr_mask;
 
-    page->count_info &= ~PGC_cacheattr_mask;
+        BUG_ON(is_xen_heap_page(page));
 
-    BUG_ON(is_xen_heap_page(page));
+        rc = update_xen_mappings(mfn, 0);
+    }
 
-    return update_xen_mappings(page_to_mfn(page), 0);
+    /*
+     * If this may be in a PV domain's IOMMU, remove it.
+     *
+     * NB that writable xenheap pages have their type set and cleared by
+     * implementation-specific code, rather than by get_page_type().  As such:
+     * - They aren't expected to have an IOMMU mapping, and
+     * - We don't necessarily expect the type count to be zero when the final
+     * put_page happens.
+     *
+     * Go ahead and attemp to call iommu_unmap() on xenheap pages anyway, just
+     * in case; but only ASSERT() that the type count is zero and remove the
+     * PGT_writable type for non-xenheap pages.
+     */
+    if ( (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
+    {
+        struct domain *d = page_get_owner(page);
+
+        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
+        {
+            int rc2 = iommu_unmap_page(d, mfn);
+
+            if ( !rc )
+                rc = rc2;
+        }
+
+        if ( likely(!is_xen_heap_page(page)) )
+        {
+            ASSERT((page->u.inuse.type_info &
+                    (PGT_type_mask | PGT_count_mask)) == PGT_writable_page);
+            /*
+             * Clear the type to record the fact that all writable mappings
+             * have been removed.  But if either operation failed, leave
+             * type_info alone.
+             */
+            if ( likely(!rc) )
+                page->u.inuse.type_info &= ~(PGT_type_mask | PGT_count_mask);
+        }
+    }
+
+    return rc;
 }
 
 void put_page(struct page_info *page)
@@ -2407,7 +2483,7 @@ void put_page(struct page_info *page)
 
     if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        if ( cleanup_page_cacheattr(page) == 0 )
+        if ( !cleanup_page_mappings(page) )
             free_domheap_page(page);
         else
             MEM_LOG("Leaking pfn %lx", page_to_mfn(page));
@@ -4771,9 +4847,10 @@ int steal_page(
      * NB this is safe even if the page ends up being given back to
      * the domain, because the count is zero: subsequent mappings will
      * cause the cache attributes to be re-instated inside
-     * get_page_from_l1e().
+     * get_page_from_l1e(), or the page to be added back to the IOMMU
+     * upon the type changing to PGT_writeable, as appropriate.
      */
-    if ( (rc = cleanup_page_cacheattr(page)) )
+    if ( (rc = cleanup_page_mappings(page)) )
     {
         /*
          * Couldn't fixup Xen's mappings; put things the way we found
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 7bbb782bde..14f5bdb787 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -642,13 +642,9 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
     p2m_type_t t;
     p2m_access_t a;
 
+    /* IOMMU for PV guests is handled in get_page_type() and put_page(). */
     if ( !paging_mode_translate(p2m->domain) )
-    {
-        if ( need_iommu(p2m->domain) )
-            for ( i = 0; i < (1 << page_order); i++ )
-                iommu_unmap_page(p2m->domain, mfn + i);
         return 0;
-    }
 
     ASSERT(gfn_locked_by_me(p2m, gfn));
     P2M_DEBUG("removing gfn=%#lx mfn=%#lx\n", gfn, mfn);
@@ -692,22 +688,33 @@ guest_physmap_add_entry(struct domain *d, unsigned long gfn,
     int pod_count = 0;
     int rc = 0;
 
+    /* IOMMU for PV guests is handled in get_page_type() and put_page(). */
     if ( !paging_mode_translate(d) )
     {
-        if ( need_iommu(d) && t == p2m_ram_rw )
+        struct page_info *page = mfn_to_page(_mfn(mfn));
+
+        /*
+         * Our interface for PV guests wrt IOMMU entries hasn't been very
+         * clear; but historically, pages have started out with IOMMU mappings,
+         * and only lose them when changed to a different page type.
+         *
+         * Retain this property by grabbing a writable type ref and then
+         * dropping it immediately.  The result will be pages that have a
+         * writable type (and an IOMMU entry), but a count of 0 (such that
+         * any guest-requested type changes succeed and remove the IOMMU
+         * entry).
+         */
+        if ( !need_iommu(d) || t != p2m_ram_rw )
+            return 0;
+
+        for ( i = 0; i < (1UL << page_order); ++i, ++page )
         {
-            for ( i = 0; i < (1 << page_order); i++ )
-            {
-                rc = iommu_map_page(
-                    d, mfn + i, mfn + i, IOMMUF_readable|IOMMUF_writable);
-                if ( rc != 0 )
-                {
-                    while ( i-- > 0 )
-                        iommu_unmap_page(d, mfn + i);
-                    return rc;
-                }
-            }
+            if ( get_page_and_type(page, d, PGT_writable_page) )
+                put_page_and_type(page);
+            else
+                return -EINVAL;
         }
+
         return 0;
     }
 
-- 
2.20.1


["xsa288-4.8.patch" (application/octet-stream)]

From c4169a72eb391c6a381bc185efc8e83f27345801 Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dunlap@citrix.com>
Date: Wed, 23 Jan 2019 11:57:46 +0000
Subject: [PATCH] xen: Make coherent PV IOMMU discipline

In order for a PV domain to set up DMA from a passed-through device to
one of its pages, the page must be mapped in the IOMMU.  On the other
hand, before a PV page may be used as a "special" page type (such as a
pagetable or descriptor table), it _must not_ be writable in the IOMMU
(otherwise a malicious guest could DMA arbitrary page tables into the
memory, bypassing Xen's safety checks); and Xen's current rule is to
have such pages not in the IOMMU at all.

At the moment, in order to accomplish this, the code borrows HVM
domain's "physmap" concept: When a page is assigned to a guest,
guess_physmap_add_entry() is called, which for PV guests, will create
a writable IOMMU mapping; and when a page is removed,
guest_physmap_remove_entry() is called, which will remove the mapping.

Additionally, when a page gains the PGT_writable page type, the page
will be added into the IOMMU; and when the page changes away from a
PGT_writable type, the page will be removed from the IOMMU.

Unfortunately, borrowing the "physmap" concept from HVM domains is
problematic.  HVM domains have a lock on their p2m tables, ensuring
synchronization between modifications to the p2m; and all hypercall
parameters must first be translated through the p2m before being used.

Trying to mix this locked-and-gated approach with PV's lock-free
approach leads to several races and inconsistencies:

* A race between a page being assigned and it being put into the
  physmap; for example:
  - P1: call populate_physmap() { A = allocate_domheap_pages() }
  - P2: Guess page A's mfn, and call decrease_reservation(A).  A is owned by the domain,
        and so Xen will clear the PGC_allocated bit and free the page
  - P1: finishes populate_physmap() { guest_physmap_add_entry() }

  Now the domain has a writable IOMMU mapping to a page it no longer owns.

* Pages start out as type PGT_none, but with a writable IOMMU mapping.
  If a guest uses a page as a page table without ever having created a
  writable mapping, the IOMMU mapping will not be removed; the guest
  will have a writable IOMMU mapping to a page it is currently using
  as a page table.

* A newly-allocated page can be DMA'd into with no special actions on
  the part of the guest; However, if a page is promoted to a
  non-writable type, the page must be mapped with a writable type before
  DMA'ing to it again, or the transaction will fail.

To fix this, do away with the "PV physmap" concept entirely, and
replace it with the following IOMMU discipline for PV guests:
 - (type == PGT_writable) <=> in iommu (even if type_count == 0)
 - Upon a final put_page(), check to see if type is PGT_writable; if so,
   iommu_unmap.

In order to achieve that:

- Remove PV IOMMU related code from guest_physmap_*

- Repurpose cleanup_page_cacheattr() into a general
  cleanup_page_mappings() function, which will both fix up Xen
  mappings for pages with special cache attributes, and also check for
  a PGT_writable type and remove pages if appropriate.

- For compatibility with current guests, grab-and-release a
  PGT_writable_page type for PV guests in guest_physmap_add_entry().
  This will cause most "normal" guest pages to start out life with
  PGT_writable_page type (and thus an IOMMU mapping), but no type
  count (so that they can be used as special cases at will).

Also, note that there is one exception to to the "PGT_writable => in
iommu" rule: xenheap pages shared with guests may be given a
PGT_writable type with one type reference.  This reference prevents
the type from changing, which in turn prevents page from gaining an
IOMMU mapping in get_page_type().  It's not clear whether this was
intentional or not, but it's not something to change in a security
update.

This is XSA-288.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c     | 95 +++++++++++++++++++++++++++++++++++++++----
 xen/arch/x86/mm/p2m.c | 57 ++++++++++++--------------
 2 files changed, 111 insertions(+), 41 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index ef3b208fb0..9f375bc236 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -81,6 +81,22 @@
  * OS's, which will generally use the WP bit to simplify copy-on-write
  * implementation (in that case, OS wants a fault when it writes to
  * an application-supplied buffer).
+ *
+ * PV domUs and IOMMUs:
+ * --------------------
+ * For a guest to be able to DMA into a page, that page must be in the
+ * domain's IOMMU.  However, we *must not* allow DMA into 'special'
+ * pages (such as page table pages, descriptor tables, &c); and we
+ * must also ensure that mappings are removed from the IOMMU when the
+ * page is freed.  Finally, it is inherently racy to make any changes
+ * based on a page with a non-zero type count.
+ *
+ * To that end, we put the page in the IOMMU only when a page gains
+ * the PGT_writeable type; and we remove the page when it loses the
+ * PGT_writeable type (not when the type count goes to zero).  This
+ * effectively protects the IOMMU status update with the type count we
+ * have just acquired.  We must also check for PGT_writable type when
+ * doing the final put_page(), and remove it from the iommu if so.
  */
 
 #include <xen/kconfig.h>
@@ -2380,19 +2396,79 @@ static int mod_l4_entry(l4_pgentry_t *pl4e,
     return rc;
 }
 
-static int cleanup_page_cacheattr(struct page_info *page)
+/*
+ * In the course of a page's use, it may have caused other secondary
+ * mappings to have changed:
+ * - Xen's mappings may have been changed to accomodate the requested
+ *   cache attibutes
+ * - A page may have been put into the IOMMU of a PV guest when it
+ *   gained a writable mapping.
+ *
+ * Now that the page is being freed, clean up these mappings if
+ * appropriate.  NB that at this point the page is still "allocated",
+ * but not "live" (i.e., its refcount is 0), so it's safe to read the
+ * count_info, owner, and type_info without synchronization.
+ */
+static int cleanup_page_mappings(struct page_info *page)
 {
     unsigned int cacheattr =
         (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base;
+    int rc = 0;
+    unsigned long mfn = page_to_mfn(page);
 
-    if ( likely(cacheattr == 0) )
-        return 0;
+    /*
+     * If we've modified xen mappings as a result of guest cache
+     * attributes, restore them to the "normal" state.
+     */
+    if ( unlikely(cacheattr) )
+    {
+        page->count_info &= ~PGC_cacheattr_mask;
 
-    page->count_info &= ~PGC_cacheattr_mask;
+        BUG_ON(is_xen_heap_page(page));
 
-    BUG_ON(is_xen_heap_page(page));
+        rc = update_xen_mappings(mfn, 0);
+    }
 
-    return update_xen_mappings(page_to_mfn(page), 0);
+    /*
+     * If this may be in a PV domain's IOMMU, remove it.
+     *
+     * NB that writable xenheap pages have their type set and cleared by
+     * implementation-specific code, rather than by get_page_type().  As such:
+     * - They aren't expected to have an IOMMU mapping, and
+     * - We don't necessarily expect the type count to be zero when the final
+     * put_page happens.
+     *
+     * Go ahead and attemp to call iommu_unmap() on xenheap pages anyway, just
+     * in case; but only ASSERT() that the type count is zero and remove the
+     * PGT_writable type for non-xenheap pages.
+     */
+    if ( (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
+    {
+        struct domain *d = page_get_owner(page);
+
+        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
+        {
+            int rc2 = iommu_unmap_page(d, mfn);
+
+            if ( !rc )
+                rc = rc2;
+        }
+
+        if ( likely(!is_xen_heap_page(page)) )
+        {
+            ASSERT((page->u.inuse.type_info &
+                    (PGT_type_mask | PGT_count_mask)) == PGT_writable_page);
+            /*
+             * Clear the type to record the fact that all writable mappings
+             * have been removed.  But if either operation failed, leave
+             * type_info alone.
+             */
+            if ( likely(!rc) )
+                page->u.inuse.type_info &= ~(PGT_type_mask | PGT_count_mask);
+        }
+    }
+
+    return rc;
 }
 
 void put_page(struct page_info *page)
@@ -2408,7 +2484,7 @@ void put_page(struct page_info *page)
 
     if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        if ( cleanup_page_cacheattr(page) == 0 )
+        if ( !cleanup_page_mappings(page) )
             free_domheap_page(page);
         else
             MEM_LOG("Leaking pfn %lx", page_to_mfn(page));
@@ -4776,9 +4852,10 @@ int steal_page(
      * NB this is safe even if the page ends up being given back to
      * the domain, because the count is zero: subsequent mappings will
      * cause the cache attributes to be re-instated inside
-     * get_page_from_l1e().
+     * get_page_from_l1e(), or the page to be added back to the IOMMU
+     * upon the type changing to PGT_writeable, as appropriate.
      */
-    if ( (rc = cleanup_page_cacheattr(page)) )
+    if ( (rc = cleanup_page_mappings(page)) )
     {
         /*
          * Couldn't fixup Xen's mappings; put things the way we found
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 239f8e882b..9c1849bf52 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -644,23 +644,9 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
     p2m_type_t t;
     p2m_access_t a;
 
+    /* IOMMU for PV guests is handled in get_page_type() and put_page(). */
     if ( !paging_mode_translate(p2m->domain) )
-    {
-        int rc = 0;
-
-        if ( need_iommu(p2m->domain) )
-        {
-            for ( i = 0; i < (1 << page_order); i++ )
-            {
-                int ret = iommu_unmap_page(p2m->domain, mfn + i);
-
-                if ( !rc )
-                    rc = ret;
-            }
-        }
-
-        return rc;
-    }
+        return 0;
 
     ASSERT(gfn_locked_by_me(p2m, gfn));
     P2M_DEBUG("removing gfn=%#lx mfn=%#lx\n", gfn, mfn);
@@ -703,26 +689,33 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
     int pod_count = 0;
     int rc = 0;
 
+    /* IOMMU for PV guests is handled in get_page_type() and put_page(). */
     if ( !paging_mode_translate(d) )
     {
-        if ( need_iommu(d) && t == p2m_ram_rw )
-        {
-            for ( i = 0; i < (1 << page_order); i++ )
-            {
-                rc = iommu_map_page(d, mfn_x(mfn_add(mfn, i)),
-                                    mfn_x(mfn_add(mfn, i)),
-                                    IOMMUF_readable|IOMMUF_writable);
-                if ( rc != 0 )
-                {
-                    while ( i-- > 0 )
-                        /* If statement to satisfy __must_check. */
-                        if ( iommu_unmap_page(d, mfn_x(mfn_add(mfn, i))) )
-                            continue;
+        struct page_info *page = mfn_to_page(mfn);
 
-                    return rc;
-                }
-            }
+        /*
+         * Our interface for PV guests wrt IOMMU entries hasn't been very
+         * clear; but historically, pages have started out with IOMMU mappings,
+         * and only lose them when changed to a different page type.
+         *
+         * Retain this property by grabbing a writable type ref and then
+         * dropping it immediately.  The result will be pages that have a
+         * writable type (and an IOMMU entry), but a count of 0 (such that
+         * any guest-requested type changes succeed and remove the IOMMU
+         * entry).
+         */
+        if ( !need_iommu(d) || t != p2m_ram_rw )
+            return 0;
+
+        for ( i = 0; i < (1UL << page_order); ++i, ++page )
+        {
+            if ( get_page_and_type(page, d, PGT_writable_page) )
+                put_page_and_type(page);
+            else
+                return -EINVAL;
         }
+
         return 0;
     }
 
-- 
2.20.1


["xsa288-4.9.patch" (application/octet-stream)]

From bc4459e91233679cd0139989edf4656417e95b85 Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dunlap@citrix.com>
Date: Wed, 23 Jan 2019 11:57:46 +0000
Subject: [PATCH] xen: Make coherent PV IOMMU discipline

In order for a PV domain to set up DMA from a passed-through device to
one of its pages, the page must be mapped in the IOMMU.  On the other
hand, before a PV page may be used as a "special" page type (such as a
pagetable or descriptor table), it _must not_ be writable in the IOMMU
(otherwise a malicious guest could DMA arbitrary page tables into the
memory, bypassing Xen's safety checks); and Xen's current rule is to
have such pages not in the IOMMU at all.

At the moment, in order to accomplish this, the code borrows HVM
domain's "physmap" concept: When a page is assigned to a guest,
guess_physmap_add_entry() is called, which for PV guests, will create
a writable IOMMU mapping; and when a page is removed,
guest_physmap_remove_entry() is called, which will remove the mapping.

Additionally, when a page gains the PGT_writable page type, the page
will be added into the IOMMU; and when the page changes away from a
PGT_writable type, the page will be removed from the IOMMU.

Unfortunately, borrowing the "physmap" concept from HVM domains is
problematic.  HVM domains have a lock on their p2m tables, ensuring
synchronization between modifications to the p2m; and all hypercall
parameters must first be translated through the p2m before being used.

Trying to mix this locked-and-gated approach with PV's lock-free
approach leads to several races and inconsistencies:

* A race between a page being assigned and it being put into the
  physmap; for example:
  - P1: call populate_physmap() { A = allocate_domheap_pages() }
  - P2: Guess page A's mfn, and call decrease_reservation(A).  A is owned by the domain,
        and so Xen will clear the PGC_allocated bit and free the page
  - P1: finishes populate_physmap() { guest_physmap_add_entry() }

  Now the domain has a writable IOMMU mapping to a page it no longer owns.

* Pages start out as type PGT_none, but with a writable IOMMU mapping.
  If a guest uses a page as a page table without ever having created a
  writable mapping, the IOMMU mapping will not be removed; the guest
  will have a writable IOMMU mapping to a page it is currently using
  as a page table.

* A newly-allocated page can be DMA'd into with no special actions on
  the part of the guest; However, if a page is promoted to a
  non-writable type, the page must be mapped with a writable type before
  DMA'ing to it again, or the transaction will fail.

To fix this, do away with the "PV physmap" concept entirely, and
replace it with the following IOMMU discipline for PV guests:
 - (type == PGT_writable) <=> in iommu (even if type_count == 0)
 - Upon a final put_page(), check to see if type is PGT_writable; if so,
   iommu_unmap.

In order to achieve that:

- Remove PV IOMMU related code from guest_physmap_*

- Repurpose cleanup_page_cacheattr() into a general
  cleanup_page_mappings() function, which will both fix up Xen
  mappings for pages with special cache attributes, and also check for
  a PGT_writable type and remove pages if appropriate.

- For compatibility with current guests, grab-and-release a
  PGT_writable_page type for PV guests in guest_physmap_add_entry().
  This will cause most "normal" guest pages to start out life with
  PGT_writable_page type (and thus an IOMMU mapping), but no type
  count (so that they can be used as special cases at will).

Also, note that there is one exception to to the "PGT_writable => in
iommu" rule: xenheap pages shared with guests may be given a
PGT_writable type with one type reference.  This reference prevents
the type from changing, which in turn prevents page from gaining an
IOMMU mapping in get_page_type().  It's not clear whether this was
intentional or not, but it's not something to change in a security
update.

This is XSA-288.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c     | 95 +++++++++++++++++++++++++++++++++++++++----
 xen/arch/x86/mm/p2m.c | 57 ++++++++++++--------------
 2 files changed, 111 insertions(+), 41 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 476ef8ee85..3a11c67478 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -81,6 +81,22 @@
  * OS's, which will generally use the WP bit to simplify copy-on-write
  * implementation (in that case, OS wants a fault when it writes to
  * an application-supplied buffer).
+ *
+ * PV domUs and IOMMUs:
+ * --------------------
+ * For a guest to be able to DMA into a page, that page must be in the
+ * domain's IOMMU.  However, we *must not* allow DMA into 'special'
+ * pages (such as page table pages, descriptor tables, &c); and we
+ * must also ensure that mappings are removed from the IOMMU when the
+ * page is freed.  Finally, it is inherently racy to make any changes
+ * based on a page with a non-zero type count.
+ *
+ * To that end, we put the page in the IOMMU only when a page gains
+ * the PGT_writeable type; and we remove the page when it loses the
+ * PGT_writeable type (not when the type count goes to zero).  This
+ * effectively protects the IOMMU status update with the type count we
+ * have just acquired.  We must also check for PGT_writable type when
+ * doing the final put_page(), and remove it from the iommu if so.
  */
 
 #include <xen/init.h>
@@ -2411,19 +2427,79 @@ static int mod_l4_entry(l4_pgentry_t *pl4e,
     return rc;
 }
 
-static int cleanup_page_cacheattr(struct page_info *page)
+/*
+ * In the course of a page's use, it may have caused other secondary
+ * mappings to have changed:
+ * - Xen's mappings may have been changed to accomodate the requested
+ *   cache attibutes
+ * - A page may have been put into the IOMMU of a PV guest when it
+ *   gained a writable mapping.
+ *
+ * Now that the page is being freed, clean up these mappings if
+ * appropriate.  NB that at this point the page is still "allocated",
+ * but not "live" (i.e., its refcount is 0), so it's safe to read the
+ * count_info, owner, and type_info without synchronization.
+ */
+static int cleanup_page_mappings(struct page_info *page)
 {
     unsigned int cacheattr =
         (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base;
+    int rc = 0;
+    unsigned long mfn = page_to_mfn(page);
 
-    if ( likely(cacheattr == 0) )
-        return 0;
+    /*
+     * If we've modified xen mappings as a result of guest cache
+     * attributes, restore them to the "normal" state.
+     */
+    if ( unlikely(cacheattr) )
+    {
+        page->count_info &= ~PGC_cacheattr_mask;
 
-    page->count_info &= ~PGC_cacheattr_mask;
+        BUG_ON(is_xen_heap_page(page));
 
-    BUG_ON(is_xen_heap_page(page));
+        rc = update_xen_mappings(mfn, 0);
+    }
 
-    return update_xen_mappings(page_to_mfn(page), 0);
+    /*
+     * If this may be in a PV domain's IOMMU, remove it.
+     *
+     * NB that writable xenheap pages have their type set and cleared by
+     * implementation-specific code, rather than by get_page_type().  As such:
+     * - They aren't expected to have an IOMMU mapping, and
+     * - We don't necessarily expect the type count to be zero when the final
+     * put_page happens.
+     *
+     * Go ahead and attemp to call iommu_unmap() on xenheap pages anyway, just
+     * in case; but only ASSERT() that the type count is zero and remove the
+     * PGT_writable type for non-xenheap pages.
+     */
+    if ( (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
+    {
+        struct domain *d = page_get_owner(page);
+
+        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
+        {
+            int rc2 = iommu_unmap_page(d, mfn);
+
+            if ( !rc )
+                rc = rc2;
+        }
+
+        if ( likely(!is_xen_heap_page(page)) )
+        {
+            ASSERT((page->u.inuse.type_info &
+                    (PGT_type_mask | PGT_count_mask)) == PGT_writable_page);
+            /*
+             * Clear the type to record the fact that all writable mappings
+             * have been removed.  But if either operation failed, leave
+             * type_info alone.
+             */
+            if ( likely(!rc) )
+                page->u.inuse.type_info &= ~(PGT_type_mask | PGT_count_mask);
+        }
+    }
+
+    return rc;
 }
 
 void put_page(struct page_info *page)
@@ -2439,7 +2515,7 @@ void put_page(struct page_info *page)
 
     if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        if ( cleanup_page_cacheattr(page) == 0 )
+        if ( !cleanup_page_mappings(page) )
             free_domheap_page(page);
         else
             gdprintk(XENLOG_WARNING,
@@ -4829,9 +4905,10 @@ int steal_page(
      * NB this is safe even if the page ends up being given back to
      * the domain, because the count is zero: subsequent mappings will
      * cause the cache attributes to be re-instated inside
-     * get_page_from_l1e().
+     * get_page_from_l1e(), or the page to be added back to the IOMMU
+     * upon the type changing to PGT_writeable, as appropriate.
      */
-    if ( (rc = cleanup_page_cacheattr(page)) )
+    if ( (rc = cleanup_page_mappings(page)) )
     {
         /*
          * Couldn't fixup Xen's mappings; put things the way we found
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index ece32ffb8f..25fed08efb 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -706,23 +706,9 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
     p2m_type_t t;
     p2m_access_t a;
 
+    /* IOMMU for PV guests is handled in get_page_type() and put_page(). */
     if ( !paging_mode_translate(p2m->domain) )
-    {
-        int rc = 0;
-
-        if ( need_iommu(p2m->domain) )
-        {
-            for ( i = 0; i < (1 << page_order); i++ )
-            {
-                int ret = iommu_unmap_page(p2m->domain, mfn + i);
-
-                if ( !rc )
-                    rc = ret;
-            }
-        }
-
-        return rc;
-    }
+        return 0;
 
     ASSERT(gfn_locked_by_me(p2m, gfn));
     P2M_DEBUG("removing gfn=%#lx mfn=%#lx\n", gfn, mfn);
@@ -765,26 +751,33 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
     int pod_count = 0;
     int rc = 0;
 
+    /* IOMMU for PV guests is handled in get_page_type() and put_page(). */
     if ( !paging_mode_translate(d) )
     {
-        if ( need_iommu(d) && t == p2m_ram_rw )
-        {
-            for ( i = 0; i < (1 << page_order); i++ )
-            {
-                rc = iommu_map_page(d, mfn_x(mfn_add(mfn, i)),
-                                    mfn_x(mfn_add(mfn, i)),
-                                    IOMMUF_readable|IOMMUF_writable);
-                if ( rc != 0 )
-                {
-                    while ( i-- > 0 )
-                        /* If statement to satisfy __must_check. */
-                        if ( iommu_unmap_page(d, mfn_x(mfn_add(mfn, i))) )
-                            continue;
+        struct page_info *page = mfn_to_page(mfn);
 
-                    return rc;
-                }
-            }
+        /*
+         * Our interface for PV guests wrt IOMMU entries hasn't been very
+         * clear; but historically, pages have started out with IOMMU mappings,
+         * and only lose them when changed to a different page type.
+         *
+         * Retain this property by grabbing a writable type ref and then
+         * dropping it immediately.  The result will be pages that have a
+         * writable type (and an IOMMU entry), but a count of 0 (such that
+         * any guest-requested type changes succeed and remove the IOMMU
+         * entry).
+         */
+        if ( !need_iommu(d) || t != p2m_ram_rw )
+            return 0;
+
+        for ( i = 0; i < (1UL << page_order); ++i, ++page )
+        {
+            if ( get_page_and_type(page, d, PGT_writable_page) )
+                put_page_and_type(page);
+            else
+                return -EINVAL;
         }
+
         return 0;
     }
 
-- 
2.20.1


["xsa288-4.11.patch" (application/octet-stream)]

From 5d3a02e320f88747b75e3794c2e694284ae64c3e Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dunlap@citrix.com>
Date: Wed, 23 Jan 2019 11:57:46 +0000
Subject: [PATCH] xen: Make coherent PV IOMMU discipline

In order for a PV domain to set up DMA from a passed-through device to
one of its pages, the page must be mapped in the IOMMU.  On the other
hand, before a PV page may be used as a "special" page type (such as a
pagetable or descriptor table), it _must not_ be writable in the IOMMU
(otherwise a malicious guest could DMA arbitrary page tables into the
memory, bypassing Xen's safety checks); and Xen's current rule is to
have such pages not in the IOMMU at all.

At the moment, in order to accomplish this, the code borrows HVM
domain's "physmap" concept: When a page is assigned to a guest,
guess_physmap_add_entry() is called, which for PV guests, will create
a writable IOMMU mapping; and when a page is removed,
guest_physmap_remove_entry() is called, which will remove the mapping.

Additionally, when a page gains the PGT_writable page type, the page
will be added into the IOMMU; and when the page changes away from a
PGT_writable type, the page will be removed from the IOMMU.

Unfortunately, borrowing the "physmap" concept from HVM domains is
problematic.  HVM domains have a lock on their p2m tables, ensuring
synchronization between modifications to the p2m; and all hypercall
parameters must first be translated through the p2m before being used.

Trying to mix this locked-and-gated approach with PV's lock-free
approach leads to several races and inconsistencies:

* A race between a page being assigned and it being put into the
  physmap; for example:
  - P1: call populate_physmap() { A = allocate_domheap_pages() }
  - P2: Guess page A's mfn, and call decrease_reservation(A).  A is owned by the domain,
        and so Xen will clear the PGC_allocated bit and free the page
  - P1: finishes populate_physmap() { guest_physmap_add_entry() }

  Now the domain has a writable IOMMU mapping to a page it no longer owns.

* Pages start out as type PGT_none, but with a writable IOMMU mapping.
  If a guest uses a page as a page table without ever having created a
  writable mapping, the IOMMU mapping will not be removed; the guest
  will have a writable IOMMU mapping to a page it is currently using
  as a page table.

* A newly-allocated page can be DMA'd into with no special actions on
  the part of the guest; However, if a page is promoted to a
  non-writable type, the page must be mapped with a writable type before
  DMA'ing to it again, or the transaction will fail.

To fix this, do away with the "PV physmap" concept entirely, and
replace it with the following IOMMU discipline for PV guests:
 - (type == PGT_writable) <=> in iommu (even if type_count == 0)
 - Upon a final put_page(), check to see if type is PGT_writable; if so,
   iommu_unmap.

In order to achieve that:

- Remove PV IOMMU related code from guest_physmap_*

- Repurpose cleanup_page_cacheattr() into a general
  cleanup_page_mappings() function, which will both fix up Xen
  mappings for pages with special cache attributes, and also check for
  a PGT_writable type and remove pages if appropriate.

- For compatibility with current guests, grab-and-release a
  PGT_writable_page type for PV guests in guest_physmap_add_entry().
  This will cause most "normal" guest pages to start out life with
  PGT_writable_page type (and thus an IOMMU mapping), but no type
  count (so that they can be used as special cases at will).

Also, note that there is one exception to to the "PGT_writable => in
iommu" rule: xenheap pages shared with guests may be given a
PGT_writable type with one type reference.  This reference prevents
the type from changing, which in turn prevents page from gaining an
IOMMU mapping in get_page_type().  It's not clear whether this was
intentional or not, but it's not something to change in a security
update.

This is XSA-288.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c     | 95 +++++++++++++++++++++++++++++++++++++++----
 xen/arch/x86/mm/p2m.c | 57 ++++++++++++--------------
 2 files changed, 111 insertions(+), 41 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index d8ff58c901..ad8aacad68 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -81,6 +81,22 @@
  * OS's, which will generally use the WP bit to simplify copy-on-write
  * implementation (in that case, OS wants a fault when it writes to
  * an application-supplied buffer).
+ *
+ * PV domUs and IOMMUs:
+ * --------------------
+ * For a guest to be able to DMA into a page, that page must be in the
+ * domain's IOMMU.  However, we *must not* allow DMA into 'special'
+ * pages (such as page table pages, descriptor tables, &c); and we
+ * must also ensure that mappings are removed from the IOMMU when the
+ * page is freed.  Finally, it is inherently racy to make any changes
+ * based on a page with a non-zero type count.
+ *
+ * To that end, we put the page in the IOMMU only when a page gains
+ * the PGT_writeable type; and we remove the page when it loses the
+ * PGT_writeable type (not when the type count goes to zero).  This
+ * effectively protects the IOMMU status update with the type count we
+ * have just acquired.  We must also check for PGT_writable type when
+ * doing the final put_page(), and remove it from the iommu if so.
  */
 
 #include <xen/init.h>
@@ -2275,19 +2291,79 @@ static int mod_l4_entry(l4_pgentry_t *pl4e,
     return rc;
 }
 
-static int cleanup_page_cacheattr(struct page_info *page)
+/*
+ * In the course of a page's use, it may have caused other secondary
+ * mappings to have changed:
+ * - Xen's mappings may have been changed to accomodate the requested
+ *   cache attibutes
+ * - A page may have been put into the IOMMU of a PV guest when it
+ *   gained a writable mapping.
+ *
+ * Now that the page is being freed, clean up these mappings if
+ * appropriate.  NB that at this point the page is still "allocated",
+ * but not "live" (i.e., its refcount is 0), so it's safe to read the
+ * count_info, owner, and type_info without synchronization.
+ */
+static int cleanup_page_mappings(struct page_info *page)
 {
     unsigned int cacheattr =
         (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base;
+    int rc = 0;
+    unsigned long mfn = mfn_x(page_to_mfn(page));
 
-    if ( likely(cacheattr == 0) )
-        return 0;
+    /*
+     * If we've modified xen mappings as a result of guest cache
+     * attributes, restore them to the "normal" state.
+     */
+    if ( unlikely(cacheattr) )
+    {
+        page->count_info &= ~PGC_cacheattr_mask;
 
-    page->count_info &= ~PGC_cacheattr_mask;
+        BUG_ON(is_xen_heap_page(page));
 
-    BUG_ON(is_xen_heap_page(page));
+        rc = update_xen_mappings(mfn, 0);
+    }
 
-    return update_xen_mappings(mfn_x(page_to_mfn(page)), 0);
+    /*
+     * If this may be in a PV domain's IOMMU, remove it.
+     *
+     * NB that writable xenheap pages have their type set and cleared by
+     * implementation-specific code, rather than by get_page_type().  As such:
+     * - They aren't expected to have an IOMMU mapping, and
+     * - We don't necessarily expect the type count to be zero when the final
+     * put_page happens.
+     *
+     * Go ahead and attemp to call iommu_unmap() on xenheap pages anyway, just
+     * in case; but only ASSERT() that the type count is zero and remove the
+     * PGT_writable type for non-xenheap pages.
+     */
+    if ( (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
+    {
+        struct domain *d = page_get_owner(page);
+
+        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
+        {
+            int rc2 = iommu_unmap_page(d, mfn);
+
+            if ( !rc )
+                rc = rc2;
+        }
+
+        if ( likely(!is_xen_heap_page(page)) )
+        {
+            ASSERT((page->u.inuse.type_info &
+                    (PGT_type_mask | PGT_count_mask)) == PGT_writable_page);
+            /*
+             * Clear the type to record the fact that all writable mappings
+             * have been removed.  But if either operation failed, leave
+             * type_info alone.
+             */
+            if ( likely(!rc) )
+                page->u.inuse.type_info &= ~(PGT_type_mask | PGT_count_mask);
+        }
+    }
+
+    return rc;
 }
 
 void put_page(struct page_info *page)
@@ -2303,7 +2379,7 @@ void put_page(struct page_info *page)
 
     if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        if ( cleanup_page_cacheattr(page) == 0 )
+        if ( !cleanup_page_mappings(page) )
             free_domheap_page(page);
         else
             gdprintk(XENLOG_WARNING,
@@ -4020,9 +4096,10 @@ int steal_page(
      * NB this is safe even if the page ends up being given back to
      * the domain, because the count is zero: subsequent mappings will
      * cause the cache attributes to be re-instated inside
-     * get_page_from_l1e().
+     * get_page_from_l1e(), or the page to be added back to the IOMMU
+     * upon the type changing to PGT_writeable, as appropriate.
      */
-    if ( (rc = cleanup_page_cacheattr(page)) )
+    if ( (rc = cleanup_page_mappings(page)) )
     {
         /*
          * Couldn't fixup Xen's mappings; put things the way we found
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c53cab44d9..2b62bc61dd 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -708,23 +708,9 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn_l, unsigned long mfn,
     p2m_type_t t;
     p2m_access_t a;
 
+    /* IOMMU for PV guests is handled in get_page_type() and put_page(). */
     if ( !paging_mode_translate(p2m->domain) )
-    {
-        int rc = 0;
-
-        if ( need_iommu(p2m->domain) )
-        {
-            for ( i = 0; i < (1 << page_order); i++ )
-            {
-                int ret = iommu_unmap_page(p2m->domain, mfn + i);
-
-                if ( !rc )
-                    rc = ret;
-            }
-        }
-
-        return rc;
-    }
+        return 0;
 
     ASSERT(gfn_locked_by_me(p2m, gfn));
     P2M_DEBUG("removing gfn=%#lx mfn=%#lx\n", gfn_l, mfn);
@@ -769,26 +755,33 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
     int pod_count = 0;
     int rc = 0;
 
+    /* IOMMU for PV guests is handled in get_page_type() and put_page(). */
     if ( !paging_mode_translate(d) )
     {
-        if ( need_iommu(d) && t == p2m_ram_rw )
-        {
-            for ( i = 0; i < (1 << page_order); i++ )
-            {
-                rc = iommu_map_page(d, mfn_x(mfn_add(mfn, i)),
-                                    mfn_x(mfn_add(mfn, i)),
-                                    IOMMUF_readable|IOMMUF_writable);
-                if ( rc != 0 )
-                {
-                    while ( i-- > 0 )
-                        /* If statement to satisfy __must_check. */
-                        if ( iommu_unmap_page(d, mfn_x(mfn_add(mfn, i))) )
-                            continue;
+        struct page_info *page = mfn_to_page(mfn);
 
-                    return rc;
-                }
-            }
+        /*
+         * Our interface for PV guests wrt IOMMU entries hasn't been very
+         * clear; but historically, pages have started out with IOMMU mappings,
+         * and only lose them when changed to a different page type.
+         *
+         * Retain this property by grabbing a writable type ref and then
+         * dropping it immediately.  The result will be pages that have a
+         * writable type (and an IOMMU entry), but a count of 0 (such that
+         * any guest-requested type changes succeed and remove the IOMMU
+         * entry).
+         */
+        if ( !need_iommu(d) || t != p2m_ram_rw )
+            return 0;
+
+        for ( i = 0; i < (1UL << page_order); ++i, ++page )
+        {
+            if ( get_page_and_type(page, d, PGT_writable_page) )
+                put_page_and_type(page);
+            else
+                return -EINVAL;
         }
+
         return 0;
     }
 
-- 
2.20.1



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic