
List:       oss-security
Subject:    [oss-security] Xen Security Advisory 275 v2 - insufficient TLB flushing / improper large page mappings with AMD IOMMUs
From:       Xen.org security team <security@xen.org>
Date:       2018-11-20 13:26:23
Message-ID: E1gP62p-0000ba-Bx@xenbits.xenproject.org

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

                    Xen Security Advisory XSA-275
                              version 2

  insufficient TLB flushing / improper large page mappings with AMD IOMMUs

UPDATES IN VERSION 2
====================

Public release.

ISSUE DESCRIPTION
=================

In order to be certain that no undue access to memory remains possible
after IOMMU mappings of that memory have been removed, Translation
Lookaside Buffers (TLBs) need to be flushed after most changes to such
mappings.  Xen bypassed certain IOMMU flushes on AMD x86 hardware.

Furthermore, logic exists in Xen to re-combine small page mappings
into larger ones.  Such re-combination could have occurred in cases
where it was not safe/correct to do so.
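
For illustration, the flush decision established by the attached
patches can be summarised as below.  This is a minimal C sketch of the
idea only, not the actual Xen code; the structure and field names are
hypothetical, and the real check operates on the raw PDE/PTE bits in
set_iommu_pde_present().

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative only -- NOT the actual Xen code. */
    struct io_pte_attrs {
        bool present;
        bool readable;
        bool writable;
        unsigned int next_level;   /* next-level / page size field */
        uint64_t maddr;            /* machine address the entry maps */
    };

    /*
     * An IOTLB flush may only be skipped if the old entry was not
     * present (and hence cannot have been cached).  Once an entry was
     * present, any change of address, permissions, or next-level
     * field requires a flush.
     */
    static bool needs_iotlb_flush(const struct io_pte_attrs *old,
                                  const struct io_pte_attrs *new)
    {
        if ( !old->present )
            return false;

        return old->maddr != new->maddr ||
               old->readable != new->readable ||
               old->writable != new->writable ||
               old->next_level != new->next_level;
    }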

IMPACT
======

A malicious or buggy guest may be able to escalate its privileges, may
cause a Denial of Service (DoS) affecting the entire host, or may be
able to access data it is not supposed to access (information leak).

VULNERABLE SYSTEMS
==================

Xen versions from at least 3.2 onwards are affected.  Note that the
situation is worse in 4.1 and earlier, in that there is no flushing of
the TLB at all.

Only AMD x86 systems with an enabled IOMMU are affected.

ARM and Intel x86 systems, and AMD x86 systems without an enabled
IOMMU, are not affected.

Only systems where physical PCI devices are assigned to untrusted guests
are vulnerable.

MITIGATION
==========

There is no known mitigation for affected system/guest combinations.

CREDITS
=======

This issue was discovered by Paul Durrant of Citrix.

RESOLUTION
==========

Applying the appropriate set of attached patches resolves this issue.

xsa275-?.patch           xen-unstable
xsa275-4.11-?.patch      Xen 4.11.x ... Xen 4.8.x
xsa275-4.7-?.patch       Xen 4.7.x

$ sha256sum xsa275*
b5a02598cd2cffcc2cb59c724eeabb50220fa55f2cbe571726a5228909bf7bfe  xsa275.meta
7a3360e61fbb088f7d9f2b92921c9dceb08a1e01563c42ba4cf4a9999fe42fc4  xsa275-1.patch
4783a3abd2d87386ce9a7b790666ad398c5e027a6a146fce6424f0bcbfd8a7c6  xsa275-2.patch
49844d06f24ea129f1a501b4b0d5cb6ec3b288f3a2b41377ce793cc6fc81a788  xsa275-4.7-1.patch
7ea8bf2ff2c8c92cb064a70959a1148229c4577109015bd5aab72603ccb8f7e3  xsa275-4.7-2.patch
15d1aa7528368ed92caf8ea9baf77a406e1de26d0697dafd8a85da0d66eb95dc  xsa275-4.11-1.patch
0806e8c904ac9e8eb89404dffd227fcd56da84b7eb0150ee1e9b4bee54a05b4e  xsa275-4.11-2.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations should contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAlv0C2kMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZEmUIAJh8KKnerBI188shqJlCI2yr3qXG75xsnwQSR4Xd
5lIRLQepG92cPkJa6RPWelJY0rHmPTlFj+apO7k4ZOG4WsZkp8vK16pkOiCGP8wI
J7UXfdxj9twOEbvLUE+Xe4bJI7/GQ9UbHefZ5LMdive6jYkq20ZUD7nZOBsXDX7r
znb6plF62VzhoGvvL2yLyZRnRJfs91bNfnqPZG54tHDPXFTntVZghrIYKW8kboNF
LZNi8fMrk0URy6uUkF2YpzLZ+JoMlPMVPEX3c+bx5xFm7xZc37rGmbHaj+L/5ViY
8e+2EEhzIGYI7liTSgKOzlkxolJ08bd/xolVAAo8vNeHjHo=
=noV1
-----END PGP SIGNATURE-----

["xsa275.meta" (application/octet-stream)]
["xsa275-1.patch" (application/octet-stream)]

From: Roger Pau Monné <roger.pau@citrix.com>
Subject: amd/iommu: fix flush checks

Flush checking for AMD IOMMU didn't check whether the previous entry
was present, or whether the flags (writable/readable) changed in order
to decide whether a flush should be executed.

Fix this by taking the writable/readable/next-level fields into account,
together with the present bit.

Along these lines the flushing in amd_iommu_map_page() must not be
omitted for PV domains. The comment there was simply wrong: Mappings may
very well change, both their addresses and their permissions. Ultimately
this should honor iommu_dont_flush_iotlb, but to achieve this
amd_iommu_ops first needs to gain an .iotlb_flush hook.

Also make clear_iommu_pte_present() static, to demonstrate there's no
caller omitting the (subsequent) flush.

This is part of XSA-275.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v4: Re-base.
v3: Drop bogus 2nd is_hvm_domain() in amd_iommu_map_page().
v2: Get old R/W bits from the correct half. Also check change of next-
    level field, perhaps just to be on the safe side. Make
    clear_iommu_pte_present() static. Cosmetics.

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -35,7 +35,7 @@ static unsigned int pfn_to_pde_idx(unsig
     return idx;
 }
 
-void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long dfn)
+static void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long dfn)
 {
     u64 *table, *pte;
 
@@ -49,23 +49,42 @@ static bool_t set_iommu_pde_present(u32
                                     unsigned int next_level,
                                     bool_t iw, bool_t ir)
 {
-    u64 addr_lo, addr_hi, maddr_old, maddr_next;
+    uint64_t addr_lo, addr_hi, maddr_next;
     u32 entry;
-    bool_t need_flush = 0;
+    bool need_flush = false, old_present;
 
     maddr_next = (u64)next_mfn << PAGE_SHIFT;
 
-    addr_hi = get_field_from_reg_u32(pde[1],
-                                     IOMMU_PTE_ADDR_HIGH_MASK,
-                                     IOMMU_PTE_ADDR_HIGH_SHIFT);
-    addr_lo = get_field_from_reg_u32(pde[0],
-                                     IOMMU_PTE_ADDR_LOW_MASK,
-                                     IOMMU_PTE_ADDR_LOW_SHIFT);
-
-    maddr_old = (addr_hi << 32) | (addr_lo << PAGE_SHIFT);
-
-    if ( maddr_old != maddr_next )
-        need_flush = 1;
+    old_present = get_field_from_reg_u32(pde[0], IOMMU_PTE_PRESENT_MASK,
+                                         IOMMU_PTE_PRESENT_SHIFT);
+    if ( old_present )
+    {
+        bool old_r, old_w;
+        unsigned int old_level;
+        uint64_t maddr_old;
+
+        addr_hi = get_field_from_reg_u32(pde[1],
+                                         IOMMU_PTE_ADDR_HIGH_MASK,
+                                         IOMMU_PTE_ADDR_HIGH_SHIFT);
+        addr_lo = get_field_from_reg_u32(pde[0],
+                                         IOMMU_PTE_ADDR_LOW_MASK,
+                                         IOMMU_PTE_ADDR_LOW_SHIFT);
+        old_level = get_field_from_reg_u32(pde[0],
+                                           IOMMU_PDE_NEXT_LEVEL_MASK,
+                                           IOMMU_PDE_NEXT_LEVEL_SHIFT);
+        old_w = get_field_from_reg_u32(pde[1],
+                                       IOMMU_PTE_IO_WRITE_PERMISSION_MASK,
+                                       IOMMU_PTE_IO_WRITE_PERMISSION_SHIFT);
+        old_r = get_field_from_reg_u32(pde[1],
+                                       IOMMU_PTE_IO_READ_PERMISSION_MASK,
+                                       IOMMU_PTE_IO_READ_PERMISSION_SHIFT);
+
+        maddr_old = (addr_hi << 32) | (addr_lo << PAGE_SHIFT);
+
+        if ( maddr_old != maddr_next || iw != old_w || ir != old_r ||
+             old_level != next_level )
+            need_flush = true;
+    }
 
     addr_lo = maddr_next & DMA_32BIT_MASK;
     addr_hi = maddr_next >> 32;
@@ -684,10 +703,7 @@ int amd_iommu_map_page(struct domain *d,
     if ( !need_flush )
         goto out;
 
-    /* 4K mapping for PV guests never changes, 
-     * no need to flush if we trust non-present bits */
-    if ( is_hvm_domain(d) )
-        amd_iommu_flush_pages(d, dfn_x(dfn), 0);
+    amd_iommu_flush_pages(d, dfn_x(dfn), 0);
 
     for ( merge_level = 2; merge_level <= hd->arch.paging_mode;
           merge_level++ )

["xsa275-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: suppress PTE merging after initial table creation

The logic is not fit for this purpose, so simply disable its use until
it can be fixed / replaced. Note that this re-enables merging for the
table creation case, which was disabled as a (perhaps unintended) side
effect of the earlier "amd/iommu: fix flush checks". It relies on no
page getting mapped more than once (with different properties) in this
process, as that would still be beyond what the merging logic can cope
with. But arch_iommu_populate_page_table() guarantees this afaict.

This is part of XSA-275.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v4: Re-base.

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -699,11 +699,24 @@ int amd_iommu_map_page(struct domain *d,
                                        !!(flags & IOMMUF_writable),
                                        !!(flags & IOMMUF_readable));
 
-    /* Do not increase pde count if io mapping has not been changed */
-    if ( !need_flush )
-        goto out;
+    if ( need_flush )
+    {
+        amd_iommu_flush_pages(d, dfn_x(dfn), 0);
+        /* No further merging, as the logic doesn't cope. */
+        hd->arch.no_merge = true;
+    }
 
-    amd_iommu_flush_pages(d, dfn_x(dfn), 0);
+    /*
+     * Suppress merging of non-R/W mappings or after initial table creation,
+     * as the merge logic does not cope with this.
+     */
+    if ( hd->arch.no_merge || flags != (IOMMUF_writable | IOMMUF_readable) )
+        goto out;
+    if ( d->creation_finished )
+    {
+        hd->arch.no_merge = true;
+        goto out;
+    }
 
     for ( merge_level = 2; merge_level <= hd->arch.paging_mode;
           merge_level++ )
@@ -780,6 +793,10 @@ int amd_iommu_unmap_page(struct domain *
 
     /* mark PTE as 'page not present' */
     clear_iommu_pte_present(pt_mfn[1], dfn_x(dfn));
+
+    /* No further merging in amd_iommu_map_page(), as the logic doesn't cope. */
+    hd->arch.no_merge = true;
+
     spin_unlock(&hd->arch.mapping_lock);
 
     amd_iommu_flush_pages(d, dfn_x(dfn), 0);
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -52,6 +52,7 @@ struct arch_iommu
 
     /* amd iommu support */
     int paging_mode;
+    bool no_merge;
     struct page_info *root_table;
     struct guest_iommu *g_iommu;
 };

["xsa275-4.7-1.patch" (application/octet-stream)]

From: Roger Pau Monné <roger.pau@citrix.com>
Subject: amd/iommu: fix flush checks

Flush checking for AMD IOMMU didn't check whether the previous entry
was present, or whether the flags (writable/readable) changed in order
to decide whether a flush should be executed.

Fix this by taking the writable/readable/next-level fields into account,
together with the present bit.

Along these lines the flushing in amd_iommu_map_page() must not be
omitted for PV domains. The comment there was simply wrong: Mappings may
very well change, both their addresses and their permissions. Ultimately
this should honor iommu_dont_flush_iotlb, but to achieve this
amd_iommu_ops first needs to gain an .iotlb_flush hook.

Also make clear_iommu_pte_present() static, to demonstrate there's no
caller omitting the (subsequent) flush.

This is part of XSA-275.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -36,7 +36,7 @@ static unsigned int pfn_to_pde_idx(unsig
     return idx;
 }
 
-void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long gfn)
+static void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long gfn)
 {
     u64 *table, *pte;
 
@@ -50,23 +50,42 @@ static bool_t set_iommu_pde_present(u32
                                     unsigned int next_level,
                                     bool_t iw, bool_t ir)
 {
-    u64 addr_lo, addr_hi, maddr_old, maddr_next;
+    uint64_t addr_lo, addr_hi, maddr_next;
     u32 entry;
-    bool_t need_flush = 0;
+    bool_t need_flush = 0, old_present;
 
     maddr_next = (u64)next_mfn << PAGE_SHIFT;
 
-    addr_hi = get_field_from_reg_u32(pde[1],
-                                     IOMMU_PTE_ADDR_HIGH_MASK,
-                                     IOMMU_PTE_ADDR_HIGH_SHIFT);
-    addr_lo = get_field_from_reg_u32(pde[0],
-                                     IOMMU_PTE_ADDR_LOW_MASK,
-                                     IOMMU_PTE_ADDR_LOW_SHIFT);
-
-    maddr_old = (addr_hi << 32) | (addr_lo << PAGE_SHIFT);
-
-    if ( maddr_old != maddr_next )
-        need_flush = 1;
+    old_present = get_field_from_reg_u32(pde[0], IOMMU_PTE_PRESENT_MASK,
+                                         IOMMU_PTE_PRESENT_SHIFT);
+    if ( old_present )
+    {
+        bool_t old_r, old_w;
+        unsigned int old_level;
+        uint64_t maddr_old;
+
+        addr_hi = get_field_from_reg_u32(pde[1],
+                                         IOMMU_PTE_ADDR_HIGH_MASK,
+                                         IOMMU_PTE_ADDR_HIGH_SHIFT);
+        addr_lo = get_field_from_reg_u32(pde[0],
+                                         IOMMU_PTE_ADDR_LOW_MASK,
+                                         IOMMU_PTE_ADDR_LOW_SHIFT);
+        old_level = get_field_from_reg_u32(pde[0],
+                                           IOMMU_PDE_NEXT_LEVEL_MASK,
+                                           IOMMU_PDE_NEXT_LEVEL_SHIFT);
+        old_w = get_field_from_reg_u32(pde[1],
+                                       IOMMU_PTE_IO_WRITE_PERMISSION_MASK,
+                                       IOMMU_PTE_IO_WRITE_PERMISSION_SHIFT);
+        old_r = get_field_from_reg_u32(pde[1],
+                                       IOMMU_PTE_IO_READ_PERMISSION_MASK,
+                                       IOMMU_PTE_IO_READ_PERMISSION_SHIFT);
+
+        maddr_old = (addr_hi << 32) | (addr_lo << PAGE_SHIFT);
+
+        if ( maddr_old != maddr_next || iw != old_w || ir != old_r ||
+             old_level != next_level )
+            need_flush = 1;
+    }
 
     addr_lo = maddr_next & DMA_32BIT_MASK;
     addr_hi = maddr_next >> 32;
@@ -680,10 +699,7 @@ int amd_iommu_map_page(struct domain *d,
     if ( !need_flush )
         goto out;
 
-    /* 4K mapping for PV guests never changes, 
-     * no need to flush if we trust non-present bits */
-    if ( is_hvm_domain(d) )
-        amd_iommu_flush_pages(d, gfn, 0);
+    amd_iommu_flush_pages(d, gfn, 0);
 
     for ( merge_level = IOMMU_PAGING_MODE_LEVEL_2;
           merge_level <= hd->arch.paging_mode; merge_level++ )

["xsa275-4.7-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: suppress PTE merging after initial table creation

The logic is not fit for this purpose, so simply disable its use until
it can be fixed / replaced. Note that this re-enables merging for the
table creation case, which was disabled as a (perhaps unintended) side
effect of the earlier "amd/iommu: fix flush checks". It relies on no
page getting mapped more than once (with different properties) in this
process, as that would still be beyond what the merging logic can cope
with. But arch_iommu_populate_page_table() guarantees this afaict.

This is part of XSA-275.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1031,6 +1031,20 @@ int domain_unpause_by_systemcontroller(s
         prev = cmpxchg(&d->controller_pause_count, old, new);
     } while ( prev != old );
 
+    /*
+     * d->controller_pause_count is initialised to 1, and the toolstack is
+     * responsible for making one unpause hypercall when it wishes the guest
+     * to start running.
+     *
+     * All other toolstack operations should make a pair of pause/unpause
+     * calls and rely on the reference counting here.
+     *
+     * Creation is considered finished when the controller reference count
+     * first drops to 0.
+     */
+    if ( new == 0 )
+        d->creation_finished = 1;
+
     domain_unpause(d);
 
     return 0;
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -695,11 +695,24 @@ int amd_iommu_map_page(struct domain *d,
                                        !!(flags & IOMMUF_writable),
                                        !!(flags & IOMMUF_readable));
 
-    /* Do not increase pde count if io mapping has not been changed */
-    if ( !need_flush )
-        goto out;
+    if ( need_flush )
+    {
+        amd_iommu_flush_pages(d, gfn, 0);
+        /* No further merging, as the logic doesn't cope. */
+        hd->arch.no_merge = 1;
+    }
 
-    amd_iommu_flush_pages(d, gfn, 0);
+    /*
+     * Suppress merging of non-R/W mappings or after initial table creation,
+     * as the merge logic does not cope with this.
+     */
+    if ( hd->arch.no_merge || flags != (IOMMUF_writable | IOMMUF_readable) )
+        goto out;
+    if ( d->creation_finished )
+    {
+        hd->arch.no_merge = 1;
+        goto out;
+    }
 
     for ( merge_level = IOMMU_PAGING_MODE_LEVEL_2;
           merge_level <= hd->arch.paging_mode; merge_level++ )
@@ -769,6 +782,10 @@ int amd_iommu_unmap_page(struct domain *
 
     /* mark PTE as 'page not present' */
     clear_iommu_pte_present(pt_mfn[1], gfn);
+
+    /* No further merging in amd_iommu_map_page(), as the logic doesn't cope. */
+    hd->arch.no_merge = 1;
+
     spin_unlock(&hd->arch.mapping_lock);
 
     amd_iommu_flush_pages(d, gfn, 0);
--- a/xen/include/asm-x86/hvm/iommu.h
+++ b/xen/include/asm-x86/hvm/iommu.h
@@ -59,6 +59,7 @@ struct arch_iommu
 
     /* amd iommu support */
     int paging_mode;
+    bool_t no_merge;
     struct page_info *root_table;
     struct guest_iommu *g_iommu;
 };
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -387,6 +387,12 @@ struct domain
     bool_t           disable_migrate;
     /* Is this guest being debugged by dom0? */
     bool_t           debugger_attached;
+    /*
+     * Set to true at the very end of domain creation, when the domain is
+     * unpaused for the first time by the systemcontroller.
+     */
+    bool_t           creation_finished;
+
     /* Which guest this guest has privileges on */
     struct domain   *target;
 

["xsa275-4.11-1.patch" (application/octet-stream)]

From: Roger Pau Monné <roger.pau@citrix.com>
Subject: amd/iommu: fix flush checks

Flush checking for AMD IOMMU didn't check whether the previous entry
was present, or whether the flags (writable/readable) changed in order
to decide whether a flush should be executed.

Fix this by taking the writable/readable/next-level fields into account,
together with the present bit.

Along these lines the flushing in amd_iommu_map_page() must not be
omitted for PV domains. The comment there was simply wrong: Mappings may
very well change, both their addresses and their permissions. Ultimately
this should honor iommu_dont_flush_iotlb, but to achieve this
amd_iommu_ops first needs to gain an .iotlb_flush hook.

Also make clear_iommu_pte_present() static, to demonstrate there's no
caller omitting the (subsequent) flush.

This is part of XSA-275.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -35,7 +35,7 @@ static unsigned int pfn_to_pde_idx(unsig
     return idx;
 }
 
-void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long gfn)
+static void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long gfn)
 {
     u64 *table, *pte;
 
@@ -49,23 +49,42 @@ static bool_t set_iommu_pde_present(u32
                                     unsigned int next_level,
                                     bool_t iw, bool_t ir)
 {
-    u64 addr_lo, addr_hi, maddr_old, maddr_next;
+    uint64_t addr_lo, addr_hi, maddr_next;
     u32 entry;
-    bool_t need_flush = 0;
+    bool need_flush = false, old_present;
 
     maddr_next = (u64)next_mfn << PAGE_SHIFT;
 
-    addr_hi = get_field_from_reg_u32(pde[1],
-                                     IOMMU_PTE_ADDR_HIGH_MASK,
-                                     IOMMU_PTE_ADDR_HIGH_SHIFT);
-    addr_lo = get_field_from_reg_u32(pde[0],
-                                     IOMMU_PTE_ADDR_LOW_MASK,
-                                     IOMMU_PTE_ADDR_LOW_SHIFT);
-
-    maddr_old = (addr_hi << 32) | (addr_lo << PAGE_SHIFT);
-
-    if ( maddr_old != maddr_next )
-        need_flush = 1;
+    old_present = get_field_from_reg_u32(pde[0], IOMMU_PTE_PRESENT_MASK,
+                                         IOMMU_PTE_PRESENT_SHIFT);
+    if ( old_present )
+    {
+        bool old_r, old_w;
+        unsigned int old_level;
+        uint64_t maddr_old;
+
+        addr_hi = get_field_from_reg_u32(pde[1],
+                                         IOMMU_PTE_ADDR_HIGH_MASK,
+                                         IOMMU_PTE_ADDR_HIGH_SHIFT);
+        addr_lo = get_field_from_reg_u32(pde[0],
+                                         IOMMU_PTE_ADDR_LOW_MASK,
+                                         IOMMU_PTE_ADDR_LOW_SHIFT);
+        old_level = get_field_from_reg_u32(pde[0],
+                                           IOMMU_PDE_NEXT_LEVEL_MASK,
+                                           IOMMU_PDE_NEXT_LEVEL_SHIFT);
+        old_w = get_field_from_reg_u32(pde[1],
+                                       IOMMU_PTE_IO_WRITE_PERMISSION_MASK,
+                                       IOMMU_PTE_IO_WRITE_PERMISSION_SHIFT);
+        old_r = get_field_from_reg_u32(pde[1],
+                                       IOMMU_PTE_IO_READ_PERMISSION_MASK,
+                                       IOMMU_PTE_IO_READ_PERMISSION_SHIFT);
+
+        maddr_old = (addr_hi << 32) | (addr_lo << PAGE_SHIFT);
+
+        if ( maddr_old != maddr_next || iw != old_w || ir != old_r ||
+             old_level != next_level )
+            need_flush = true;
+    }
 
     addr_lo = maddr_next & DMA_32BIT_MASK;
     addr_hi = maddr_next >> 32;
@@ -687,10 +706,7 @@ int amd_iommu_map_page(struct domain *d,
     if ( !need_flush )
         goto out;
 
-    /* 4K mapping for PV guests never changes, 
-     * no need to flush if we trust non-present bits */
-    if ( is_hvm_domain(d) )
-        amd_iommu_flush_pages(d, gfn, 0);
+    amd_iommu_flush_pages(d, gfn, 0);
 
     for ( merge_level = IOMMU_PAGING_MODE_LEVEL_2;
           merge_level <= hd->arch.paging_mode; merge_level++ )

["xsa275-4.11-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: suppress PTE merging after initial table creation

The logic is not fit for this purpose, so simply disable its use until
it can be fixed / replaced. Note that this re-enables merging for the
table creation case, which was disabled as a (perhaps unintended) side
effect of the earlier "amd/iommu: fix flush checks". It relies on no
page getting mapped more than once (with different properties) in this
process, as that would still be beyond what the merging logic can cope
with. But arch_iommu_populate_page_table() guarantees this afaict.

This is part of XSA-275.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -702,11 +702,24 @@ int amd_iommu_map_page(struct domain *d,
                                        !!(flags & IOMMUF_writable),
                                        !!(flags & IOMMUF_readable));
 
-    /* Do not increase pde count if io mapping has not been changed */
-    if ( !need_flush )
-        goto out;
+    if ( need_flush )
+    {
+        amd_iommu_flush_pages(d, gfn, 0);
+        /* No further merging, as the logic doesn't cope. */
+        hd->arch.no_merge = true;
+    }
 
-    amd_iommu_flush_pages(d, gfn, 0);
+    /*
+     * Suppress merging of non-R/W mappings or after initial table creation,
+     * as the merge logic does not cope with this.
+     */
+    if ( hd->arch.no_merge || flags != (IOMMUF_writable | IOMMUF_readable) )
+        goto out;
+    if ( d->creation_finished )
+    {
+        hd->arch.no_merge = true;
+        goto out;
+    }
 
     for ( merge_level = IOMMU_PAGING_MODE_LEVEL_2;
           merge_level <= hd->arch.paging_mode; merge_level++ )
@@ -780,6 +793,10 @@ int amd_iommu_unmap_page(struct domain *
 
     /* mark PTE as 'page not present' */
     clear_iommu_pte_present(pt_mfn[1], gfn);
+
+    /* No further merging in amd_iommu_map_page(), as the logic doesn't cope. */
+    hd->arch.no_merge = true;
+
     spin_unlock(&hd->arch.mapping_lock);
 
     amd_iommu_flush_pages(d, gfn, 0);
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -40,6 +40,7 @@ struct arch_iommu
 
     /* amd iommu support */
     int paging_mode;
+    bool no_merge;
     struct page_info *root_table;
     struct guest_iommu *g_iommu;
 };

