List:       oss-security
Subject:    [oss-security] Xen Security Advisory 347 v2 - unsafe AMD IOMMU page table updates
From:       Xen.org security team <security@xen.org>
Date:       2020-10-20 12:00:50
Message-ID: E1kUqJu-00024u-DS@xenbits.xenproject.org

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

                    Xen Security Advisory XSA-347
                              version 2

                  unsafe AMD IOMMU page table updates

UPDATES IN VERSION 2
====================

Public release.

ISSUE DESCRIPTION
=================

AMD IOMMU page table entries are updated step by step, without regard
to their potentially being in use by the IOMMU.  It was therefore
possible for the IOMMU to read, and then use, a half-updated entry.
Furthermore, updates to Device Table entries lacked suitable ordering
enforcement for certain steps involved in these updates.

In both cases the specific outcome depends heavily on how exactly the
compiler translated the affected pieces of code.
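
For illustration only, a minimal C sketch (not Xen code; the entry
layout is invented for brevity, the real one appears in the patches
below) of the unsafe pattern and of the remedy the patches apply:

    #include <stdbool.h>
    #include <stdint.h>

    union pte {
        uint64_t raw;
        struct {
            bool pr:1;        /* present */
            uint64_t mfn:40;  /* frame number */
            bool ir:1, iw:1;  /* read / write permission */
        };
    };

    /* UNSAFE on a live entry: the compiler may emit these bitfield
     * updates as several read-modify-write accesses, in any order, so
     * the IOMMU can fetch the entry in between and observe a mix of
     * old and new fields, e.g. the new MFN with old permissions. */
    static void update_unsafe(union pte *pte, uint64_t mfn)
    {
        pte->mfn = mfn;
        pte->ir = true;
        pte->iw = true;
        pte->pr = true;
    }

    /* SAFE: build the entry in a local variable, then publish it with
     * a single aligned 64-bit store (atomic on x86). */
    static void update_safe(union pte *pte, uint64_t mfn)
    {
        union pte new = { .raw = 0 };

        new.mfn = mfn;
        new.ir = true;
        new.iw = true;
        new.pr = true;

        *(volatile uint64_t *)&pte->raw = new.raw;
    }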

IMPACT
======

A malicious guest might be able to cause data corruption and data
leaks.  Host or guest Denial of Service (DoS), and privilege
escalation, cannot be ruled out.

VULNERABLE SYSTEMS
==================

All Xen versions are potentially vulnerable.

Only x86 systems with AMD, Hygon, or compatible IOMMU hardware are
vulnerable.  Arm systems as well as x86 systems with VT-d hardware or
without any IOMMUs in use are not vulnerable.

Only x86 guests which have physical devices passed through to them can
leverage the vulnerability.

MITIGATION
==========

Not passing through physical devices to untrusted guests will avoid
the vulnerability.

CREDITS
=======

This issue was discovered by Paul Durrant of Amazon and Jan Beulich of
SUSE.

RESOLUTION
==========

Applying the appropriate set of attached patches resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa347/xsa347-?.patch           xen-unstable
xsa347/xsa347-4.14-?.patch      Xen 4.14
xsa347/xsa347-4.13-?.patch      Xen 4.13
xsa347/xsa347-4.12-?.patch      Xen 4.12
xsa347/xsa347-4.11-?.patch      Xen 4.10 - 4.11

$ sha256sum xsa347* xsa347*/*
f16e1a348b0e45601c96b2bd08afc4202bbccc92c8af8344b3c8286ca819acef  xsa347.meta
82e14d0507ec94f8cfac2b4d5d1b60681b925218ab927332bee338e6b6c679c9  xsa347/xsa347-1.patch
1bc6018c3685727ba4035bf0b5cea95940a1b9c4746fa9bddfd41507482d68a1  xsa347/xsa347-2.patch
f1bd8eba268300f564837ac37fe43b774ace885c9cbf8fcacae457128730bc80  xsa347/xsa347-3.patch
5aec8f3b15aa799e1ff7ec0dfe53523cb91aa5fd88033f7f034cb74ebaa6abe4  xsa347/xsa347-4.11-1.patch
4ab3a6fa181ce486b4c9943f6629b7c1a4337c7ccb92701ae6e40108533778ca  xsa347/xsa347-4.11-2.patch
fec82340dc65fc1001358de51d0639b2b401818fa1e831f8715cb1780b17dc7b  xsa347/xsa347-4.12-1.patch
be89e976fe03464ce3a73b162c07927128f41a8a03466e903ebfa4ea0dc46116  xsa347/xsa347-4.12-2.patch
5dc0abf73d1a9d21f2b57e6c57ee5c15cc3febbb783123c0946f3e5778671929  xsa347/xsa347-4.13-1.patch
6d2b6ea7a373fb1c4cce63db349bbafa8603b5e7c6b74fc6d029954075f2268d  xsa347/xsa347-4.13-2.patch
4e154bfca5101569c8260e307eb6439783bc99547b7dfb5aba2bafebbde46190  xsa347/xsa347-4.13-3.patch
6a70c2afba0d3ad73b12743a6808ba8002e9ee573d7c460397355e40de3b553f  xsa347/xsa347-4.14-1.patch
1bc6018c3685727ba4035bf0b5cea95940a1b9c4746fa9bddfd41507482d68a1  xsa347/xsa347-4.14-2.patch
f1bd8eba268300f564837ac37fe43b774ace885c9cbf8fcacae457128730bc80  xsa347/xsa347-4.14-3.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches described above (or others which are
substantially similar) is permitted during the embargo, even on
public-facing systems with untrusted guest users and administrators.

HOWEVER, deployment of the mitigation is NOT permitted (except where
all the affected systems and VMs are administered and used only by
organisations which are members of the Xen Project Security Issues
Predisclosure List).  Specifically, deployment on public cloud systems
is NOT permitted.

This is because removal of pass-through devices or their replacement by
emulated devices is a guest visible configuration change, which may lead
to re-discovery of the issue.

Deployment of this mitigation is permitted only AFTER the embargo ends.

AND: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations should contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decision-making.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAl+Ozq4MHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZ49QH/jCHl5dId4Bj/FSfDEK1iTTpTMTIwIIv5PGaSBh1
F/VoQiS+e5hAlbucK8M362GJlHO3p4wHPgyZLNY82BZrPuzeL/GAV8p4qqrfsJjS
uk6UZQyyIAKH8NcbICzm06WrOQ2ayfGJvJtmyfkwqDcT+VSJ/ohmcBw9WQABACSS
+Wr2LRZAzucpY23z/sWMOYx312sRx8EvzAeA4qP0g1jAc54cNbCVuDV2iqcFot4F
vd+/vfkh7HuIvLk7gQ8KbKjXyGqR7Wt78EEDNTpSxvXuGUTFc+jCnA0429Se1NGw
cLgeaTr29RtDIlFtxqS0DR3Pu4HYL535Dkn/w8dfmL7mPjU=
=rjzR
-----END PGP SIGNATURE-----

["xsa347.meta" (application/octet-stream)]
["xsa347/xsa347-1.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: convert amd_iommu_pte from struct to union

This is to add a "raw" counterpart to the bitfield equivalent. Take the
opportunity and
 - convert fields to bool / unsigned int,
 - drop the naming of the reserved field,
 - shorten the names of the ignored ones.

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu-defs.h
+++ b/xen/drivers/passthrough/amd/iommu-defs.h
@@ -451,20 +451,23 @@ union amd_iommu_x2apic_control {
 #define IOMMU_PAGE_TABLE_U32_PER_ENTRY	(IOMMU_PAGE_TABLE_ENTRY_SIZE / 4)
 #define IOMMU_PAGE_TABLE_ALIGNMENT	4096
 
-struct amd_iommu_pte {
-    uint64_t pr:1;
-    uint64_t ignored0:4;
-    uint64_t a:1;
-    uint64_t d:1;
-    uint64_t ignored1:2;
-    uint64_t next_level:3;
-    uint64_t mfn:40;
-    uint64_t reserved:7;
-    uint64_t u:1;
-    uint64_t fc:1;
-    uint64_t ir:1;
-    uint64_t iw:1;
-    uint64_t ignored2:1;
+union amd_iommu_pte {
+    uint64_t raw;
+    struct {
+        bool pr:1;
+        unsigned int ign0:4;
+        bool a:1;
+        bool d:1;
+        unsigned int ign1:2;
+        unsigned int next_level:3;
+        uint64_t mfn:40;
+        unsigned int :7;
+        bool u:1;
+        bool fc:1;
+        bool ir:1;
+        bool iw:1;
+        unsigned int ign2:1;
+    };
 };
 
 /* Paging modes */
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -34,7 +34,7 @@ static unsigned int pfn_to_pde_idx(unsig
 static unsigned int clear_iommu_pte_present(unsigned long l1_mfn,
                                             unsigned long dfn)
 {
-    struct amd_iommu_pte *table, *pte;
+    union amd_iommu_pte *table, *pte;
     unsigned int flush_flags;
 
     table = map_domain_page(_mfn(l1_mfn));
@@ -48,7 +48,7 @@ static unsigned int clear_iommu_pte_pres
     return flush_flags;
 }
 
-static unsigned int set_iommu_pde_present(struct amd_iommu_pte *pte,
+static unsigned int set_iommu_pde_present(union amd_iommu_pte *pte,
                                           unsigned long next_mfn,
                                           unsigned int next_level, bool iw,
                                           bool ir)
@@ -83,7 +83,7 @@ static unsigned int set_iommu_pte_presen
                                           int pde_level,
                                           bool iw, bool ir)
 {
-    struct amd_iommu_pte *table, *pde;
+    union amd_iommu_pte *table, *pde;
     unsigned int flush_flags;
 
     table = map_domain_page(_mfn(pt_mfn));
@@ -174,7 +174,7 @@ void iommu_dte_set_guest_cr3(struct amd_
 static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
                               unsigned long pt_mfn[], bool map)
 {
-    struct amd_iommu_pte *pde, *next_table_vaddr;
+    union amd_iommu_pte *pde, *next_table_vaddr;
     unsigned long  next_table_mfn;
     unsigned int level;
     struct page_info *table;
@@ -448,7 +448,7 @@ int __init amd_iommu_quarantine_init(str
     unsigned long end_gfn =
         1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT);
     unsigned int level = amd_iommu_get_paging_mode(end_gfn);
-    struct amd_iommu_pte *table;
+    union amd_iommu_pte *table;
 
     if ( hd->arch.amd.root_table )
     {
@@ -479,7 +479,7 @@ int __init amd_iommu_quarantine_init(str
 
         for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
         {
-            struct amd_iommu_pte *pde = &table[i];
+            union amd_iommu_pte *pde = &table[i];
 
             /*
              * PDEs are essentially a subset of PTEs, so this function
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -495,7 +495,7 @@ static void amd_dump_p2m_table_level(str
                                      paddr_t gpa, int indent)
 {
     paddr_t address;
-    struct amd_iommu_pte *table_vaddr;
+    const union amd_iommu_pte *table_vaddr;
     int index;
 
     if ( level < 1 )
@@ -511,7 +511,7 @@ static void amd_dump_p2m_table_level(str
 
     for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
     {
-        struct amd_iommu_pte *pde = &table_vaddr[index];
+        const union amd_iommu_pte *pde = &table_vaddr[index];
 
         if ( !(index % 2) )
             process_pending_softirqs();

["xsa347/xsa347-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: update live PTEs atomically

Updating a live PTE bitfield by bitfield risks the compiler re-ordering
the individual updates as well as splitting individual updates into
multiple memory writes. Construct the new entry fully in a local
variable, do the check to determine the flushing needs on the thus
established new entry, and then write the new entry by a single insn.

Similarly using memset() to clear a PTE is unsafe, as the order of
writes the function does is, at least in principle, undefined.

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -41,7 +41,7 @@ static unsigned int clear_iommu_pte_pres
     pte = &table[pfn_to_pde_idx(dfn, 1)];
 
     flush_flags = pte->pr ? IOMMU_FLUSHF_modified : 0;
-    memset(pte, 0, sizeof(*pte));
+    write_atomic(&pte->raw, 0);
 
     unmap_domain_page(table);
 
@@ -53,26 +53,30 @@ static unsigned int set_iommu_pde_presen
                                           unsigned int next_level, bool iw,
                                           bool ir)
 {
+    union amd_iommu_pte new = {}, old;
     unsigned int flush_flags = IOMMU_FLUSHF_added;
 
-    if ( pte->pr &&
-         (pte->mfn != next_mfn ||
-          pte->iw != iw ||
-          pte->ir != ir ||
-          pte->next_level != next_level) )
-            flush_flags |= IOMMU_FLUSHF_modified;
-
     /*
      * FC bit should be enabled in PTE, this helps to solve potential
      * issues with ATS devices
      */
-    pte->fc = !next_level;
+    new.fc = !next_level;
+
+    new.mfn = next_mfn;
+    new.iw = iw;
+    new.ir = ir;
+    new.next_level = next_level;
+    new.pr = true;
+
+    old.raw = read_atomic(&pte->raw);
+    old.ign0 = 0;
+    old.ign1 = 0;
+    old.ign2 = 0;
+
+    if ( old.pr && old.raw != new.raw )
+        flush_flags |= IOMMU_FLUSHF_modified;
 
-    pte->mfn = next_mfn;
-    pte->iw = iw;
-    pte->ir = ir;
-    pte->next_level = next_level;
-    pte->pr = 1;
+    write_atomic(&pte->raw, new.raw);
 
     return flush_flags;
 }
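
The patch depends on Xen's read_atomic() / write_atomic() helpers,
whose definitions lie outside the diff context.  As a rough model of
their effect (an assumption about intent, not their exact
implementation), they force single, full-width accesses which the
compiler may neither split nor merge:

    #include <stdint.h>

    /* Hypothetical stand-ins for Xen's helpers: the volatile accesses
     * keep the compiler from splitting, duplicating, or eliding the
     * load/store, and an aligned 64-bit access is atomic on x86-64. */
    static inline uint64_t read_atomic64(const uint64_t *p)
    {
        return *(const volatile uint64_t *)p;
    }

    static inline void write_atomic64(uint64_t *p, uint64_t val)
    {
        *(volatile uint64_t *)p = val;
    }

With such helpers, clearing a live PTE becomes a single store of 0 to
pte->raw, which the IOMMU observes either entirely before or entirely
after the write; memset(), by contrast, gives no such guarantee.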

["xsa347/xsa347-3.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: ensure suitable ordering of DTE modifications

DMA and interrupt translation should be enabled only after other
applicable DTE fields have been written. Similarly when disabling
translation or when moving a device between domains, translation should
first be disabled, before other entry fields get modified. Note however
that the "moving" aspect doesn't apply to the interrupt remapping side,
as domain specifics are maintained in the IRTEs here, not the DTE. We
also never disable interrupt remapping once it got enabled for a device
(the respective argument passed is always the immutable iommu_intremap).

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -103,11 +103,18 @@ void amd_iommu_set_root_page_table(struc
                                    uint64_t root_ptr, uint16_t domain_id,
                                    uint8_t paging_mode, bool valid)
 {
+    if ( valid || dte->v )
+    {
+        dte->tv = false;
+        dte->v = true;
+        smp_wmb();
+    }
     dte->domain_id = domain_id;
     dte->pt_root = paddr_to_pfn(root_ptr);
     dte->iw = true;
     dte->ir = true;
     dte->paging_mode = paging_mode;
+    smp_wmb();
     dte->tv = true;
     dte->v = valid;
 }
@@ -130,6 +137,7 @@ void amd_iommu_set_intremap_table(
     }
 
     dte->ig = false; /* unmapped interrupts result in i/o page faults */
+    smp_wmb();
     dte->iv = valid;
 }
 
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -117,7 +117,10 @@ static void amd_iommu_setup_domain_devic
         /* Undo what amd_iommu_disable_domain_device() may have done. */
         ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
         if ( dte->it_root )
+        {
             dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
+            smp_wmb();
+        }
         dte->iv = iommu_intremap;
         dte->ex = ivrs_dev->dte_allow_exclusion;
         dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
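
To make the ordering discipline concrete, here is a minimal sketch
(not Xen code) of the pattern enforced on Device Table entries,
assuming a simplified two-word entry gated by a "translation valid"
bit and an smp_wmb() that makes earlier stores visible before later
ones:

    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-in for Xen's smp_wmb(), modelled as a release fence (on
     * x86 a compiler barrier is sufficient). */
    #define smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)

    struct dte {
        volatile uint64_t fields; /* domain ID, pt root, permissions */
        volatile bool tv;         /* gate: fields used only if set */
    };

    static void dte_reassign(struct dte *dte, uint64_t new_fields)
    {
        dte->tv = false;           /* 1: take entry out of service...  */
        smp_wmb();                 /*    ...visibly, before changing it */

        dte->fields = new_fields;  /* 2: rewrite the body of the entry */

        smp_wmb();                 /* 3: body first...                 */
        dte->tv = true;            /*    ...gate last                  */
    }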

["xsa347/xsa347-4.11-1.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: update live PTEs atomically

Updating a live PTE word by word allows the IOMMU to see a partially
updated entry. Construct the new entry fully in a local variable and
then write the new entry by a single insn.

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -41,7 +41,7 @@ static void clear_iommu_pte_present(unsi
 
     table = map_domain_page(_mfn(l1_mfn));
     pte = table + pfn_to_pde_idx(gfn, IOMMU_PAGING_MODE_LEVEL_1);
-    *pte = 0;
+    write_atomic(pte, 0);
     unmap_domain_page(table);
 }
 
@@ -49,7 +49,7 @@ static bool_t set_iommu_pde_present(u32
                                     unsigned int next_level,
                                     bool_t iw, bool_t ir)
 {
-    uint64_t addr_lo, addr_hi, maddr_next;
+    uint64_t addr_lo, addr_hi, maddr_next, full;
     u32 entry;
     bool need_flush = false, old_present;
 
@@ -106,7 +106,7 @@ static bool_t set_iommu_pde_present(u32
     if ( next_level == IOMMU_PAGING_MODE_LEVEL_0 )
         set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, entry,
                              IOMMU_PTE_FC_MASK, IOMMU_PTE_FC_SHIFT, &entry);
-    pde[1] = entry;
+    full = (uint64_t)entry << 32;
 
     /* mark next level as 'present' */
     set_field_in_reg_u32((u32)addr_lo >> PAGE_SHIFT, 0,
@@ -118,7 +118,9 @@ static bool_t set_iommu_pde_present(u32
     set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, entry,
                          IOMMU_PDE_PRESENT_MASK,
                          IOMMU_PDE_PRESENT_SHIFT, &entry);
-    pde[0] = entry;
+    full |= entry;
+
+    write_atomic((uint64_t *)pde, full);
 
     return need_flush;
 }
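
For these older trees the PTE is still handled as two 32-bit words, so
the fix first assembles the full 64-bit value from the two halves and
only then publishes it.  A minimal sketch of that combine (hypothetical
names, not the patched function itself):

    #include <stdint.h>

    /* Build the 64-bit entry from its high and low 32-bit halves and
     * publish it with one aligned 64-bit store. */
    static void publish_pte(uint64_t *pde, uint32_t hi, uint32_t lo)
    {
        uint64_t full = ((uint64_t)hi << 32) | lo;

        *(volatile uint64_t *)pde = full;
    }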

["xsa347/xsa347-4.11-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: ensure suitable ordering of DTE modifications

DMA and interrupt translation should be enabled only after other
applicable DTE fields have been written. Similarly when disabling
translation or when moving a device between domains, translation should
first be disabled, before other entry fields get modified. Note however
that the "moving" aspect doesn't apply to the interrupt remapping side,
as domain specifics are maintained in the IRTEs here, not the DTE. We
also never disable interrupt remapping once it got enabled for a device
(the respective argument passed is always the immutable iommu_intremap).

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -147,7 +147,22 @@ void amd_iommu_set_root_page_table(
     u32 *dte, u64 root_ptr, u16 domain_id, u8 paging_mode, u8 valid)
 {
     u64 addr_hi, addr_lo;
-    u32 entry;
+    u32 entry, dte0 = dte[0];
+
+    if ( valid ||
+         get_field_from_reg_u32(dte0, IOMMU_DEV_TABLE_VALID_MASK,
+                                IOMMU_DEV_TABLE_VALID_SHIFT) )
+    {
+        set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, dte0,
+                             IOMMU_DEV_TABLE_TRANSLATION_VALID_MASK,
+                             IOMMU_DEV_TABLE_TRANSLATION_VALID_SHIFT, &dte0);
+        set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, dte0,
+                             IOMMU_DEV_TABLE_VALID_MASK,
+                             IOMMU_DEV_TABLE_VALID_SHIFT, &dte0);
+        dte[0] = dte0;
+        smp_wmb();
+    }
+
     set_field_in_reg_u32(domain_id, 0,
                          IOMMU_DEV_TABLE_DOMAIN_ID_MASK,
                          IOMMU_DEV_TABLE_DOMAIN_ID_SHIFT, &entry);
@@ -166,8 +181,9 @@ void amd_iommu_set_root_page_table(
                          IOMMU_DEV_TABLE_IO_READ_PERMISSION_MASK,
                          IOMMU_DEV_TABLE_IO_READ_PERMISSION_SHIFT, &entry);
     dte[1] = entry;
+    smp_wmb();
 
-    set_field_in_reg_u32((u32)addr_lo >> PAGE_SHIFT, 0,
+    set_field_in_reg_u32((u32)addr_lo >> PAGE_SHIFT, dte0,
                          IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_MASK,
                          IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_SHIFT, &entry);
     set_field_in_reg_u32(paging_mode, entry,
@@ -180,7 +196,7 @@ void amd_iommu_set_root_page_table(
                          IOMMU_CONTROL_DISABLED, entry,
                          IOMMU_DEV_TABLE_VALID_MASK,
                          IOMMU_DEV_TABLE_VALID_SHIFT, &entry);
-    dte[0] = entry;
+    write_atomic(&dte[0], entry);
 }
 
 void iommu_dte_set_iotlb(u32 *dte, u8 i)
@@ -212,6 +228,7 @@ void __init amd_iommu_set_intremap_table
                         IOMMU_DEV_TABLE_INT_CONTROL_MASK,
                         IOMMU_DEV_TABLE_INT_CONTROL_SHIFT, &entry);
     dte[5] = entry;
+    smp_wmb();
 
     set_field_in_reg_u32((u32)addr_lo >> 6, 0,
                         IOMMU_DEV_TABLE_INT_TABLE_PTR_LOW_MASK,
@@ -229,7 +246,7 @@ void __init amd_iommu_set_intremap_table
                          IOMMU_CONTROL_DISABLED, entry,
                          IOMMU_DEV_TABLE_INT_VALID_MASK,
                          IOMMU_DEV_TABLE_INT_VALID_SHIFT, &entry);
-    dte[4] = entry;
+    write_atomic(&dte[4], entry);
 }
 
 void __init iommu_dte_add_device_entry(u32 *dte, struct ivrs_mappings *ivrs_dev)

["xsa347/xsa347-4.12-1.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: update live PTEs atomically

Updating a live PTE word by word allows the IOMMU to see a partially
updated entry. Construct the new entry fully in a local variable and
then write the new entry by a single insn.

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -49,7 +49,7 @@ static unsigned int clear_iommu_pte_pres
                                          IOMMU_PTE_PRESENT_SHIFT) ?
                                          IOMMU_FLUSHF_modified : 0;
 
-    *pte = 0;
+    write_atomic(pte, 0);
     unmap_domain_page(table);
 
     return flush_flags;
@@ -60,7 +60,7 @@ static unsigned int set_iommu_pde_presen
                                           unsigned int next_level, bool iw,
                                           bool ir)
 {
-    uint64_t maddr_next;
+    uint64_t maddr_next, full;
     uint32_t addr_lo, addr_hi, entry;
     bool old_present;
     unsigned int flush_flags = IOMMU_FLUSHF_added;
@@ -119,7 +119,7 @@ static unsigned int set_iommu_pde_presen
     if ( next_level == 0 )
         set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, entry,
                              IOMMU_PTE_FC_MASK, IOMMU_PTE_FC_SHIFT, &entry);
-    pde[1] = entry;
+    full = (uint64_t)entry << 32;
 
     /* mark next level as 'present' */
     set_field_in_reg_u32(addr_lo >> PAGE_SHIFT, 0,
@@ -131,7 +131,9 @@ static unsigned int set_iommu_pde_presen
     set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, entry,
                          IOMMU_PDE_PRESENT_MASK,
                          IOMMU_PDE_PRESENT_SHIFT, &entry);
-    pde[0] = entry;
+    full |= entry;
+
+    write_atomic((uint64_t *)pde, full);
 
     return flush_flags;
 }

["xsa347/xsa347-4.12-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: ensure suitable ordering of DTE modifications

DMA and interrupt translation should be enabled only after other
applicable DTE fields have been written. Similarly when disabling
translation or when moving a device between domains, translation should
first be disabled, before other entry fields get modified. Note however
that the "moving" aspect doesn't apply to the interrupt remapping side,
as domain specifics are maintained in the IRTEs here, not the DTE. We
also never disable interrupt remapping once it got enabled for a device
(the respective argument passed is always the immutable iommu_intremap).

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -162,7 +162,22 @@ void amd_iommu_set_root_page_table(uint3
                                    uint16_t domain_id, uint8_t paging_mode,
                                    uint8_t valid)
 {
-    uint32_t addr_hi, addr_lo, entry;
+    uint32_t addr_hi, addr_lo, entry, dte0 = dte[0];
+
+    if ( valid ||
+         get_field_from_reg_u32(dte0, IOMMU_DEV_TABLE_VALID_MASK,
+                                IOMMU_DEV_TABLE_VALID_SHIFT) )
+    {
+        set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, dte0,
+                             IOMMU_DEV_TABLE_TRANSLATION_VALID_MASK,
+                             IOMMU_DEV_TABLE_TRANSLATION_VALID_SHIFT, &dte0);
+        set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, dte0,
+                             IOMMU_DEV_TABLE_VALID_MASK,
+                             IOMMU_DEV_TABLE_VALID_SHIFT, &dte0);
+        dte[0] = dte0;
+        smp_wmb();
+    }
+
     set_field_in_reg_u32(domain_id, 0,
                          IOMMU_DEV_TABLE_DOMAIN_ID_MASK,
                          IOMMU_DEV_TABLE_DOMAIN_ID_SHIFT, &entry);
@@ -181,8 +196,9 @@ void amd_iommu_set_root_page_table(uint3
                          IOMMU_DEV_TABLE_IO_READ_PERMISSION_MASK,
                          IOMMU_DEV_TABLE_IO_READ_PERMISSION_SHIFT, &entry);
     dte[1] = entry;
+    smp_wmb();
 
-    set_field_in_reg_u32(addr_lo >> PAGE_SHIFT, 0,
+    set_field_in_reg_u32(addr_lo >> PAGE_SHIFT, dte0,
                          IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_MASK,
                          IOMMU_DEV_TABLE_PAGE_TABLE_PTR_LOW_SHIFT, &entry);
     set_field_in_reg_u32(paging_mode, entry,
@@ -195,7 +211,7 @@ void amd_iommu_set_root_page_table(uint3
                          IOMMU_CONTROL_DISABLED, entry,
                          IOMMU_DEV_TABLE_VALID_MASK,
                          IOMMU_DEV_TABLE_VALID_SHIFT, &entry);
-    dte[0] = entry;
+    write_atomic(&dte[0], entry);
 }
 
 void iommu_dte_set_iotlb(uint32_t *dte, uint8_t i)
@@ -226,6 +242,7 @@ void __init amd_iommu_set_intremap_table
                          IOMMU_DEV_TABLE_INT_CONTROL_MASK,
                          IOMMU_DEV_TABLE_INT_CONTROL_SHIFT, &entry);
     dte[5] = entry;
+    smp_wmb();
 
     set_field_in_reg_u32(addr_lo >> 6, 0,
                          IOMMU_DEV_TABLE_INT_TABLE_PTR_LOW_MASK,
@@ -243,7 +260,7 @@ void __init amd_iommu_set_intremap_table
                          IOMMU_CONTROL_DISABLED, entry,
                          IOMMU_DEV_TABLE_INT_VALID_MASK,
                          IOMMU_DEV_TABLE_INT_VALID_SHIFT, &entry);
-    dte[4] = entry;
+    write_atomic(&dte[4], entry);
 }
 
 void __init iommu_dte_add_device_entry(uint32_t *dte,

["xsa347/xsa347-4.13-1.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: convert amd_iommu_pte from struct to union

This is to add a "raw" counterpart to the bitfield equivalent. Take the
opportunity and
 - convert fields to bool / unsigned int,
 - drop the naming of the reserved field,
 - shorten the names of the ignored ones.

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -38,7 +38,7 @@ static unsigned int pfn_to_pde_idx(unsig
 static unsigned int clear_iommu_pte_present(unsigned long l1_mfn,
                                             unsigned long dfn)
 {
-    struct amd_iommu_pte *table, *pte;
+    union amd_iommu_pte *table, *pte;
     unsigned int flush_flags;
 
     table = map_domain_page(_mfn(l1_mfn));
@@ -52,7 +52,7 @@ static unsigned int clear_iommu_pte_pres
     return flush_flags;
 }
 
-static unsigned int set_iommu_pde_present(struct amd_iommu_pte *pte,
+static unsigned int set_iommu_pde_present(union amd_iommu_pte *pte,
                                           unsigned long next_mfn,
                                           unsigned int next_level, bool iw,
                                           bool ir)
@@ -87,7 +87,7 @@ static unsigned int set_iommu_pte_presen
                                           int pde_level,
                                           bool iw, bool ir)
 {
-    struct amd_iommu_pte *table, *pde;
+    union amd_iommu_pte *table, *pde;
     unsigned int flush_flags;
 
     table = map_domain_page(_mfn(pt_mfn));
@@ -178,7 +178,7 @@ void iommu_dte_set_guest_cr3(struct amd_
 static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
                               unsigned long pt_mfn[], bool map)
 {
-    struct amd_iommu_pte *pde, *next_table_vaddr;
+    union amd_iommu_pte *pde, *next_table_vaddr;
     unsigned long  next_table_mfn;
     unsigned int level;
     struct page_info *table;
@@ -458,7 +458,7 @@ int __init amd_iommu_quarantine_init(str
     unsigned long end_gfn =
         1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT);
     unsigned int level = amd_iommu_get_paging_mode(end_gfn);
-    struct amd_iommu_pte *table;
+    union amd_iommu_pte *table;
 
     if ( hd->arch.root_table )
     {
@@ -489,7 +489,7 @@ int __init amd_iommu_quarantine_init(str
 
         for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
         {
-            struct amd_iommu_pte *pde = &table[i];
+            union amd_iommu_pte *pde = &table[i];
 
             /*
              * PDEs are essentially a subset of PTEs, so this function
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -390,7 +390,7 @@ static void deallocate_next_page_table(s
 
 static void deallocate_page_table(struct page_info *pg)
 {
-    struct amd_iommu_pte *table_vaddr;
+    union amd_iommu_pte *table_vaddr;
     unsigned int index, level = PFN_ORDER(pg);
 
     PFN_ORDER(pg) = 0;
@@ -405,7 +405,7 @@ static void deallocate_page_table(struct
 
     for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
     {
-        struct amd_iommu_pte *pde = &table_vaddr[index];
+        union amd_iommu_pte *pde = &table_vaddr[index];
 
         if ( pde->mfn && pde->next_level && pde->pr )
         {
@@ -557,7 +557,7 @@ static void amd_dump_p2m_table_level(str
                                      paddr_t gpa, int indent)
 {
     paddr_t address;
-    struct amd_iommu_pte *table_vaddr;
+    const union amd_iommu_pte *table_vaddr;
     int index;
 
     if ( level < 1 )
@@ -573,7 +573,7 @@ static void amd_dump_p2m_table_level(str
 
     for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
     {
-        struct amd_iommu_pte *pde = &table_vaddr[index];
+        const union amd_iommu_pte *pde = &table_vaddr[index];
 
         if ( !(index % 2) )
             process_pending_softirqs();
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -465,20 +465,23 @@ union amd_iommu_x2apic_control {
 #define IOMMU_PAGE_TABLE_U32_PER_ENTRY	(IOMMU_PAGE_TABLE_ENTRY_SIZE / 4)
 #define IOMMU_PAGE_TABLE_ALIGNMENT	4096
 
-struct amd_iommu_pte {
-    uint64_t pr:1;
-    uint64_t ignored0:4;
-    uint64_t a:1;
-    uint64_t d:1;
-    uint64_t ignored1:2;
-    uint64_t next_level:3;
-    uint64_t mfn:40;
-    uint64_t reserved:7;
-    uint64_t u:1;
-    uint64_t fc:1;
-    uint64_t ir:1;
-    uint64_t iw:1;
-    uint64_t ignored2:1;
+union amd_iommu_pte {
+    uint64_t raw;
+    struct {
+        bool pr:1;
+        unsigned int ign0:4;
+        bool a:1;
+        bool d:1;
+        unsigned int ign1:2;
+        unsigned int next_level:3;
+        uint64_t mfn:40;
+        unsigned int :7;
+        bool u:1;
+        bool fc:1;
+        bool ir:1;
+        bool iw:1;
+        unsigned int ign2:1;
+    };
 };
 
 /* Paging modes */

["xsa347/xsa347-4.13-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: update live PTEs atomically

Updating a live PTE bitfield by bitfield risks the compiler re-ordering
the individual updates as well as splitting individual updates into
multiple memory writes. Construct the new entry fully in a local
variable, do the check to determine the flushing needs on the thus
established new entry, and then write the new entry by a single insn.

Similarly using memset() to clear a PTE is unsafe, as the order of
writes the function does is, at least in principle, undefined.

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -45,7 +45,7 @@ static unsigned int clear_iommu_pte_pres
     pte = &table[pfn_to_pde_idx(dfn, 1)];
 
     flush_flags = pte->pr ? IOMMU_FLUSHF_modified : 0;
-    memset(pte, 0, sizeof(*pte));
+    write_atomic(&pte->raw, 0);
 
     unmap_domain_page(table);
 
@@ -57,26 +57,30 @@ static unsigned int set_iommu_pde_presen
                                           unsigned int next_level, bool iw,
                                           bool ir)
 {
+    union amd_iommu_pte new = {}, old;
     unsigned int flush_flags = IOMMU_FLUSHF_added;
 
-    if ( pte->pr &&
-         (pte->mfn != next_mfn ||
-          pte->iw != iw ||
-          pte->ir != ir ||
-          pte->next_level != next_level) )
-            flush_flags |= IOMMU_FLUSHF_modified;
-
     /*
      * FC bit should be enabled in PTE, this helps to solve potential
      * issues with ATS devices
      */
-    pte->fc = !next_level;
+    new.fc = !next_level;
+
+    new.mfn = next_mfn;
+    new.iw = iw;
+    new.ir = ir;
+    new.next_level = next_level;
+    new.pr = true;
+
+    old.raw = read_atomic(&pte->raw);
+    old.ign0 = 0;
+    old.ign1 = 0;
+    old.ign2 = 0;
+
+    if ( old.pr && old.raw != new.raw )
+        flush_flags |= IOMMU_FLUSHF_modified;
 
-    pte->mfn = next_mfn;
-    pte->iw = iw;
-    pte->ir = ir;
-    pte->next_level = next_level;
-    pte->pr = 1;
+    write_atomic(&pte->raw, new.raw);
 
     return flush_flags;
 }

["xsa347/xsa347-4.13-3.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: ensure suitable ordering of DTE modifications

DMA and interrupt translation should be enabled only after other
applicable DTE fields have been written. Similarly when disabling
translation or when moving a device between domains, translation should
first be disabled, before other entry fields get modified. Note however
that the "moving" aspect doesn't apply to the interrupt remapping side,
as domain specifics are maintained in the IRTEs here, not the DTE. We
also never disable interrupt remapping once it got enabled for a device
(the respective argument passed is always the immutable iommu_intremap).

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -107,11 +107,18 @@ void amd_iommu_set_root_page_table(struc
                                    uint64_t root_ptr, uint16_t domain_id,
                                    uint8_t paging_mode, bool valid)
 {
+    if ( valid || dte->v )
+    {
+        dte->tv = false;
+        dte->v = true;
+        smp_wmb();
+    }
     dte->domain_id = domain_id;
     dte->pt_root = paddr_to_pfn(root_ptr);
     dte->iw = true;
     dte->ir = true;
     dte->paging_mode = paging_mode;
+    smp_wmb();
     dte->tv = true;
     dte->v = valid;
 }
@@ -134,6 +141,7 @@ void amd_iommu_set_intremap_table(
     }
 
     dte->ig = false; /* unmapped interrupts result in i/o page faults */
+    smp_wmb();
     dte->iv = valid;
 }
 
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -120,7 +120,10 @@ static void amd_iommu_setup_domain_devic
         /* Undo what amd_iommu_disable_domain_device() may have done. */
         ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
         if ( dte->it_root )
+        {
             dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
+            smp_wmb();
+        }
         dte->iv = iommu_intremap;
         dte->ex = ivrs_dev->dte_allow_exclusion;
         dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);

["xsa347/xsa347-4.14-1.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: convert amd_iommu_pte from struct to union

This is to add a "raw" counterpart to the bitfield equivalent. Take the
opportunity and
 - convert fields to bool / unsigned int,
 - drop the naming of the reserved field,
 - shorten the names of the ignored ones.

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu-defs.h
+++ b/xen/drivers/passthrough/amd/iommu-defs.h
@@ -451,20 +451,23 @@ union amd_iommu_x2apic_control {
 #define IOMMU_PAGE_TABLE_U32_PER_ENTRY	(IOMMU_PAGE_TABLE_ENTRY_SIZE / 4)
 #define IOMMU_PAGE_TABLE_ALIGNMENT	4096
 
-struct amd_iommu_pte {
-    uint64_t pr:1;
-    uint64_t ignored0:4;
-    uint64_t a:1;
-    uint64_t d:1;
-    uint64_t ignored1:2;
-    uint64_t next_level:3;
-    uint64_t mfn:40;
-    uint64_t reserved:7;
-    uint64_t u:1;
-    uint64_t fc:1;
-    uint64_t ir:1;
-    uint64_t iw:1;
-    uint64_t ignored2:1;
+union amd_iommu_pte {
+    uint64_t raw;
+    struct {
+        bool pr:1;
+        unsigned int ign0:4;
+        bool a:1;
+        bool d:1;
+        unsigned int ign1:2;
+        unsigned int next_level:3;
+        uint64_t mfn:40;
+        unsigned int :7;
+        bool u:1;
+        bool fc:1;
+        bool ir:1;
+        bool iw:1;
+        unsigned int ign2:1;
+    };
 };
 
 /* Paging modes */
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -34,7 +34,7 @@ static unsigned int pfn_to_pde_idx(unsig
 static unsigned int clear_iommu_pte_present(unsigned long l1_mfn,
                                             unsigned long dfn)
 {
-    struct amd_iommu_pte *table, *pte;
+    union amd_iommu_pte *table, *pte;
     unsigned int flush_flags;
 
     table = map_domain_page(_mfn(l1_mfn));
@@ -48,7 +48,7 @@ static unsigned int clear_iommu_pte_pres
     return flush_flags;
 }
 
-static unsigned int set_iommu_pde_present(struct amd_iommu_pte *pte,
+static unsigned int set_iommu_pde_present(union amd_iommu_pte *pte,
                                           unsigned long next_mfn,
                                           unsigned int next_level, bool iw,
                                           bool ir)
@@ -83,7 +83,7 @@ static unsigned int set_iommu_pte_presen
                                           int pde_level,
                                           bool iw, bool ir)
 {
-    struct amd_iommu_pte *table, *pde;
+    union amd_iommu_pte *table, *pde;
     unsigned int flush_flags;
 
     table = map_domain_page(_mfn(pt_mfn));
@@ -174,7 +174,7 @@ void iommu_dte_set_guest_cr3(struct amd_
 static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
                               unsigned long pt_mfn[], bool map)
 {
-    struct amd_iommu_pte *pde, *next_table_vaddr;
+    union amd_iommu_pte *pde, *next_table_vaddr;
     unsigned long  next_table_mfn;
     unsigned int level;
     struct page_info *table;
@@ -448,7 +448,7 @@ int __init amd_iommu_quarantine_init(str
     unsigned long end_gfn =
         1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT);
     unsigned int level = amd_iommu_get_paging_mode(end_gfn);
-    struct amd_iommu_pte *table;
+    union amd_iommu_pte *table;
 
     if ( hd->arch.root_table )
     {
@@ -479,7 +479,7 @@ int __init amd_iommu_quarantine_init(str
 
         for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
         {
-            struct amd_iommu_pte *pde = &table[i];
+            union amd_iommu_pte *pde = &table[i];
 
             /*
              * PDEs are essentially a subset of PTEs, so this function
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -387,7 +387,7 @@ static void deallocate_next_page_table(s
 
 static void deallocate_page_table(struct page_info *pg)
 {
-    struct amd_iommu_pte *table_vaddr;
+    union amd_iommu_pte *table_vaddr;
     unsigned int index, level = PFN_ORDER(pg);
 
     PFN_ORDER(pg) = 0;
@@ -402,7 +402,7 @@ static void deallocate_page_table(struct
 
     for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
     {
-        struct amd_iommu_pte *pde = &table_vaddr[index];
+        union amd_iommu_pte *pde = &table_vaddr[index];
 
         if ( pde->mfn && pde->next_level && pde->pr )
         {
@@ -554,7 +554,7 @@ static void amd_dump_p2m_table_level(str
                                      paddr_t gpa, int indent)
 {
     paddr_t address;
-    struct amd_iommu_pte *table_vaddr;
+    const union amd_iommu_pte *table_vaddr;
     int index;
 
     if ( level < 1 )
@@ -570,7 +570,7 @@ static void amd_dump_p2m_table_level(str
 
     for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
     {
-        struct amd_iommu_pte *pde = &table_vaddr[index];
+        const union amd_iommu_pte *pde = &table_vaddr[index];
 
         if ( !(index % 2) )
             process_pending_softirqs();

["xsa347/xsa347-4.14-2.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: update live PTEs atomically

Updating a live PTE bitfield by bitfield risks the compiler re-ordering
the individual updates as well as splitting individual updates into
multiple memory writes. Construct the new entry fully in a local
variable, do the check to determine the flushing needs on the thus
established new entry, and then write the new entry by a single insn.

Similarly using memset() to clear a PTE is unsafe, as the order of
writes the function does is, at least in principle, undefined.

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -41,7 +41,7 @@ static unsigned int clear_iommu_pte_pres
     pte = &table[pfn_to_pde_idx(dfn, 1)];
 
     flush_flags = pte->pr ? IOMMU_FLUSHF_modified : 0;
-    memset(pte, 0, sizeof(*pte));
+    write_atomic(&pte->raw, 0);
 
     unmap_domain_page(table);
 
@@ -53,26 +53,30 @@ static unsigned int set_iommu_pde_presen
                                           unsigned int next_level, bool iw,
                                           bool ir)
 {
+    union amd_iommu_pte new = {}, old;
     unsigned int flush_flags = IOMMU_FLUSHF_added;
 
-    if ( pte->pr &&
-         (pte->mfn != next_mfn ||
-          pte->iw != iw ||
-          pte->ir != ir ||
-          pte->next_level != next_level) )
-            flush_flags |= IOMMU_FLUSHF_modified;
-
     /*
      * FC bit should be enabled in PTE, this helps to solve potential
      * issues with ATS devices
      */
-    pte->fc = !next_level;
+    new.fc = !next_level;
+
+    new.mfn = next_mfn;
+    new.iw = iw;
+    new.ir = ir;
+    new.next_level = next_level;
+    new.pr = true;
+
+    old.raw = read_atomic(&pte->raw);
+    old.ign0 = 0;
+    old.ign1 = 0;
+    old.ign2 = 0;
+
+    if ( old.pr && old.raw != new.raw )
+        flush_flags |= IOMMU_FLUSHF_modified;
 
-    pte->mfn = next_mfn;
-    pte->iw = iw;
-    pte->ir = ir;
-    pte->next_level = next_level;
-    pte->pr = 1;
+    write_atomic(&pte->raw, new.raw);
 
     return flush_flags;
 }

["xsa347/xsa347-4.14-3.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: AMD/IOMMU: ensure suitable ordering of DTE modifications

DMA and interrupt translation should be enabled only after other
applicable DTE fields have been written. Similarly when disabling
translation or when moving a device between domains, translation should
first be disabled, before other entry fields get modified. Note however
that the "moving" aspect doesn't apply to the interrupt remapping side,
as domain specifics are maintained in the IRTEs here, not the DTE. We
also never disable interrupt remapping once it got enabled for a device
(the respective argument passed is always the immutable iommu_intremap).

This is part of XSA-347.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -103,11 +103,18 @@ void amd_iommu_set_root_page_table(struc
                                    uint64_t root_ptr, uint16_t domain_id,
                                    uint8_t paging_mode, bool valid)
 {
+    if ( valid || dte->v )
+    {
+        dte->tv = false;
+        dte->v = true;
+        smp_wmb();
+    }
     dte->domain_id = domain_id;
     dte->pt_root = paddr_to_pfn(root_ptr);
     dte->iw = true;
     dte->ir = true;
     dte->paging_mode = paging_mode;
+    smp_wmb();
     dte->tv = true;
     dte->v = valid;
 }
@@ -130,6 +137,7 @@ void amd_iommu_set_intremap_table(
     }
 
     dte->ig = false; /* unmapped interrupts result in i/o page faults */
+    smp_wmb();
     dte->iv = valid;
 }
 
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -117,7 +117,10 @@ static void amd_iommu_setup_domain_devic
         /* Undo what amd_iommu_disable_domain_device() may have done. */
         ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
         if ( dte->it_root )
+        {
             dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
+            smp_wmb();
+        }
         dte->iv = iommu_intremap;
         dte->ex = ivrs_dev->dte_allow_exclusion;
         dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);

