[prev in list] [next in list] [prev in thread] [next in thread] 

List:       oss-security
Subject:    [oss-security] Xen Security Advisory 226 (CVE-2017-12135) - multiple problems with transitive grants
From:       Xen.org security team <security () xen ! org>
Date:       2017-08-29 12:04:08
Message-ID: E1dmfFY-00029c-RU () xenbits ! xenproject ! org
[Download RAW message or body]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2017-12135 / XSA-226
                               version 7

               multiple problems with transitive grants

UPDATES IN VERSION 7
====================

First patch provided in version 6 regressed 32-bit Dom0 or backend
domains. The updated patch includes a fix for this.

ISSUE DESCRIPTION
=================

1) Code to handle copy operations on transitive grants has built in
   retry logic, involving a function reinvoking itself with unchanged
   parameters.  Such use assumes that the compiler would also translate
   this to a so called "tail call" when generating machine code.
   Empirically, this is not commonly the case, allowing for
   theoretically unbounded nesting of such function calls.

2) The reference counting and locking discipline for transitive grants
   is broken.  Concurrent use of the transitive grant can leak
   references on the transitively-referenced grant.

IMPACT
======

A malicious or buggy guest may be able to crash Xen.  Privilege
escalation and information leaks cannot be ruled out.  A malicious or
buggy guest can leak references on grants it has been given, amounting
to a DoS against the grantee.

VULNERABLE SYSTEMS
==================

All versions of Xen are vulnerable.

MITIGATION
==========

There is no known mitigation.

CREDITS
=======

This issue was discovered by Jan Beulich of SUSE.

The security team would also like to thank Amazon for helping to identify that
the problems with transitive grants were deeper than originally believed.

RESOLUTION
==========

Applying the appropriate attached pair of patches from the list below
addresses this issue:

xsa226-unstable/*.patch     xen-unstable
xsa226-4.9/*.patch          Xen 4.9.x, Xen 4.8.x, Xen 4.7.x
xsa226-4.6/*.patch          Xen 4.6.x
xsa226-4.5/*.patch          Xen 4.5.x

Note that these patches have already been applied to the respective staging
trees.

Alternatively, applying the appropriate attached patch from the list
below works around this issue by disabling transitive grants by default:

xsa226.patch           xen-unstable, Xen 4.9.x, Xen 4.8.x
xsa226-4.7.patch       Xen 4.7.x
xsa226-4.6.patch       Xen 4.6.x
xsa226-4.5.patch       Xen 4.5.x

$ sha256sum xsa226* xsa226*/*
b09e07aaf422ae04a4ece5e2c5b5e54036cfae5b5c632bfc6953a0cacd6f60ff  xsa226.patch
d999767014501d3ac62def06ccd43b97bbbf0ef7d402d3bd70ca96ac9997a14d  \
xsa226-unstable/0001-gnttab-dont-use-possibly-unbounded-tail-calls.patch \
4473fd96ce4fdea5e19e0b502d65f20bd279d82473ac34ff404ce2b2cbc10be1  \
xsa226-unstable/0002-gnttab-fix-transitive-grant-handling.patch \
ca8b92b2ff58b87e8bec137a34784cbf11e2820659046df6e1d71e23bf7e7dee  xsa226-4.5.patch \
ca77d01172abf263b5b731f26f5e3f74b0b8c75b3e29bee3f65a9318236daba7  \
xsa226-4.5/0001-gnttab-dont-use-possibly-unbounded-tail-calls.patch \
de6359e50fd2bb710469da74a596013ce275edb43d3d1c36d41452f88eee9b7d  \
xsa226-4.5/0002-gnttab-fix-transitive-grant-handling.patch \
28c7df7edabb91fb2f1fa3fc7d6906bfae75a6e701f1cd335baafaae3e087696  xsa226-4.6.patch \
0186f78e99f5f6eec913da8355e0c28946a14a6099a7219bd4e0d385fdf8c306  \
xsa226-4.6/0001-gnttab-dont-use-possibly-unbounded-tail-calls.patch \
e34dbba7b94942faeb3e6b7630ba06f01998e2b56be1035d76e67aa47e77457d  \
xsa226-4.6/0002-gnttab-fix-transitive-grant-handling.patch \
fffcc0a4428723e6aea391ff4f1d27326b5a3763d2308cbde64e6a786502c702  xsa226-4.7.patch \
3878c27b77ba24012599289e0e0fb1e5198b1e4efe2f87f7c46def5f335f2fd5  \
xsa226-4.9/0001-gnttab-dont-use-possibly-unbounded-tail-calls.patch \
01d773c5bb4cafe54daf0d14e8a3af899a7c5863513d18927c4a570a74afdb15  \
xsa226-4.9/0002-gnttab-fix-transitive-grant-handling.patch $

(The .meta file is a prototype machine-readable file for describing
which patches are to be applied how.)

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.


(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCAAGBQJZpVgpAAoJEIP+FMlX6CvZ228H/jXq5lHGZwtGmbgFY1O6/LBk
wrExcAq5iSXVHmfXCR1budkAEYxqCptAbO6FNljvfZVu1bMnGq/ONJs6+UUMCcLb
TCLoqqAvSN06dftIcKSCDOW6GpmRs+lEdZYHO6qkEh1hTHY83OjqqQW2jhOGf4iV
IS1kytbERXzjzApeTECcUJ4Fxd2sGD8PUMiD4XFagtJu3mjSl5Y1M57z21WBzSuK
dHwUzt9sKAd/FOHvpT27GxWw69XR2dI0vKrVtY+Wgudmi4cVt4qnLPirhxkulRVL
yVWZeC3dBgjwR1kE2NNuuBXUTHfmyV/kj8s9Jd0z4Z3aGyX/24uZfL1eJq02Sa8=
=oTGH
-----END PGP SIGNATURE-----


["xsa226.patch" (application/octet-stream)]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: grant_table: Default to v1, and disallow transitive grants

The reference counting and locking discipline for transitive grants is broken.
Their use is therefore declared out of security support.

This is XSA-226.

Transitive grants are expected to be unconditionally available with grant
table v2.  Hiding transitive grants alone is an ABI breakage for the guest.
Modern versions of Linux and the Windows PV drivers use grant table v1, but
older versions did use v2.

In principle, disabling gnttab v2 entirely is the safer way to cause guests to
avoid using transitive grants. However, some older guests which defaulted to
using gnttab v2 don't tolerate falling back from v2 to v1 over migrate.

This patch introduces a new command line option to control grant table
behaviour.  One suboption allows a choice of the maximum grant table version
Xen will allow the guest to use, and defaults to v2.  A different suboption
independently controls whether transitive grants can be used.

The default case is:

    gnttab=max_ver:2

To disable gnttab v2 entirely, use:

    gnttab=max_ver:1

To allow gnttab v2 and transitive grants, use:

    gnttab=max_ver:2,transitive

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 4002eab..af079b4 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -868,6 +868,22 @@ Controls EPT related features.
 
 Specify which console gdbstub should use. See **console**.
 
+### gnttab
+> `= List of [ max_ver:<integer>, transitive ]`
+
+> Default: `gnttab=max_ver:2,no-transitive`
+
+Control various aspects of the grant table behaviour available to guests.
+
+* `max_ver` Select the maximum grant table version to offer to guests.  Valid
+version are 1 and 2.
+* `transitive` Permit or disallow the use of transitive grants.  Note that the
+use of grant table v2 without transitive grants is an ABI breakage from the
+guests point of view.
+
+*Warning:*
+Due to XSA-226, the use of transitive grants is outside of security support.
+
 ### gnttab\_max\_frames
 > `= <integer>`
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index ae34547..87131f8 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -50,6 +50,42 @@ integer_param("gnttab_max_nr_frames", max_nr_grant_frames);
 unsigned int __read_mostly max_grant_frames;
 integer_param("gnttab_max_frames", max_grant_frames);
 
+static unsigned int __read_mostly opt_gnttab_max_version = 2;
+static bool __read_mostly opt_transitive_grants;
+
+static void __init parse_gnttab(char *s)
+{
+    char *ss;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strncmp(s, "max_ver:", 8) )
+        {
+            long ver = simple_strtol(s + 8, NULL, 10);
+
+            if ( ver >= 1 && ver <= 2 )
+                opt_gnttab_max_version = ver;
+        }
+        else
+        {
+            bool val = !!strncmp(s, "no-", 3);
+
+            if ( !val )
+                s += 3;
+
+            if ( !strcmp(s, "transitive") )
+                opt_transitive_grants = val;
+        }
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+custom_param("gnttab", parse_gnttab);
+
 /* The maximum number of grant mappings is defined as a multiplier of the
  * maximum number of grant table entries. This defines the multiplier used.
  * Pretty arbitrary. [POLICY]
@@ -2191,6 +2227,10 @@ __acquire_grant_for_copy(
         }
         else if ( (shah->flags & GTF_type_mask) == GTF_transitive )
         {
+            if ( !opt_transitive_grants )
+                PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                         "transitive grant disallowed by policy\n");
+
             if ( !allow_transitive )
                 PIN_FAIL(unlock_out_clear, GNTST_general_error,
                          "transitive grant when transitivity not allowed\n");
@@ -3159,7 +3199,10 @@ do_grant_table_op(
     }
     case GNTTABOP_set_version:
     {
-        rc = gnttab_set_version(guest_handle_cast(uop, gnttab_set_version_t));
+        if ( opt_gnttab_max_version == 1 )
+            rc = -ENOSYS; /* Behave as before set_version was introduced. */
+        else
+            rc = gnttab_set_version(guest_handle_cast(uop, gnttab_set_version_t));
         break;
     }
     case GNTTABOP_get_status_frames:

["xsa226-unstable/0001-gnttab-dont-use-possibly-unbounded-tail-calls.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: gnttab: don't use possibly unbounded tail calls

There is no guarantee that the compiler would actually translate them
to branches instead of calls, so only ones with a known recursion limit
are okay:
- __release_grant_for_copy() can call itself only once, as
  __acquire_grant_for_copy() won't permit use of multi-level transitive
  grants,
- __acquire_grant_for_copy() is fine to call itself with the last
  argument false, as that prevents further recursion,
- __acquire_grant_for_copy() must not call itself to recover from an
  observed change to the active entry's pin count

This is part of CVE-2017-12135 / XSA-226.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/compat/grant_table.c
+++ b/xen/common/compat/grant_table.c
@@ -258,9 +258,9 @@ int compat_grant_table_op(unsigned int cmd,
                 rc = gnttab_copy(guest_handle_cast(nat.uop, gnttab_copy_t), n);
             if ( rc > 0 )
             {
-                ASSERT(rc < n);
-                i -= n - rc;
-                n = rc;
+                ASSERT(rc <= n);
+                i -= rc;
+                n -= rc;
             }
             if ( rc >= 0 )
             {
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -2109,8 +2109,10 @@ __release_grant_for_copy(
 
     if ( td != rd )
     {
-        /* Recursive calls, but they're tail calls, so it's
-           okay. */
+        /*
+         * Recursive calls, but they're bounded (acquire permits only a single
+         * level of transitivity), so it's okay.
+         */
         if ( released_write )
             __release_grant_for_copy(td, trans_gref, 0);
         else if ( released_read )
@@ -2262,10 +2264,11 @@ __acquire_grant_for_copy(
                 return rc;
             }
 
-            /* We dropped the lock, so we have to check that nobody
-               else tried to pin (or, for that matter, unpin) the
-               reference in *this* domain.  If they did, just give up
-               and try again. */
+            /*
+             * We dropped the lock, so we have to check that nobody else tried
+             * to pin (or, for that matter, unpin) the reference in *this*
+             * domain.  If they did, just give up and tell the caller to retry.
+             */
             if ( act->pin != old_pin )
             {
                 __fixup_status_for_copy_pin(act, status);
@@ -2273,9 +2276,8 @@ __acquire_grant_for_copy(
                 active_entry_release(act);
                 grant_read_unlock(rgt);
                 put_page(*page);
-                return __acquire_grant_for_copy(rd, gref, ldom, readonly,
-                                                frame, page, page_off, length,
-                                                allow_transitive);
+                *page = NULL;
+                return ERESTART;
             }
 
             /* The actual remote remote grant may or may not be a
@@ -2574,7 +2576,7 @@ static int gnttab_copy_one(const struct
     {
         gnttab_copy_release_buf(src);
         rc = gnttab_copy_claim_buf(op, &op->source, src, GNTCOPY_source_gref);
-        if ( rc < 0 )
+        if ( rc )
             goto out;
     }
 
@@ -2584,7 +2586,7 @@ static int gnttab_copy_one(const struct
     {
         gnttab_copy_release_buf(dest);
         rc = gnttab_copy_claim_buf(op, &op->dest, dest, GNTCOPY_dest_gref);
-        if ( rc < 0 )
+        if ( rc )
             goto out;
     }
 
@@ -2593,6 +2595,14 @@ static int gnttab_copy_one(const struct
     return rc;
 }
 
+/*
+ * gnttab_copy(), other than the various other helpers of
+ * do_grant_table_op(), returns (besides possible error indicators)
+ * "count - i" rather than "i" to ensure that even if no progress
+ * was made at all (perhaps due to gnttab_copy_one() returning a
+ * positive value) a non-zero value is being handed back (zero needs
+ * to be avoided, as that means "success, all done").
+ */
 static long gnttab_copy(
     XEN_GUEST_HANDLE_PARAM(gnttab_copy_t) uop, unsigned int count)
 {
@@ -2606,7 +2616,7 @@ static long gnttab_copy(
     {
         if ( i && hypercall_preempt_check() )
         {
-            rc = i;
+            rc = count - i;
             break;
         }
 
@@ -2616,13 +2626,20 @@ static long gnttab_copy(
             break;
         }
 
-        op.status = gnttab_copy_one(&op, &dest, &src);
-        if ( op.status != GNTST_okay )
+        rc = gnttab_copy_one(&op, &dest, &src);
+        if ( rc > 0 )
+        {
+            rc = count - i;
+            break;
+        }
+        if ( rc != GNTST_okay )
         {
             gnttab_copy_release_buf(&src);
             gnttab_copy_release_buf(&dest);
         }
 
+        op.status = rc;
+        rc = 0;
         if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
         {
             rc = -EFAULT;
@@ -3162,6 +3179,7 @@ do_grant_table_op(
         rc = gnttab_copy(copy, count);
         if ( rc > 0 )
         {
+            rc = count - rc;
             guest_handle_add_offset(copy, rc);
             uop = guest_handle_cast(copy, void);
         }

["xsa226-unstable/0002-gnttab-fix-transitive-grant-handling.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: gnttab: fix transitive grant handling

Processing of transitive grants must not use the fast path, or else
reference counting breaks due to the skipped recursive call to
__acquire_grant_for_copy() (its __release_grant_for_copy()
counterpart occurs independent of original pin count). Furthermore
after re-acquiring temporarily dropped locks we need to verify no grant
properties changed if the original pin count was non-zero; checking
just the pin counts is sufficient only for well-behaved guests. As a
result, __release_grant_for_copy() needs to mirror that new behavior.

Furthermore a __release_grant_for_copy() invocation was missing on the
retry path of __acquire_grant_for_copy(), and gnttab_set_version() also
needs to bail out upon encountering a transitive grant.

This is part of CVE-2017-12135 / XSA-226.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -2056,13 +2056,8 @@ __release_grant_for_copy(
     unsigned long r_frame;
     uint16_t *status;
     grant_ref_t trans_gref;
-    int released_read;
-    int released_write;
     struct domain *td;
 
-    released_read = 0;
-    released_write = 0;
-
     grant_read_lock(rgt);
 
     act = active_entry_acquire(rgt, gref);
@@ -2092,17 +2087,11 @@ __release_grant_for_copy(
 
         act->pin -= GNTPIN_hstw_inc;
         if ( !(act->pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask)) )
-        {
-            released_write = 1;
             gnttab_clear_flag(_GTF_writing, status);
-        }
     }
 
     if ( !act->pin )
-    {
         gnttab_clear_flag(_GTF_reading, status);
-        released_read = 1;
-    }
 
     active_entry_release(act);
     grant_read_unlock(rgt);
@@ -2110,13 +2099,10 @@ __release_grant_for_copy(
     if ( td != rd )
     {
         /*
-         * Recursive calls, but they're bounded (acquire permits only a single
+         * Recursive call, but it is bounded (acquire permits only a single
          * level of transitivity), so it's okay.
          */
-        if ( released_write )
-            __release_grant_for_copy(td, trans_gref, 0);
-        else if ( released_read )
-            __release_grant_for_copy(td, trans_gref, 1);
+        __release_grant_for_copy(td, trans_gref, readonly);
 
         rcu_unlock_domain(td);
     }
@@ -2190,8 +2176,108 @@ __acquire_grant_for_copy(
                  act->domid, ldom, act->pin);
 
     old_pin = act->pin;
-    if ( !act->pin ||
-         (!readonly && !(act->pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask))) )
+    if ( sha2 && (shah->flags & GTF_type_mask) == GTF_transitive )
+    {
+        if ( (!old_pin || (!readonly &&
+                           !(old_pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask)))) &&
+             (rc = _set_status_v2(ldom, readonly, 0, shah, act,
+                                  status)) != GNTST_okay )
+            goto unlock_out;
+
+        if ( !allow_transitive )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grant when transitivity not allowed\n");
+
+        trans_domid = sha2->transitive.trans_domid;
+        trans_gref = sha2->transitive.gref;
+        barrier(); /* Stop the compiler from re-loading
+                      trans_domid from shared memory */
+        if ( trans_domid == rd->domain_id )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grants cannot be self-referential\n");
+
+        /*
+         * We allow the trans_domid == ldom case, which corresponds to a
+         * grant being issued by one domain, sent to another one, and then
+         * transitively granted back to the original domain.  Allowing it
+         * is easy, and means that you don't need to go out of your way to
+         * avoid it in the guest.
+         */
+
+        /* We need to leave the rrd locked during the grant copy. */
+        td = rcu_lock_domain_by_id(trans_domid);
+        if ( td == NULL )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grant referenced bad domain %d\n",
+                     trans_domid);
+
+        /*
+         * __acquire_grant_for_copy() could take the lock on the
+         * remote table (if rd == td), so we have to drop the lock
+         * here and reacquire.
+         */
+        active_entry_release(act);
+        grant_read_unlock(rgt);
+
+        rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
+                                      readonly, &grant_frame, page,
+                                      &trans_page_off, &trans_length, 0);
+
+        grant_read_lock(rgt);
+        act = active_entry_acquire(rgt, gref);
+
+        if ( rc != GNTST_okay )
+        {
+            __fixup_status_for_copy_pin(act, status);
+            rcu_unlock_domain(td);
+            active_entry_release(act);
+            grant_read_unlock(rgt);
+            return rc;
+        }
+
+        /*
+         * We dropped the lock, so we have to check that the grant didn't
+         * change, and that nobody else tried to pin/unpin it. If anything
+         * changed, just give up and tell the caller to retry.
+         */
+        if ( rgt->gt_version != 2 ||
+             act->pin != old_pin ||
+             (old_pin && (act->domid != ldom || act->frame != grant_frame ||
+                          act->start != trans_page_off ||
+                          act->length != trans_length ||
+                          act->trans_domain != td ||
+                          act->trans_gref != trans_gref ||
+                          !act->is_sub_page)) )
+        {
+            __release_grant_for_copy(td, trans_gref, readonly);
+            __fixup_status_for_copy_pin(act, status);
+            rcu_unlock_domain(td);
+            active_entry_release(act);
+            grant_read_unlock(rgt);
+            put_page(*page);
+            *page = NULL;
+            return ERESTART;
+        }
+
+        if ( !old_pin )
+        {
+            act->domid = ldom;
+            act->start = trans_page_off;
+            act->length = trans_length;
+            act->trans_domain = td;
+            act->trans_gref = trans_gref;
+            act->frame = grant_frame;
+            act->gfn = -1ul;
+            /*
+             * The actual remote remote grant may or may not be a sub-page,
+             * but we always treat it as one because that blocks mappings of
+             * transitive grants.
+             */
+            act->is_sub_page = 1;
+        }
+    }
+    else if ( !old_pin ||
+              (!readonly && !(old_pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask))) )
     {
         if ( (rc = _set_status(rgt->gt_version, ldom,
                                readonly, 0, shah, act,
@@ -2212,80 +2298,6 @@ __acquire_grant_for_copy(
             trans_page_off = 0;
             trans_length = PAGE_SIZE;
         }
-        else if ( (shah->flags & GTF_type_mask) == GTF_transitive )
-        {
-            if ( !allow_transitive )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grant when transitivity not allowed\n");
-
-            trans_domid = sha2->transitive.trans_domid;
-            trans_gref = sha2->transitive.gref;
-            barrier(); /* Stop the compiler from re-loading
-                          trans_domid from shared memory */
-            if ( trans_domid == rd->domain_id )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grants cannot be self-referential\n");
-
-            /* We allow the trans_domid == ldom case, which
-               corresponds to a grant being issued by one domain, sent
-               to another one, and then transitively granted back to
-               the original domain.  Allowing it is easy, and means
-               that you don't need to go out of your way to avoid it
-               in the guest. */
-
-            /* We need to leave the rrd locked during the grant copy */
-            td = rcu_lock_domain_by_id(trans_domid);
-            if ( td == NULL )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grant referenced bad domain %d\n",
-                         trans_domid);
-
-            /*
-             * __acquire_grant_for_copy() could take the lock on the
-             * remote table (if rd == td), so we have to drop the lock
-             * here and reacquire
-             */
-            active_entry_release(act);
-            grant_read_unlock(rgt);
-
-            rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
-                                          readonly, &grant_frame, page,
-                                          &trans_page_off, &trans_length, 0);
-
-            grant_read_lock(rgt);
-            act = active_entry_acquire(rgt, gref);
-
-            if ( rc != GNTST_okay )
-            {
-                __fixup_status_for_copy_pin(act, status);
-                rcu_unlock_domain(td);
-                active_entry_release(act);
-                grant_read_unlock(rgt);
-                return rc;
-            }
-
-            /*
-             * We dropped the lock, so we have to check that nobody else tried
-             * to pin (or, for that matter, unpin) the reference in *this*
-             * domain.  If they did, just give up and tell the caller to retry.
-             */
-            if ( act->pin != old_pin )
-            {
-                __fixup_status_for_copy_pin(act, status);
-                rcu_unlock_domain(td);
-                active_entry_release(act);
-                grant_read_unlock(rgt);
-                put_page(*page);
-                *page = NULL;
-                return ERESTART;
-            }
-
-            /* The actual remote remote grant may or may not be a
-               sub-page, but we always treat it as one because that
-               blocks mappings of transitive grants. */
-            is_sub_page = 1;
-            act->gfn = -1ul;
-        }
         else if ( !(sha2->hdr.flags & GTF_sub_page) )
         {
             rc = __get_paged_frame(sha2->full_page.frame, &grant_frame, page, readonly, rd);
@@ -2710,10 +2722,13 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA
     case 2:
         for ( i = 0; i < GNTTAB_NR_RESERVED_ENTRIES; i++ )
         {
-            if ( ((shared_entry_v2(gt, i).hdr.flags & GTF_type_mask) ==
-                  GTF_permit_access) &&
-                 (shared_entry_v2(gt, i).full_page.frame >> 32) )
+            switch ( shared_entry_v2(gt, i).hdr.flags & GTF_type_mask )
             {
+            case GTF_permit_access:
+                 if ( !(shared_entry_v2(gt, i).full_page.frame >> 32) )
+                     break;
+                 /* fall through */
+            case GTF_transitive:
                 gdprintk(XENLOG_WARNING,
                          "tried to change grant table version to 1 with non-representable entries\n");
                 res = -ERANGE;

["xsa226-4.5.patch" (application/octet-stream)]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: grant_table: Default to v1, and disallow transitive grants

The reference counting and locking discipline for transitive grants is broken.
Their use is therefore declared out of security support.

This is XSA-226.

Transitive grants are expected to be unconditionally available with grant
table v2.  Hiding transitive grants alone is an ABI breakage for the guest.
Modern versions of Linux and the Windows PV drivers use grant table v1, but
older versions did use v2.

In principle, disabling gnttab v2 entirely is the safer way to cause guests to
avoid using transitive grants. However, some older guests which defaulted to
using gnttab v2 don't tolerate falling back from v2 to v1 over migrate.

This patch introduces a new command line option to control grant table
behaviour.  One suboption allows a choice of the maximum grant table version
Xen will allow the guest to use, and defaults to v2.  A different suboption
independently controls whether transitive grants can be used.

The default case is:

    gnttab=max_ver:2

To disable gnttab v2 entirely, use:

    gnttab=max_ver:1

To allow gnttab v2 and transitive grants, use:

    gnttab=max_ver:2,transitive

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 16bfb39..3936316 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -662,6 +662,22 @@ does not provide VM\_ENTRY\_LOAD\_GUEST\_PAT.
 
 Specify the serial parameters for the GDB stub.
 
+### gnttab
+> `= List of [ max_ver:<integer>, transitive ]`
+
+> Default: `gnttab=max_ver:2,no-transitive`
+
+Control various aspects of the grant table behaviour available to guests.
+
+* `max_ver` Select the maximum grant table version to offer to guests.  Valid
+version are 1 and 2.
+* `transitive` Permit or disallow the use of transitive grants.  Note that the
+use of grant table v2 without transitive grants is an ABI breakage from the
+guests point of view.
+
+*Warning:*
+Due to XSA-226, the use of transitive grants is outside of security support.
+
 ### gnttab\_max\_frames
 > `= <integer>`
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 83a4b9e..c9a6cd9 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -50,6 +50,42 @@ integer_param("gnttab_max_nr_frames", max_nr_grant_frames);
 unsigned int __read_mostly max_grant_frames;
 integer_param("gnttab_max_frames", max_grant_frames);
 
+static unsigned int __read_mostly opt_gnttab_max_version = 2;
+static bool_t __read_mostly opt_transitive_grants;
+
+static void __init parse_gnttab(char *s)
+{
+    char *ss;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strncmp(s, "max_ver:", 8) )
+        {
+            long ver = simple_strtol(s + 8, NULL, 10);
+
+            if ( ver >= 1 && ver <= 2 )
+                opt_gnttab_max_version = ver;
+        }
+        else
+        {
+            bool_t val = !!strncmp(s, "no-", 3);
+
+            if ( !val )
+                s += 3;
+
+            if ( !strcmp(s, "transitive") )
+                opt_transitive_grants = val;
+        }
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+custom_param("gnttab", parse_gnttab);
+
 /* The maximum number of grant mappings is defined as a multiplier of the
  * maximum number of grant table entries. This defines the multiplier used.
  * Pretty arbitrary. [POLICY]
@@ -1958,6 +1994,10 @@ __acquire_grant_for_copy(
         trans_gref = gref;
         if ( sha2 && (shah->flags & GTF_type_mask) == GTF_transitive )
         {
+            if ( !opt_transitive_grants )
+                PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                         "transitive grant disallowed by policy\n");
+
             if ( !allow_transitive )
                 PIN_FAIL(unlock_out_clear, GNTST_general_error,
                          "transitive grant when transitivity not allowed\n");
@@ -2741,7 +2781,10 @@ do_grant_table_op(
     }
     case GNTTABOP_set_version:
     {
-        rc = gnttab_set_version(guest_handle_cast(uop, gnttab_set_version_t));
+        if ( opt_gnttab_max_version == 1 )
+            rc = -ENOSYS; /* Behave as before set_version was introduced. */
+        else
+            rc = gnttab_set_version(guest_handle_cast(uop, gnttab_set_version_t));
         break;
     }
     case GNTTABOP_get_status_frames:

["xsa226-4.5/0001-gnttab-dont-use-possibly-unbounded-tail-calls.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: gnttab: don't use possibly unbounded tail calls

There is no guarantee that the compiler would actually translate them
to branches instead of calls, so only ones with a known recursion limit
are okay:
- __release_grant_for_copy() can call itself only once, as
  __acquire_grant_for_copy() won't permit use of multi-level transitive
  grants,
- __acquire_grant_for_copy() is fine to call itself with the last
  argument false, as that prevents further recursion,
- __acquire_grant_for_copy() must not call itself to recover from an
  observed change to the active entry's pin count

This is part of CVE-2017-12135 / XSA-226.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/compat/grant_table.c
+++ b/xen/common/compat/grant_table.c
@@ -258,9 +258,9 @@ int compat_grant_table_op(unsigned int cmd,
                 rc = gnttab_copy(guest_handle_cast(nat.uop, gnttab_copy_t), n);
             if ( rc > 0 )
             {
-                ASSERT(rc < n);
-                i -= n - rc;
-                n = rc;
+                ASSERT(rc <= n);
+                i -= rc;
+                n -= rc;
             }
             if ( rc >= 0 )
             {
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1860,8 +1860,10 @@ __release_grant_for_copy(
 
     if ( td != rd )
     {
-        /* Recursive calls, but they're tail calls, so it's
-           okay. */
+        /*
+         * Recursive calls, but they're bounded (acquire permits only a single
+         * level of transitivity), so it's okay.
+         */
         if ( released_write )
             __release_grant_for_copy(td, trans_gref, 0);
         else if ( released_read )
@@ -1997,19 +1999,19 @@ __acquire_grant_for_copy(
                 return rc;
             }
 
-            /* We dropped the lock, so we have to check that nobody
-               else tried to pin (or, for that matter, unpin) the
-               reference in *this* domain.  If they did, just give up
-               and try again. */
+            /*
+             * We dropped the lock, so we have to check that nobody else tried
+             * to pin (or, for that matter, unpin) the reference in *this*
+             * domain.  If they did, just give up and tell the caller to retry.
+             */
             if ( act->pin != old_pin )
             {
                 __fixup_status_for_copy_pin(act, status);
                 rcu_unlock_domain(td);
                 spin_unlock(&rgt->lock);
                 put_page(*page);
-                return __acquire_grant_for_copy(rd, gref, ldom, readonly,
-                                                frame, page, page_off, length,
-                                                allow_transitive);
+                *page = NULL;
+                return ERESTART;
             }
 
             /* The actual remote remote grant may or may not be a
@@ -2089,7 +2091,7 @@ __acquire_grant_for_copy(
     return rc;
 }
 
-static void
+static bool_t
 __gnttab_copy(
     struct gnttab_copy *op)
 {
@@ -2213,9 +2215,20 @@ __gnttab_copy(
         rcu_unlock_domain(sd);
     if ( dd )
         rcu_unlock_domain(dd);
+    if ( rc > 0 )
+        return 0;
     op->status = rc;
+    return 1;
 }
 
+/*
+ * gnttab_copy(), other than the various other helpers of
+ * do_grant_table_op(), returns (besides possible error indicators)
+ * "count - i" rather than "i" to ensure that even if no progress
+ * was made at all (perhaps due to gnttab_copy_one() returning a
+ * positive value) a non-zero value is being handed back (zero needs
+ * to be avoided, as that means "success, all done").
+ */
 static long
 gnttab_copy(
     XEN_GUEST_HANDLE_PARAM(gnttab_copy_t) uop, unsigned int count)
@@ -2226,10 +2239,11 @@ gnttab_copy(
     for ( i = 0; i < count; i++ )
     {
         if (i && hypercall_preempt_check())
-            return i;
+            return count - i;
         if ( unlikely(__copy_from_guest(&op, uop, 1)) )
             return -EFAULT;
-        __gnttab_copy(&op);
+        if ( !__gnttab_copy(&op) )
+            return count - i;
         if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
             return -EFAULT;
         guest_handle_add_offset(uop, 1);
@@ -2727,6 +2741,7 @@ do_grant_table_op(
         rc = gnttab_copy(copy, count);
         if ( rc > 0 )
         {
+            rc = count - rc;
             guest_handle_add_offset(copy, rc);
             uop = guest_handle_cast(copy, void);
         }

["xsa226-4.5/0002-gnttab-fix-transitive-grant-handling.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: gnttab: fix transitive grant handling

Processing of transitive grants must not use the fast path, or else
reference counting breaks due to the skipped recursive call to
__acquire_grant_for_copy() (its __release_grant_for_copy()
counterpart occurs independent of original pin count). Furthermore
after re-acquiring temporarily dropped locks we need to verify no grant
properties changed if the original pin count was non-zero; checking
just the pin counts is sufficient only for well-behaved guests. As a
result, __release_grant_for_copy() needs to mirror that new behavior.

Furthermore a __release_grant_for_copy() invocation was missing on the
retry path of __acquire_grant_for_copy(), and gnttab_set_version() also
needs to bail out upon encountering a transitive grant.

This is part of CVE-2017-12135 / XSA-226.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1808,13 +1808,8 @@ __release_grant_for_copy(
     unsigned long r_frame;
     uint16_t *status;
     grant_ref_t trans_gref;
-    int released_read;
-    int released_write;
     struct domain *td;
 
-    released_read = 0;
-    released_write = 0;
-
     spin_lock(&rgt->lock);
 
     act = &active_entry(rgt, gref);
@@ -1844,30 +1839,21 @@ __release_grant_for_copy(
 
         act->pin -= GNTPIN_hstw_inc;
         if ( !(act->pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask)) )
-        {
-            released_write = 1;
             gnttab_clear_flag(_GTF_writing, status);
-        }
     }
 
     if ( !act->pin )
-    {
         gnttab_clear_flag(_GTF_reading, status);
-        released_read = 1;
-    }
 
     spin_unlock(&rgt->lock);
 
     if ( td != rd )
     {
         /*
-         * Recursive calls, but they're bounded (acquire permits only a single
+         * Recursive call, but it is bounded (acquire permits only a single
          * level of transitivity), so it's okay.
          */
-        if ( released_write )
-            __release_grant_for_copy(td, trans_gref, 0);
-        else if ( released_read )
-            __release_grant_for_copy(td, trans_gref, 1);
+        __release_grant_for_copy(td, trans_gref, readonly);
 
         rcu_unlock_domain(td);
     }
@@ -1948,79 +1934,113 @@ __acquire_grant_for_copy(
                  act->domid, ldom, act->pin);
 
     old_pin = act->pin;
-    if ( !act->pin ||
-         (!readonly && !(act->pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask))) )
+    if ( sha2 && (shah->flags & GTF_type_mask) == GTF_transitive )
     {
-        if ( (rc = _set_status(rgt->gt_version, ldom,
-                               readonly, 0, shah, act,
-                               status) ) != GNTST_okay )
-             goto unlock_out;
+        if ( (!old_pin || (!readonly &&
+                           !(old_pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask)))) &&
+             (rc = _set_status_v2(ldom, readonly, 0, shah, act,
+                                  status)) != GNTST_okay )
+            goto unlock_out;
+
+        if ( !allow_transitive )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grant when transitivity not allowed\n");
+
+        trans_domid = sha2->transitive.trans_domid;
+        trans_gref = sha2->transitive.gref;
+        barrier(); /* Stop the compiler from re-loading
+                      trans_domid from shared memory */
+        if ( trans_domid == rd->domain_id )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grants cannot be self-referential\n");
 
-        td = rd;
-        trans_gref = gref;
-        if ( sha2 && (shah->flags & GTF_type_mask) == GTF_transitive )
+        /*
+         * We allow the trans_domid == ldom case, which corresponds to a
+         * grant being issued by one domain, sent to another one, and then
+         * transitively granted back to the original domain.  Allowing it
+         * is easy, and means that you don't need to go out of your way to
+         * avoid it in the guest.
+         */
+
+        /* We need to leave the rrd locked during the grant copy. */
+        td = rcu_lock_domain_by_id(trans_domid);
+        if ( td == NULL )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grant referenced bad domain %d\n",
+                     trans_domid);
+
+        /*
+         * __acquire_grant_for_copy() could take the lock on the
+         * remote table (if rd == td), so we have to drop the lock
+         * here and reacquire.
+         */
+        spin_unlock(&rgt->lock);
+
+        rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
+                                      readonly, &grant_frame, page,
+                                      &trans_page_off, &trans_length, 0);
+
+        spin_lock(&rgt->lock);
+
+        if ( rc != GNTST_okay )
         {
-            if ( !allow_transitive )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grant when transitivity not allowed\n");
-
-            trans_domid = sha2->transitive.trans_domid;
-            trans_gref = sha2->transitive.gref;
-            barrier(); /* Stop the compiler from re-loading
-                          trans_domid from shared memory */
-            if ( trans_domid == rd->domain_id )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grants cannot be self-referential\n");
-
-            /* We allow the trans_domid == ldom case, which
-               corresponds to a grant being issued by one domain, sent
-               to another one, and then transitively granted back to
-               the original domain.  Allowing it is easy, and means
-               that you don't need to go out of your way to avoid it
-               in the guest. */
-
-            /* We need to leave the rrd locked during the grant copy */
-            td = rcu_lock_domain_by_id(trans_domid);
-            if ( td == NULL )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grant referenced bad domain %d\n",
-                         trans_domid);
+            __fixup_status_for_copy_pin(act, status);
+            rcu_unlock_domain(td);
             spin_unlock(&rgt->lock);
+            return rc;
+        }
 
-            rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
-                                          readonly, &grant_frame, page,
-                                          &trans_page_off, &trans_length, 0);
-
-            spin_lock(&rgt->lock);
-            if ( rc != GNTST_okay ) {
-                __fixup_status_for_copy_pin(act, status);
-                rcu_unlock_domain(td);
-                spin_unlock(&rgt->lock);
-                return rc;
-            }
+        /*
+         * We dropped the lock, so we have to check that the grant didn't
+         * change, and that nobody else tried to pin/unpin it. If anything
+         * changed, just give up and tell the caller to retry.
+         */
+        if ( rgt->gt_version != 2 ||
+             act->pin != old_pin ||
+             (old_pin && (act->domid != ldom || act->frame != grant_frame ||
+                          act->start != trans_page_off ||
+                          act->length != trans_length ||
+                          act->trans_domain != td ||
+                          act->trans_gref != trans_gref ||
+                          !act->is_sub_page)) )
+        {
+            __release_grant_for_copy(td, trans_gref, readonly);
+            __fixup_status_for_copy_pin(act, status);
+            rcu_unlock_domain(td);
+            spin_unlock(&rgt->lock);
+            put_page(*page);
+            *page = NULL;
+            return ERESTART;
+        }
 
+        if ( !old_pin )
+        {
+            act->domid = ldom;
+            act->start = trans_page_off;
+            act->length = trans_length;
+            act->trans_domain = td;
+            act->trans_gref = trans_gref;
+            act->frame = grant_frame;
+            act->gfn = -1ul;
             /*
-             * We dropped the lock, so we have to check that nobody else tried
-             * to pin (or, for that matter, unpin) the reference in *this*
-             * domain.  If they did, just give up and tell the caller to retry.
+             * The actual remote remote grant may or may not be a sub-page,
+             * but we always treat it as one because that blocks mappings of
+             * transitive grants.
              */
-            if ( act->pin != old_pin )
-            {
-                __fixup_status_for_copy_pin(act, status);
-                rcu_unlock_domain(td);
-                spin_unlock(&rgt->lock);
-                put_page(*page);
-                *page = NULL;
-                return ERESTART;
-            }
-
-            /* The actual remote remote grant may or may not be a
-               sub-page, but we always treat it as one because that
-               blocks mappings of transitive grants. */
-            is_sub_page = 1;
-            act->gfn = -1ul;
+            act->is_sub_page = 1;
         }
-        else if ( sha1 )
+    }
+    else if ( !old_pin ||
+              (!readonly && !(old_pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask))) )
+    {
+        if ( (rc = _set_status(rgt->gt_version, ldom,
+                               readonly, 0, shah, act,
+                               status) ) != GNTST_okay )
+             goto unlock_out;
+
+        td = rd;
+        trans_gref = gref;
+        if ( sha1 )
         {
             rc = __get_paged_frame(sha1->frame, &grant_frame, page, readonly, rd);
             if ( rc != GNTST_okay )
@@ -2295,14 +2315,35 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA
         }
     }
 
-    /* XXX: If we're going to version 2, we could maybe shrink the
-       active grant table here. */
-
-    if ( op.version == 2 && gt->gt_version < 2 )
+    switch ( gt->gt_version )
     {
-        res = gnttab_populate_status_frames(d, gt, nr_grant_frames(gt));
-        if ( res < 0)
-            goto out_unlock;
+    case 0:
+        if ( op.version == 2 )
+        {
+    case 1:
+            /* XXX: We could maybe shrink the active grant table here. */
+            res = gnttab_populate_status_frames(d, gt, nr_grant_frames(gt));
+            if ( res < 0)
+                goto out_unlock;
+        }
+        break;
+    case 2:
+        for ( i = 0; i < GNTTAB_NR_RESERVED_ENTRIES; i++ )
+        {
+            switch ( shared_entry_v2(gt, i).hdr.flags & GTF_type_mask )
+            {
+            case GTF_permit_access:
+                 if ( !(shared_entry_v2(gt, i).full_page.frame >> 32) )
+                     break;
+                 /* fall through */
+            case GTF_transitive:
+                gdprintk(XENLOG_WARNING,
+                         "tried to change grant table version to 1 with non-representable entries\n");
+                res = -ERANGE;
+                goto out_unlock;
+            }
+        }
+        break;
     }
 
     /* Preserve the first 8 entries (toolstack reserved grants) */

["xsa226-4.6.patch" (application/octet-stream)]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: grant_table: Default to v1, and disallow transitive grants

The reference counting and locking discipline for transitive grants is broken.
Their use is therefore declared out of security support.

This is XSA-226.

Transitive grants are expected to be unconditionally available with grant
table v2.  Hiding transitive grants alone is an ABI breakage for the guest.
Modern versions of Linux and the Windows PV drivers use grant table v1, but
older versions did use v2.

In principle, disabling gnttab v2 entirely is the safer way to cause guests to
avoid using transitive grants. However, some older guests which defaulted to
using gnttab v2 don't tolerate falling back from v2 to v1 over migrate.

This patch introduces a new command line option to control grant table
behaviour.  One suboption allows a choice of the maximum grant table version
Xen will allow the guest to use, and defaults to v2.  A different suboption
independently controls whether transitive grants can be used.

The default case is:

    gnttab=max_ver:2

To disable gnttab v2 entirely, use:

    gnttab=max_ver:1

To allow gnttab v2 and transitive grants, use:

    gnttab=max_ver:2,transitive

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index d99a20a..113bb29 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -733,6 +733,22 @@ Controls EPT related features.
 
 Specify the serial parameters for the GDB stub.
 
+### gnttab
+> `= List of [ max_ver:<integer>, transitive ]`
+
+> Default: `gnttab=max_ver:2,no-transitive`
+
+Control various aspects of the grant table behaviour available to guests.
+
+* `max_ver` Select the maximum grant table version to offer to guests.  Valid
+version are 1 and 2.
+* `transitive` Permit or disallow the use of transitive grants.  Note that the
+use of grant table v2 without transitive grants is an ABI breakage from the
+guests point of view.
+
+*Warning:*
+Due to XSA-226, the use of transitive grants is outside of security support.
+
 ### gnttab\_max\_frames
 > `= <integer>`
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 20230fb..98845c4 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -50,6 +50,42 @@ integer_param("gnttab_max_nr_frames", max_nr_grant_frames);
 unsigned int __read_mostly max_grant_frames;
 integer_param("gnttab_max_frames", max_grant_frames);
 
+static unsigned int __read_mostly opt_gnttab_max_version = 2;
+static bool_t __read_mostly opt_transitive_grants;
+
+static void __init parse_gnttab(char *s)
+{
+    char *ss;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strncmp(s, "max_ver:", 8) )
+        {
+            long ver = simple_strtol(s + 8, NULL, 10);
+
+            if ( ver >= 1 && ver <= 2 )
+                opt_gnttab_max_version = ver;
+        }
+        else
+        {
+            bool_t val = !!strncmp(s, "no-", 3);
+
+            if ( !val )
+                s += 3;
+
+            if ( !strcmp(s, "transitive") )
+                opt_transitive_grants = val;
+        }
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+custom_param("gnttab", parse_gnttab);
+
 /* The maximum number of grant mappings is defined as a multiplier of the
  * maximum number of grant table entries. This defines the multiplier used.
  * Pretty arbitrary. [POLICY]
@@ -2175,6 +2211,10 @@ __acquire_grant_for_copy(
         }
         else if ( (shah->flags & GTF_type_mask) == GTF_transitive )
         {
+            if ( !opt_transitive_grants )
+                PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                         "transitive grant disallowed by policy\n");
+
             if ( !allow_transitive )
                 PIN_FAIL(unlock_out_clear, GNTST_general_error,
                          "transitive grant when transitivity not allowed\n");
@@ -3143,7 +3183,10 @@ do_grant_table_op(
     }
     case GNTTABOP_set_version:
     {
-        rc = gnttab_set_version(guest_handle_cast(uop, gnttab_set_version_t));
+        if ( opt_gnttab_max_version == 1 )
+            rc = -ENOSYS; /* Behave as before set_version was introduced. */
+        else
+            rc = gnttab_set_version(guest_handle_cast(uop, gnttab_set_version_t));
         break;
     }
     case GNTTABOP_get_status_frames:

["xsa226-4.6/0001-gnttab-dont-use-possibly-unbounded-tail-calls.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: gnttab: don't use possibly unbounded tail calls

There is no guarantee that the compiler would actually translate them
to branches instead of calls, so only ones with a known recursion limit
are okay:
- __release_grant_for_copy() can call itself only once, as
  __acquire_grant_for_copy() won't permit use of multi-level transitive
  grants,
- __acquire_grant_for_copy() is fine to call itself with the last
  argument false, as that prevents further recursion,
- __acquire_grant_for_copy() must not call itself to recover from an
  observed change to the active entry's pin count

This is part of CVE-2017-12135 / XSA-226.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/compat/grant_table.c
+++ b/xen/common/compat/grant_table.c
@@ -258,9 +258,9 @@ int compat_grant_table_op(unsigned int cmd,
                 rc = gnttab_copy(guest_handle_cast(nat.uop, gnttab_copy_t), n);
             if ( rc > 0 )
             {
-                ASSERT(rc < n);
-                i -= n - rc;
-                n = rc;
+                ASSERT(rc <= n);
+                i -= rc;
+                n -= rc;
             }
             if ( rc >= 0 )
             {
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -2089,8 +2089,10 @@ __release_grant_for_copy(
 
     if ( td != rd )
     {
-        /* Recursive calls, but they're tail calls, so it's
-           okay. */
+        /*
+         * Recursive calls, but they're bounded (acquire permits only a single
+         * level of transitivity), so it's okay.
+         */
         if ( released_write )
             __release_grant_for_copy(td, trans_gref, 0);
         else if ( released_read )
@@ -2241,10 +2243,11 @@ __acquire_grant_for_copy(
                 return rc;
             }
 
-            /* We dropped the lock, so we have to check that nobody
-               else tried to pin (or, for that matter, unpin) the
-               reference in *this* domain.  If they did, just give up
-               and try again. */
+            /*
+             * We dropped the lock, so we have to check that nobody else tried
+             * to pin (or, for that matter, unpin) the reference in *this*
+             * domain.  If they did, just give up and tell the caller to retry.
+             */
             if ( act->pin != old_pin )
             {
                 __fixup_status_for_copy_pin(act, status);
@@ -2252,9 +2255,8 @@ __acquire_grant_for_copy(
                 active_entry_release(act);
                 read_unlock(&rgt->lock);
                 put_page(*page);
-                return __acquire_grant_for_copy(rd, gref, ldom, readonly,
-                                                frame, page, page_off, length,
-                                                allow_transitive);
+                *page = NULL;
+                return ERESTART;
             }
 
             /* The actual remote remote grant may or may not be a
@@ -2560,7 +2562,7 @@ static int gnttab_copy_one(const struct
     {
         gnttab_copy_release_buf(src);
         rc = gnttab_copy_claim_buf(op, &op->source, src, GNTCOPY_source_gref);
-        if ( rc < 0 )
+        if ( rc )
             goto out;
     }
 
@@ -2570,7 +2572,7 @@ static int gnttab_copy_one(const struct
     {
         gnttab_copy_release_buf(dest);
         rc = gnttab_copy_claim_buf(op, &op->dest, dest, GNTCOPY_dest_gref);
-        if ( rc < 0 )
+        if ( rc )
             goto out;
     }
 
@@ -2579,6 +2581,14 @@ static int gnttab_copy_one(const struct
     return rc;
 }
 
+/*
+ * gnttab_copy(), other than the various other helpers of
+ * do_grant_table_op(), returns (besides possible error indicators)
+ * "count - i" rather than "i" to ensure that even if no progress
+ * was made at all (perhaps due to gnttab_copy_one() returning a
+ * positive value) a non-zero value is being handed back (zero needs
+ * to be avoided, as that means "success, all done").
+ */
 static long gnttab_copy(
     XEN_GUEST_HANDLE_PARAM(gnttab_copy_t) uop, unsigned int count)
 {
@@ -2592,7 +2602,7 @@ static long gnttab_copy(
     {
         if ( i && hypercall_preempt_check() )
         {
-            rc = i;
+            rc = count - i;
             break;
         }
 
@@ -2602,13 +2612,20 @@ static long gnttab_copy(
             break;
         }
 
-        op.status = gnttab_copy_one(&op, &dest, &src);
-        if ( op.status != GNTST_okay )
+        rc = gnttab_copy_one(&op, &dest, &src);
+        if ( rc > 0 )
+        {
+            rc = count - i;
+            break;
+        }
+        if ( rc != GNTST_okay )
         {
             gnttab_copy_release_buf(&src);
             gnttab_copy_release_buf(&dest);
         }
 
+        op.status = rc;
+        rc = 0;
         if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
         {
             rc = -EFAULT;
@@ -3146,6 +3163,7 @@ do_grant_table_op(
         rc = gnttab_copy(copy, count);
         if ( rc > 0 )
         {
+            rc = count - rc;
             guest_handle_add_offset(copy, rc);
             uop = guest_handle_cast(copy, void);
         }

["xsa226-4.6/0002-gnttab-fix-transitive-grant-handling.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: gnttab: fix transitive grant handling

Processing of transitive grants must not use the fast path, or else
reference counting breaks due to the skipped recursive call to
__acquire_grant_for_copy() (its __release_grant_for_copy()
counterpart occurs independent of original pin count). Furthermore
after re-acquiring temporarily dropped locks we need to verify no grant
properties changed if the original pin count was non-zero; checking
just the pin counts is sufficient only for well-behaved guests. As a
result, __release_grant_for_copy() needs to mirror that new behavior.

Furthermore a __release_grant_for_copy() invocation was missing on the
retry path of __acquire_grant_for_copy(), and gnttab_set_version() also
needs to bail out upon encountering a transitive grant.

This is part of CVE-2017-12135 / XSA-226.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -2036,13 +2036,8 @@ __release_grant_for_copy(
     unsigned long r_frame;
     uint16_t *status;
     grant_ref_t trans_gref;
-    int released_read;
-    int released_write;
     struct domain *td;
 
-    released_read = 0;
-    released_write = 0;
-
     read_lock(&rgt->lock);
 
     act = active_entry_acquire(rgt, gref);
@@ -2072,17 +2067,11 @@ __release_grant_for_copy(
 
         act->pin -= GNTPIN_hstw_inc;
         if ( !(act->pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask)) )
-        {
-            released_write = 1;
             gnttab_clear_flag(_GTF_writing, status);
-        }
     }
 
     if ( !act->pin )
-    {
         gnttab_clear_flag(_GTF_reading, status);
-        released_read = 1;
-    }
 
     active_entry_release(act);
     read_unlock(&rgt->lock);
@@ -2090,13 +2079,10 @@ __release_grant_for_copy(
     if ( td != rd )
     {
         /*
-         * Recursive calls, but they're bounded (acquire permits only a single
+         * Recursive call, but it is bounded (acquire permits only a single
          * level of transitivity), so it's okay.
          */
-        if ( released_write )
-            __release_grant_for_copy(td, trans_gref, 0);
-        else if ( released_read )
-            __release_grant_for_copy(td, trans_gref, 1);
+        __release_grant_for_copy(td, trans_gref, readonly);
 
         rcu_unlock_domain(td);
     }
@@ -2170,8 +2156,108 @@ __acquire_grant_for_copy(
                  act->domid, ldom, act->pin);
 
     old_pin = act->pin;
-    if ( !act->pin ||
-         (!readonly && !(act->pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask))) )
+    if ( sha2 && (shah->flags & GTF_type_mask) == GTF_transitive )
+    {
+        if ( (!old_pin || (!readonly &&
+                           !(old_pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask)))) &&
+             (rc = _set_status_v2(ldom, readonly, 0, shah, act,
+                                  status)) != GNTST_okay )
+            goto unlock_out;
+
+        if ( !allow_transitive )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grant when transitivity not allowed\n");
+
+        trans_domid = sha2->transitive.trans_domid;
+        trans_gref = sha2->transitive.gref;
+        barrier(); /* Stop the compiler from re-loading
+                      trans_domid from shared memory */
+        if ( trans_domid == rd->domain_id )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grants cannot be self-referential\n");
+
+        /*
+         * We allow the trans_domid == ldom case, which corresponds to a
+         * grant being issued by one domain, sent to another one, and then
+         * transitively granted back to the original domain.  Allowing it
+         * is easy, and means that you don't need to go out of your way to
+         * avoid it in the guest.
+         */
+
+        /* We need to leave the rrd locked during the grant copy. */
+        td = rcu_lock_domain_by_id(trans_domid);
+        if ( td == NULL )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grant referenced bad domain %d\n",
+                     trans_domid);
+
+        /*
+         * __acquire_grant_for_copy() could take the lock on the
+         * remote table (if rd == td), so we have to drop the lock
+         * here and reacquire.
+         */
+        active_entry_release(act);
+        read_unlock(&rgt->lock);
+
+        rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
+                                      readonly, &grant_frame, page,
+                                      &trans_page_off, &trans_length, 0);
+
+        read_lock(&rgt->lock);
+        act = active_entry_acquire(rgt, gref);
+
+        if ( rc != GNTST_okay )
+        {
+            __fixup_status_for_copy_pin(act, status);
+            rcu_unlock_domain(td);
+            active_entry_release(act);
+            read_unlock(&rgt->lock);
+            return rc;
+        }
+
+        /*
+         * We dropped the lock, so we have to check that the grant didn't
+         * change, and that nobody else tried to pin/unpin it. If anything
+         * changed, just give up and tell the caller to retry.
+         */
+        if ( rgt->gt_version != 2 ||
+             act->pin != old_pin ||
+             (old_pin && (act->domid != ldom || act->frame != grant_frame ||
+                          act->start != trans_page_off ||
+                          act->length != trans_length ||
+                          act->trans_domain != td ||
+                          act->trans_gref != trans_gref ||
+                          !act->is_sub_page)) )
+        {
+            __release_grant_for_copy(td, trans_gref, readonly);
+            __fixup_status_for_copy_pin(act, status);
+            rcu_unlock_domain(td);
+            active_entry_release(act);
+            read_unlock(&rgt->lock);
+            put_page(*page);
+            *page = NULL;
+            return ERESTART;
+        }
+
+        if ( !old_pin )
+        {
+            act->domid = ldom;
+            act->start = trans_page_off;
+            act->length = trans_length;
+            act->trans_domain = td;
+            act->trans_gref = trans_gref;
+            act->frame = grant_frame;
+            act->gfn = -1ul;
+            /*
+             * The actual remote remote grant may or may not be a sub-page,
+             * but we always treat it as one because that blocks mappings of
+             * transitive grants.
+             */
+            act->is_sub_page = 1;
+        }
+    }
+    else if ( !old_pin ||
+              (!readonly && !(old_pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask))) )
     {
         if ( (rc = _set_status(rgt->gt_version, ldom,
                                readonly, 0, shah, act,
@@ -2192,79 +2278,6 @@ __acquire_grant_for_copy(
             trans_page_off = 0;
             trans_length = PAGE_SIZE;
         }
-        else if ( (shah->flags & GTF_type_mask) == GTF_transitive )
-        {
-            if ( !allow_transitive )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grant when transitivity not allowed\n");
-
-            trans_domid = sha2->transitive.trans_domid;
-            trans_gref = sha2->transitive.gref;
-            barrier(); /* Stop the compiler from re-loading
-                          trans_domid from shared memory */
-            if ( trans_domid == rd->domain_id )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grants cannot be self-referential\n");
-
-            /* We allow the trans_domid == ldom case, which
-               corresponds to a grant being issued by one domain, sent
-               to another one, and then transitively granted back to
-               the original domain.  Allowing it is easy, and means
-               that you don't need to go out of your way to avoid it
-               in the guest. */
-
-            /* We need to leave the rrd locked during the grant copy */
-            td = rcu_lock_domain_by_id(trans_domid);
-            if ( td == NULL )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grant referenced bad domain %d\n",
-                         trans_domid);
-
-            /*
-             * __acquire_grant_for_copy() could take the lock on the
-             * remote table (if rd == td), so we have to drop the lock
-             * here and reacquire
-             */
-            active_entry_release(act);
-            read_unlock(&rgt->lock);
-
-            rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
-                                          readonly, &grant_frame, page,
-                                          &trans_page_off, &trans_length, 0);
-
-            read_lock(&rgt->lock);
-            act = active_entry_acquire(rgt, gref);
-
-            if ( rc != GNTST_okay ) {
-                __fixup_status_for_copy_pin(act, status);
-                rcu_unlock_domain(td);
-                active_entry_release(act);
-                read_unlock(&rgt->lock);
-                return rc;
-            }
-
-            /*
-             * We dropped the lock, so we have to check that nobody else tried
-             * to pin (or, for that matter, unpin) the reference in *this*
-             * domain.  If they did, just give up and tell the caller to retry.
-             */
-            if ( act->pin != old_pin )
-            {
-                __fixup_status_for_copy_pin(act, status);
-                rcu_unlock_domain(td);
-                active_entry_release(act);
-                read_unlock(&rgt->lock);
-                put_page(*page);
-                *page = NULL;
-                return ERESTART;
-            }
-
-            /* The actual remote remote grant may or may not be a
-               sub-page, but we always treat it as one because that
-               blocks mappings of transitive grants. */
-            is_sub_page = 1;
-            act->gfn = -1ul;
-        }
         else if ( !(sha2->hdr.flags & GTF_sub_page) )
         {
             rc = __get_paged_frame(sha2->full_page.frame, &grant_frame, page, readonly, rd);
@@ -2696,10 +2709,13 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA
     case 2:
         for ( i = 0; i < GNTTAB_NR_RESERVED_ENTRIES; i++ )
         {
-            if ( ((shared_entry_v2(gt, i).hdr.flags & GTF_type_mask) ==
-                  GTF_permit_access) &&
-                 (shared_entry_v2(gt, i).full_page.frame >> 32) )
+            switch ( shared_entry_v2(gt, i).hdr.flags & GTF_type_mask )
             {
+            case GTF_permit_access:
+                 if ( !(shared_entry_v2(gt, i).full_page.frame >> 32) )
+                     break;
+                 /* fall through */
+            case GTF_transitive:
                 gdprintk(XENLOG_WARNING,
                          "tried to change grant table version to 1 with non-representable entries\n");
                 res = -ERANGE;

["xsa226-4.7.patch" (application/octet-stream)]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: grant_table: Default to v1, and disallow transitive grants

The reference counting and locking discipline for transitive grants is broken.
Their use is therefore declared out of security support.

This is XSA-226.

Transitive grants are expected to be unconditionally available with grant
table v2.  Hiding transitive grants alone is an ABI breakage for the guest.
Modern versions of Linux and the Windows PV drivers use grant table v1, but
older versions did use v2.

In principle, disabling gnttab v2 entirely is the safer way to cause guests to
avoid using transitive grants. However, some older guests which defaulted to
using gnttab v2 don't tolerate falling back from v2 to v1 over migrate.

This patch introduces a new command line option to control grant table
behaviour.  One suboption allows a choice of the maximum grant table version
Xen will allow the guest to use, and defaults to v2.  A different suboption
independently controls whether transitive grants can be used.

The default case is:

    gnttab=max_ver:2

To disable gnttab v2 entirely, use:

    gnttab=max_ver:1

To allow gnttab v2 and transitive grants, use:

    gnttab=max_ver:2,transitive

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 73f5265..b792abf 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -758,6 +758,22 @@ Controls EPT related features.
 
 Specify which console gdbstub should use. See **console**.
 
+### gnttab
+> `= List of [ max_ver:<integer>, transitive ]`
+
+> Default: `gnttab=max_ver:2,no-transitive`
+
+Control various aspects of the grant table behaviour available to guests.
+
+* `max_ver` Select the maximum grant table version to offer to guests.  Valid
+version are 1 and 2.
+* `transitive` Permit or disallow the use of transitive grants.  Note that the
+use of grant table v2 without transitive grants is an ABI breakage from the
+guests point of view.
+
+*Warning:*
+Due to XSA-226, the use of transitive grants is outside of security support.
+
 ### gnttab\_max\_frames
 > `= <integer>`
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index f06b664..109c552 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -50,6 +50,42 @@ integer_param("gnttab_max_nr_frames", max_nr_grant_frames);
 unsigned int __read_mostly max_grant_frames;
 integer_param("gnttab_max_frames", max_grant_frames);
 
+static unsigned int __read_mostly opt_gnttab_max_version = 2;
+static bool_t __read_mostly opt_transitive_grants;
+
+static void __init parse_gnttab(char *s)
+{
+    char *ss;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strncmp(s, "max_ver:", 8) )
+        {
+            long ver = simple_strtol(s + 8, NULL, 10);
+
+            if ( ver >= 1 && ver <= 2 )
+                opt_gnttab_max_version = ver;
+        }
+        else
+        {
+            bool_t val = !!strncmp(s, "no-", 3);
+
+            if ( !val )
+                s += 3;
+
+            if ( !strcmp(s, "transitive") )
+                opt_transitive_grants = val;
+        }
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+custom_param("gnttab", parse_gnttab);
+
 /* The maximum number of grant mappings is defined as a multiplier of the
  * maximum number of grant table entries. This defines the multiplier used.
  * Pretty arbitrary. [POLICY]
@@ -2188,6 +2224,10 @@ __acquire_grant_for_copy(
         }
         else if ( (shah->flags & GTF_type_mask) == GTF_transitive )
         {
+            if ( !opt_transitive_grants )
+                PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                         "transitive grant disallowed by policy\n");
+
             if ( !allow_transitive )
                 PIN_FAIL(unlock_out_clear, GNTST_general_error,
                          "transitive grant when transitivity not allowed\n");
@@ -3156,7 +3196,10 @@ do_grant_table_op(
     }
     case GNTTABOP_set_version:
     {
-        rc = gnttab_set_version(guest_handle_cast(uop, gnttab_set_version_t));
+        if ( opt_gnttab_max_version == 1 )
+            rc = -ENOSYS; /* Behave as before set_version was introduced. */
+        else
+            rc = gnttab_set_version(guest_handle_cast(uop, gnttab_set_version_t));
         break;
     }
     case GNTTABOP_get_status_frames:

["xsa226-4.9/0001-gnttab-dont-use-possibly-unbounded-tail-calls.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: gnttab: don't use possibly unbounded tail calls

There is no guarantee that the compiler would actually translate them
to branches instead of calls, so only ones with a known recursion limit
are okay:
- __release_grant_for_copy() can call itself only once, as
  __acquire_grant_for_copy() won't permit use of multi-level transitive
  grants,
- __acquire_grant_for_copy() is fine to call itself with the last
  argument false, as that prevents further recursion,
- __acquire_grant_for_copy() must not call itself to recover from an
  observed change to the active entry's pin count

This is part of CVE-2017-12135 / XSA-226.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/compat/grant_table.c
+++ b/xen/common/compat/grant_table.c
@@ -258,9 +258,9 @@ int compat_grant_table_op(unsigned int cmd,
                 rc = gnttab_copy(guest_handle_cast(nat.uop, gnttab_copy_t), n);
             if ( rc > 0 )
             {
-                ASSERT(rc < n);
-                i -= n - rc;
-                n = rc;
+                ASSERT(rc <= n);
+                i -= rc;
+                n -= rc;
             }
             if ( rc >= 0 )
             {
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -2103,8 +2103,10 @@ __release_grant_for_copy(
 
     if ( td != rd )
     {
-        /* Recursive calls, but they're tail calls, so it's
-           okay. */
+        /*
+         * Recursive calls, but they're bounded (acquire permits only a single
+         * level of transitivity), so it's okay.
+         */
         if ( released_write )
             __release_grant_for_copy(td, trans_gref, 0);
         else if ( released_read )
@@ -2255,10 +2257,11 @@ __acquire_grant_for_copy(
                 return rc;
             }
 
-            /* We dropped the lock, so we have to check that nobody
-               else tried to pin (or, for that matter, unpin) the
-               reference in *this* domain.  If they did, just give up
-               and try again. */
+            /*
+             * We dropped the lock, so we have to check that nobody else tried
+             * to pin (or, for that matter, unpin) the reference in *this*
+             * domain.  If they did, just give up and tell the caller to retry.
+             */
             if ( act->pin != old_pin )
             {
                 __fixup_status_for_copy_pin(act, status);
@@ -2266,9 +2269,8 @@ __acquire_grant_for_copy(
                 active_entry_release(act);
                 grant_read_unlock(rgt);
                 put_page(*page);
-                return __acquire_grant_for_copy(rd, gref, ldom, readonly,
-                                                frame, page, page_off, length,
-                                                allow_transitive);
+                *page = NULL;
+                return ERESTART;
             }
 
             /* The actual remote remote grant may or may not be a
@@ -2574,7 +2576,7 @@ static int gnttab_copy_one(const struct
     {
         gnttab_copy_release_buf(src);
         rc = gnttab_copy_claim_buf(op, &op->source, src, GNTCOPY_source_gref);
-        if ( rc < 0 )
+        if ( rc )
             goto out;
     }
 
@@ -2584,7 +2586,7 @@ static int gnttab_copy_one(const struct
     {
         gnttab_copy_release_buf(dest);
         rc = gnttab_copy_claim_buf(op, &op->dest, dest, GNTCOPY_dest_gref);
-        if ( rc < 0 )
+        if ( rc )
             goto out;
     }
 
@@ -2593,6 +2595,14 @@ static int gnttab_copy_one(const struct
     return rc;
 }
 
+/*
+ * gnttab_copy(), other than the various other helpers of
+ * do_grant_table_op(), returns (besides possible error indicators)
+ * "count - i" rather than "i" to ensure that even if no progress
+ * was made at all (perhaps due to gnttab_copy_one() returning a
+ * positive value) a non-zero value is being handed back (zero needs
+ * to be avoided, as that means "success, all done").
+ */
 static long gnttab_copy(
     XEN_GUEST_HANDLE_PARAM(gnttab_copy_t) uop, unsigned int count)
 {
@@ -2606,7 +2616,7 @@ static long gnttab_copy(
     {
         if ( i && hypercall_preempt_check() )
         {
-            rc = i;
+            rc = count - i;
             break;
         }
 
@@ -2616,13 +2626,20 @@ static long gnttab_copy(
             break;
         }
 
-        op.status = gnttab_copy_one(&op, &dest, &src);
-        if ( op.status != GNTST_okay )
+        rc = gnttab_copy_one(&op, &dest, &src);
+        if ( rc > 0 )
+        {
+            rc = count - i;
+            break;
+        }
+        if ( rc != GNTST_okay )
         {
             gnttab_copy_release_buf(&src);
             gnttab_copy_release_buf(&dest);
         }
 
+        op.status = rc;
+        rc = 0;
         if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
         {
             rc = -EFAULT;
@@ -3160,6 +3177,7 @@ do_grant_table_op(
         rc = gnttab_copy(copy, count);
         if ( rc > 0 )
         {
+            rc = count - rc;
             guest_handle_add_offset(copy, rc);
             uop = guest_handle_cast(copy, void);
         }

["xsa226-4.9/0002-gnttab-fix-transitive-grant-handling.patch" (application/octet-stream)]

From: Jan Beulich <jbeulich@suse.com>
Subject: gnttab: fix transitive grant handling

Processing of transitive grants must not use the fast path, or else
reference counting breaks due to the skipped recursive call to
__acquire_grant_for_copy() (its __release_grant_for_copy()
counterpart occurs independent of original pin count). Furthermore
after re-acquiring temporarily dropped locks we need to verify no grant
properties changed if the original pin count was non-zero; checking
just the pin counts is sufficient only for well-behaved guests. As a
result, __release_grant_for_copy() needs to mirror that new behavior.

Furthermore a __release_grant_for_copy() invocation was missing on the
retry path of __acquire_grant_for_copy(), and gnttab_set_version() also
needs to bail out upon encountering a transitive grant.

This is part of CVE-2017-12135 / XSA-226.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -2050,13 +2050,8 @@ __release_grant_for_copy(
     unsigned long r_frame;
     uint16_t *status;
     grant_ref_t trans_gref;
-    int released_read;
-    int released_write;
     struct domain *td;
 
-    released_read = 0;
-    released_write = 0;
-
     grant_read_lock(rgt);
 
     act = active_entry_acquire(rgt, gref);
@@ -2086,17 +2081,11 @@ __release_grant_for_copy(
 
         act->pin -= GNTPIN_hstw_inc;
         if ( !(act->pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask)) )
-        {
-            released_write = 1;
             gnttab_clear_flag(_GTF_writing, status);
-        }
     }
 
     if ( !act->pin )
-    {
         gnttab_clear_flag(_GTF_reading, status);
-        released_read = 1;
-    }
 
     active_entry_release(act);
     grant_read_unlock(rgt);
@@ -2104,13 +2093,10 @@ __release_grant_for_copy(
     if ( td != rd )
     {
         /*
-         * Recursive calls, but they're bounded (acquire permits only a single
+         * Recursive call, but it is bounded (acquire permits only a single
          * level of transitivity), so it's okay.
          */
-        if ( released_write )
-            __release_grant_for_copy(td, trans_gref, 0);
-        else if ( released_read )
-            __release_grant_for_copy(td, trans_gref, 1);
+        __release_grant_for_copy(td, trans_gref, readonly);
 
         rcu_unlock_domain(td);
     }
@@ -2184,8 +2170,108 @@ __acquire_grant_for_copy(
                  act->domid, ldom, act->pin);
 
     old_pin = act->pin;
-    if ( !act->pin ||
-         (!readonly && !(act->pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask))) )
+    if ( sha2 && (shah->flags & GTF_type_mask) == GTF_transitive )
+    {
+        if ( (!old_pin || (!readonly &&
+                           !(old_pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask)))) &&
+             (rc = _set_status_v2(ldom, readonly, 0, shah, act,
+                                  status)) != GNTST_okay )
+            goto unlock_out;
+
+        if ( !allow_transitive )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grant when transitivity not allowed\n");
+
+        trans_domid = sha2->transitive.trans_domid;
+        trans_gref = sha2->transitive.gref;
+        barrier(); /* Stop the compiler from re-loading
+                      trans_domid from shared memory */
+        if ( trans_domid == rd->domain_id )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grants cannot be self-referential\n");
+
+        /*
+         * We allow the trans_domid == ldom case, which corresponds to a
+         * grant being issued by one domain, sent to another one, and then
+         * transitively granted back to the original domain.  Allowing it
+         * is easy, and means that you don't need to go out of your way to
+         * avoid it in the guest.
+         */
+
+        /* We need to leave the rrd locked during the grant copy. */
+        td = rcu_lock_domain_by_id(trans_domid);
+        if ( td == NULL )
+            PIN_FAIL(unlock_out_clear, GNTST_general_error,
+                     "transitive grant referenced bad domain %d\n",
+                     trans_domid);
+
+        /*
+         * __acquire_grant_for_copy() could take the lock on the
+         * remote table (if rd == td), so we have to drop the lock
+         * here and reacquire.
+         */
+        active_entry_release(act);
+        grant_read_unlock(rgt);
+
+        rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
+                                      readonly, &grant_frame, page,
+                                      &trans_page_off, &trans_length, 0);
+
+        grant_read_lock(rgt);
+        act = active_entry_acquire(rgt, gref);
+
+        if ( rc != GNTST_okay )
+        {
+            __fixup_status_for_copy_pin(act, status);
+            rcu_unlock_domain(td);
+            active_entry_release(act);
+            grant_read_unlock(rgt);
+            return rc;
+        }
+
+        /*
+         * We dropped the lock, so we have to check that the grant didn't
+         * change, and that nobody else tried to pin/unpin it. If anything
+         * changed, just give up and tell the caller to retry.
+         */
+        if ( rgt->gt_version != 2 ||
+             act->pin != old_pin ||
+             (old_pin && (act->domid != ldom || act->frame != grant_frame ||
+                          act->start != trans_page_off ||
+                          act->length != trans_length ||
+                          act->trans_domain != td ||
+                          act->trans_gref != trans_gref ||
+                          !act->is_sub_page)) )
+        {
+            __release_grant_for_copy(td, trans_gref, readonly);
+            __fixup_status_for_copy_pin(act, status);
+            rcu_unlock_domain(td);
+            active_entry_release(act);
+            grant_read_unlock(rgt);
+            put_page(*page);
+            *page = NULL;
+            return ERESTART;
+        }
+
+        if ( !old_pin )
+        {
+            act->domid = ldom;
+            act->start = trans_page_off;
+            act->length = trans_length;
+            act->trans_domain = td;
+            act->trans_gref = trans_gref;
+            act->frame = grant_frame;
+            act->gfn = -1ul;
+            /*
+             * The actual remote remote grant may or may not be a sub-page,
+             * but we always treat it as one because that blocks mappings of
+             * transitive grants.
+             */
+            act->is_sub_page = 1;
+        }
+    }
+    else if ( !old_pin ||
+              (!readonly && !(old_pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask))) )
     {
         if ( (rc = _set_status(rgt->gt_version, ldom,
                                readonly, 0, shah, act,
@@ -2206,79 +2292,6 @@ __acquire_grant_for_copy(
             trans_page_off = 0;
             trans_length = PAGE_SIZE;
         }
-        else if ( (shah->flags & GTF_type_mask) == GTF_transitive )
-        {
-            if ( !allow_transitive )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grant when transitivity not allowed\n");
-
-            trans_domid = sha2->transitive.trans_domid;
-            trans_gref = sha2->transitive.gref;
-            barrier(); /* Stop the compiler from re-loading
-                          trans_domid from shared memory */
-            if ( trans_domid == rd->domain_id )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grants cannot be self-referential\n");
-
-            /* We allow the trans_domid == ldom case, which
-               corresponds to a grant being issued by one domain, sent
-               to another one, and then transitively granted back to
-               the original domain.  Allowing it is easy, and means
-               that you don't need to go out of your way to avoid it
-               in the guest. */
-
-            /* We need to leave the rrd locked during the grant copy */
-            td = rcu_lock_domain_by_id(trans_domid);
-            if ( td == NULL )
-                PIN_FAIL(unlock_out_clear, GNTST_general_error,
-                         "transitive grant referenced bad domain %d\n",
-                         trans_domid);
-
-            /*
-             * __acquire_grant_for_copy() could take the lock on the
-             * remote table (if rd == td), so we have to drop the lock
-             * here and reacquire
-             */
-            active_entry_release(act);
-            grant_read_unlock(rgt);
-
-            rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
-                                          readonly, &grant_frame, page,
-                                          &trans_page_off, &trans_length, 0);
-
-            grant_read_lock(rgt);
-            act = active_entry_acquire(rgt, gref);
-
-            if ( rc != GNTST_okay ) {
-                __fixup_status_for_copy_pin(act, status);
-                rcu_unlock_domain(td);
-                active_entry_release(act);
-                grant_read_unlock(rgt);
-                return rc;
-            }
-
-            /*
-             * We dropped the lock, so we have to check that nobody else tried
-             * to pin (or, for that matter, unpin) the reference in *this*
-             * domain.  If they did, just give up and tell the caller to retry.
-             */
-            if ( act->pin != old_pin )
-            {
-                __fixup_status_for_copy_pin(act, status);
-                rcu_unlock_domain(td);
-                active_entry_release(act);
-                grant_read_unlock(rgt);
-                put_page(*page);
-                *page = NULL;
-                return ERESTART;
-            }
-
-            /* The actual remote remote grant may or may not be a
-               sub-page, but we always treat it as one because that
-               blocks mappings of transitive grants. */
-            is_sub_page = 1;
-            act->gfn = -1ul;
-        }
         else if ( !(sha2->hdr.flags & GTF_sub_page) )
         {
             rc = __get_paged_frame(sha2->full_page.frame, &grant_frame, page, readonly, rd);
@@ -2710,10 +2723,13 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA
     case 2:
         for ( i = 0; i < GNTTAB_NR_RESERVED_ENTRIES; i++ )
         {
-            if ( ((shared_entry_v2(gt, i).hdr.flags & GTF_type_mask) ==
-                  GTF_permit_access) &&
-                 (shared_entry_v2(gt, i).full_page.frame >> 32) )
+            switch ( shared_entry_v2(gt, i).hdr.flags & GTF_type_mask )
             {
+            case GTF_permit_access:
+                 if ( !(shared_entry_v2(gt, i).full_page.frame >> 32) )
+                     break;
+                 /* fall through */
+            case GTF_transitive:
                 gdprintk(XENLOG_WARNING,
                          "tried to change grant table version to 1 with non-representable entries\n");
                 res = -ERANGE;


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic