List: xen-devel
Subject: Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device
From: "Xu, Quan" <quan.xu () intel ! com>
Date: 2015-09-30 15:05:49
Message-ID: 945CA011AD5F084CBEA3E851C0AB2889402A8083 () SHSMSX101 ! ccr ! corp ! intel ! com
> > > > > On September 29, 2015, at 5:12 PM, <tim@xen.org> wrote:
> At 03:08 +0000 on 28 Sep (1443409723), Xu, Quan wrote:
> > > > > Thursday, September 24, 2015 12:27 AM, Tim Deegan wrote:
> > > 7/13: I'm not convinced that making the vcpu spin calling
> > > sched_yield() is a very good plan. Better to explicitly pause the
> > > domain if you need its vcpus not to run. But first -- why does
> > > IOMMU flushing mean that vcpus can't be run?
> >
> > To ensure that the required Device-TLB flushes are applied before
> > returning to guest mode via hypercall completion. Otherwise the domain
> > can still DMA to the freed pages. For example, call the do_memory_op
> > hypercall to free a pageX (gfn --- mfn) from the domain, and assume
> > there is a mapping (gfn --- mfn) in the Device-TLB. Once the vcpu has
> > returned to guest mode, the domain can still DMA to this freed pageX.
> > The domain kernel must not use a page that is being freed; doing so
> > is a domain kernel bug.
>
>
> OK - let's ignore guest kernel bugs. IIUC you're worried about the guest OS
> telling a device to issue DMA to an address that has changed in the IOMMU
> tables (unmapped, remapped elsewhere, permissions changed, &c) but not yet
> been flushed?
Yes: issuing DMA to an address that has changed in the IOMMU table and the EPT
table, but has not yet been flushed.
>
> Unfortunately, pausing the guest's CPUs doesn't stop that. A malicious guest
> could enqueue network receive buffers pointing to that address, and then
> arrange for a packet to arrive between the IOMMU table change and the flush
> completion.
Cool !!
> So you'll need to do something else to make the unmap safe. The usual
> method in Xen is to hold a reference to the page (for read-only
> mappings)
Does "read-only mapping" refer to 'PGT_pinned'?
Could I introduce a new typed reference that can only be dropped in the QI
interrupt handler (or an associated tasklet), and that prevents changes of
ownership/type on the relevant pages? (Stop me, I always want to add some new
flag or type...)
> or a typed reference (for read-write), and not release that reference
> until the flush has completed. That's OK with in-line synchronous flushes.
>
> With the flush taking longer than Xen can wait for, you'll need to do something
> more complex, e.g.:
> - keep a log of all relevant pending derefs, to be processed when the
> flush completes;
One of the people CCed mentioned this solution in internal discussions, but it is
tricky and over-engineered. I would need more than half a year to implement it.
> or
> - have some other method of preventing changes of ownership/type on
> the relevant pages.
I prefer this solution.
> E.g. for CPU TLBs, we keep a per-page counter
> (tlbflush-timestamp) that we can use to detect whether enough TLB
> flushes have happened since the page was freed.
>
> The log is tricky - I'm not sure how to make sure that it has bounded size if a
> flush can take seconds.
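To make the bounded-size concern concrete, here is a minimal sketch in C of a fixed-capacity log of pending derefs. It is not Xen code: `struct page`, `struct deref_log`, `deref_log_push`, `deref_log_drain` and `LOG_CAPACITY` are all hypothetical names invented for illustration, and locking is left out. The point is only that once the log is full, the caller needs some fallback until a flush completes and the log is drained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOG_CAPACITY 8   /* arbitrary bound chosen for illustration */

/* Hypothetical page with a reference count; not Xen's struct page_info. */
struct page {
    unsigned int refcount;
};

/* Fixed-size log of references whose release waits for flush completion. */
struct deref_log {
    struct page *pending[LOG_CAPACITY];
    size_t count;
};

/*
 * Queue a deferred put. Returns false when the log is full, in which
 * case the caller needs some other way to keep the operation safe.
 */
static bool deref_log_push(struct deref_log *log, struct page *pg)
{
    assert(log != NULL && pg != NULL);
    if (log->count == LOG_CAPACITY)
        return false;
    log->pending[log->count++] = pg;
    return true;
}

/* Called once the flush has completed: drop every deferred reference. */
static size_t deref_log_drain(struct deref_log *log)
{
    assert(log != NULL);
    size_t released = 0;
    for (size_t i = 0; i < log->count; i++) {
        assert(log->pending[i]->refcount > 0);
        log->pending[i]->refcount--;
        released++;
    }
    log->count = 0;
    return released;
}
```

If a flush can take seconds, a guest that unmaps pages faster than flushes complete will hit the `false` path, which is the unbounded-growth problem raised above.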
>
> I'm not sure the counter works either -- when that detector triggers we do a
> synchronous TLB-flush IPI to make the operation safe, and that's exactly what we
> can't do here.
>
> Any other ideas floating around?
>
> Cheers,
>
Tim, thanks for your help.
If I have any ideas I will send them out, though they may not be a complete solution.
Quan
> Tim.