
List:       xen-devel
Subject:    Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device
From:       "Xu, Quan" <quan.xu () intel ! com>
Date:       2015-09-30 15:05:49
Message-ID: 945CA011AD5F084CBEA3E851C0AB2889402A8083 () SHSMSX101 ! ccr ! corp ! intel ! com

> > > > > On September 29, 2015, at 5:12 PM, <tim@xen.org> wrote:
> At 03:08 +0000 on 28 Sep (1443409723), Xu, Quan wrote:
> > > > > Thursday, September 24, 2015 12:27 AM, Tim Deegan wrote:
> > > 7/13: I'm not convinced that making the vcpu spin calling
> > > sched_yield() is a very good plan.  Better to explicitly pause the
> > > domain if you need its vcpus not to run.  But first -- why does
> > > IOMMU flushing mean that vcpus can't be run?
> > 
> > Ensure that the required Device-TLB flushes are applied before
> > returning to guest mode via hypercall completion.  Otherwise the domain
> > can still DMA to the freed pages.  For example, call the do_memory_op
> > hypercall to free a page X (gfn --- mfn) from the domain, and assume
> > that there is a mapping (gfn --- mfn) in the Device-TLB.  Once the vcpu
> > has returned to guest mode, the domain can still DMA to this freed
> > page X.  The domain kernel must not use a page that is being freed;
> > doing so would be a domain kernel bug.
> 
> 
> OK - let's ignore guest kernel bugs.  IIUC you're worried about the guest OS
> telling a device to issue DMA to an address that has changed in the IOMMU
> tables (unmapped, remapped elsewhere, permissions changed, &c) but not yet
> been flushed?


Yes: issuing DMA to an address that has changed in the IOMMU table and the
EPT table, but has not yet been flushed.


> 
> Unfortunately, pausing the guest's CPUs doesn't stop that.  A malicious guest
> could enqueue network receive buffers pointing to that address, and then
> arrange for a packet to arrive between the IOMMU table change and the flush
> completion.

Cool !!

> So you'll need to do something else to make the unmap safe.
> The usual
> method in Xen is to hold a reference to the page (for read-only
> mappings)


Does the read-only mapping refer to 'PGT_pinned'?
Could I introduce a new typed reference which can only be dereferenced in the
QI interrupt handler (or an associated tasklet)? (Stop me, I always want to
add some new flag or type.) That would also prevent changes of ownership/type
on the relevant pages.


> or a typed reference (for read-write), and not release that reference
> until the flush has completed.  That's OK with in-line synchronous flushes.
> 
> With the flush taking longer than Xen can wait for, you'll need to do something
> more complex, e.g.:
> - keep a log of all relevant pending derefs, to be processed when the
> flush completes; 



One of the people CCed mentioned this solution in internal discussions, but it
is tricky and over-engineered. I would need more than half a year to implement
it.


> or
> - have some other method of preventing changes of ownership/type on
> the relevant pages. 


I prefer this solution.
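
For comparison, the first option quoted above (keep the page reference held and
log the pending derefs until the flush completes) could be sketched roughly as
follows. This is only an illustration under stated assumptions, not Xen code:
struct page, struct deref_log, DEREF_LOG_CAPACITY and all function names are
hypothetical. The log has a fixed capacity, so a caller that finds it full has
to fall back to something else (wait, or fail the operation), which is the
bounded-size concern raised in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page with a reference count. */
struct page {
    unsigned int refcount;
};

#define DEREF_LOG_CAPACITY 4    /* deliberately small, fixed bound */

/* Fixed-size log of references to drop once the flush has completed. */
struct deref_log {
    struct page *pending[DEREF_LOG_CAPACITY];
    size_t count;
};

/* Drop one reference; the page may only be reused once this reaches zero. */
static void put_page_ref(struct page *pg)
{
    assert(pg->refcount > 0);
    pg->refcount--;
}

/*
 * Record that pg's reference must be dropped after the flush completes.
 * Returns false when the log is full, leaving the reference held; the
 * caller then needs some other fallback.
 */
static bool deref_log_defer(struct deref_log *dl, struct page *pg)
{
    if (dl->count == DEREF_LOG_CAPACITY)
        return false;
    dl->pending[dl->count++] = pg;
    return true;
}

/*
 * To be called when flush completion is signalled: drop every logged
 * reference and empty the log.  Returns the number of references dropped.
 */
static size_t deref_log_flush_done(struct deref_log *dl)
{
    size_t dropped = dl->count;

    for (size_t i = 0; i < dl->count; i++)
        put_page_ref(dl->pending[i]);
    dl->count = 0;
    return dropped;
}
```

Until deref_log_flush_done() runs, every logged page keeps its reference, so it
cannot change owner or type while a stale Device-TLB entry might still exist.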


> E.g. for CPU TLBs, we keep a per-page counter
> (tlbflush-timestamp) that we can use to detect whether enough TLB
> flushes have happened since the page was freed.
> 
> The log is tricky - I'm not sure how to make sure that it has bounded size if a
> flush can take seconds.
> 
> I'm not sure the counter works either -- when that detector triggers we do a
> synchronous TLB-flush IPI to make the operation safe, and that's exactly what we
> can't do here.
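
The counter-based detector described above can be sketched as below. This is a
simplified illustration, not the actual Xen tlbflush-timestamp code: the names
(flush_issued_id, flush_completed_id, free_stamp and the functions) are
hypothetical, 64-bit counters are assumed so that wrap-around can be ignored,
and flushes are assumed to complete in the order they were issued. A freed page
is treated as safe only once a flush issued after the free has completed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global counters for Device-TLB flushes. */
static uint64_t flush_issued_id;     /* id of the most recently issued flush */
static uint64_t flush_completed_id;  /* id of the most recently completed flush */

/* Hypothetical per-page stamp: the newest flush id issued when it was freed. */
struct page {
    uint64_t free_stamp;
};

/* Issue a new flush and return its id. */
static uint64_t flush_issue(void)
{
    return ++flush_issued_id;
}

/* Record that the flush with the given id has completed. */
static void flush_complete(uint64_t id)
{
    if (id > flush_completed_id)
        flush_completed_id = id;
}

/* Stamp the page at free time. */
static void page_mark_freed(struct page *pg)
{
    pg->free_stamp = flush_issued_id;
}

/*
 * True while no flush issued after the free has completed, i.e. a stale
 * translation for this page may still be cached.
 */
static bool page_needs_flush(const struct page *pg)
{
    return flush_completed_id <= pg->free_stamp;
}
```

The open question from the thread remains: when page_needs_flush() returns
true, the CPU-TLB scheme responds with a synchronous flush, and a synchronous
wait is what is not acceptable for a slow Device-TLB flush.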
> 
> Any other ideas floating around?
> 
> Cheers,
> 

Tim, thanks for your help.
If I have any ideas, I will send them out, though they may not be a complete
solution.

Quan

> Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

