
List:       linux-rdma
Subject:    Re: [PATCH v2 rdma-next 1/5] RDMA/umem: Add API to find best driver supported page size in an MR
From:       Jason Gunthorpe <jgg () ziepe ! ca>
Date:       2019-04-30 21:00:11
Message-ID: 20190430210011.GA9059 () ziepe ! ca

On Tue, Apr 30, 2019 at 07:54:24PM +0000, Saleem, Shiraz wrote:
> >Subject: Re: [PATCH v2 rdma-next 1/5] RDMA/umem: Add API to find best driver
> >supported page size in an MR
> >
> >On Tue, Apr 30, 2019 at 01:36:14PM +0000, Saleem, Shiraz wrote:
> >
> >> >If we make a MR with VA 0x6400FFF and length 2M-4095 I expect the 2M
> >> >page size with the NIC.
> >> >
> >> >We will have dma_addr_start = 0x2600000 and sg_dma_len = 0x200000 as
> >> >the SGL is always rounded to pages
> >>
> >> why isn't the sg_dma_len 2M-4095? Is it because we compute npages
> >> in ib_umem_get() to build the SGL? Could using (addr & PAGE_MASK) and
> >> then adding dma_len help take care of this?
> >
> >We always round to page sizes when building the SGL, so the start is rounded
> >down and the end remains the same.
> >
> >> >I have a feeling the uvirt_addr should be computed more like
> >> >
> >> >  if (flags & IB_UMEM_VA_BASED_OFFSET)
> >> >        mask |= uvirt_addr ^ umem->addr;
> >>
> >> I am not following.
> >>
> >> For a case like below where uvirt_addr = umem_addr, mask = 0 and we
> >> run rdma_find_pgbit on it we'll pick 4K instead of 2M, which is wrong.
> >
> >> uvirt_addr [0x7f2b98200000]
> >> best_pgsz [0x201000]
> >> umem->address [0x7f2b98200000]
> >> dma_addr_start [0x130e200000]
> >> dma_addr_end [0x130e400000]
> >> sg_dma_len [2097152]
> >
> >??
> >
> >0x7f2b98200000 ^ 0x7f2b98200000 = 0
> >
> >So mask isn't altered by the requested VA and you get 2M pages.

> I am still missing the point. mask was initialized to 0, and if we just do
> "mask |= uvirt_addr ^ umem->addr" for the first SGE, then you'll
> always get 0 for mask (and one page size) when uvirt_addr == umem->addr,
> irrespective of their values.

This is because mask shouldn't start as zero - the highest possible
mask is something like log2_ru(umem length)

ie absent other considerations the page size at the NIC should be the
size of the umem.

Then we scan the sgl and reduce that value based on the physical
address layout

Then we reduce it again based on the uvirt vs address difference

ORing a '0' into the mask means that step contributes no restriction.

..

So I think the algorithm is just off as-is; it should be more like:

 // Page size can't be larger than the length of the MR
 mask = log2_ru(umem length); 

 // offset of umem->address within its first page
 pgoff = umem->address & ~PAGE_MASK;
 va = uvirt_addr;

 for_each_sgl()
   mask |= (sgl->dma_addr + pgoff) ^ va;
   if (not first and not last)
       // Interior SGLs limit by the length as well
       mask |= sgl->length;
   va += sgl->length - pgoff;
   pgoff = 0;

The key is the xor of the VA to the PA at every step because that is
really the restriction we are looking at. If any VA/PA bits differ for
any address in the MR that immediately reduces the max page size.
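Concretely, the loop can be sketched in userspace C like this (the
struct, helper name, and the final ctz trick are illustrative, not the
kernel API; the largest page size consistent with a mask of differing
address bits is the lowest set bit of that mask):

#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

struct sge {
	uint64_t dma_addr;	/* page-aligned, as built by ib_umem_get() */
	uint64_t length;
};

static uint64_t best_page_size(uint64_t va, uint64_t umem_len,
			       const struct sge *sgl, int nents)
{
	/* Page size can't be larger than the MR: round length up to a
	 * power of two (the "log2_ru(umem length)" step) */
	uint64_t mask = 1ULL << (64 - __builtin_clzll(umem_len - 1));
	uint64_t pgoff = va & (PAGE_SIZE - 1);	/* offset in first page */
	int i;

	for (i = 0; i < nents; i++) {
		mask |= (sgl[i].dma_addr + pgoff) ^ va;
		if (i != 0 && i != nents - 1)
			mask |= sgl[i].length;	/* interior SGEs limit too */
		va += sgl[i].length - pgoff;
		pgoff = 0;
	}
	/* Lowest differing bit bounds the page size */
	return 1ULL << __builtin_ctzll(mask);
}

On the example from this thread (va 0x7f2b98200000, one 2M SGE at
0x130e200000) this yields 0x200000, i.e. 2M pages, as expected.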

Remember, what the HW does is
  PA = MR_PAGE[VA] | (VA & (pg_size - 1))

So any situation where the low page-offset bits of PA and VA differ is
a violation - that is exactly the comparison the XOR is doing.
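As a toy model of that translation (the names and the fixed 2M page
size are assumptions for illustration, not the driver's code):

#include <assert.h>
#include <stdint.h>

#define PG_SHIFT 21			/* assume the NIC picked 2M pages */
#define PG_SIZE  (1ULL << PG_SHIFT)

/* What the HW does: take the page base from the MR table and OR in
 * the low VA bits. */
static uint64_t hw_translate(uint64_t va, uint64_t pa_page)
{
	return pa_page | (va & (PG_SIZE - 1));
}

/* The translation is only correct when the page-offset bits of VA and
 * PA agree - which is what the XOR in the mask computation detects. */
static int offsets_agree(uint64_t va, uint64_t pa)
{
	return ((va ^ pa) & (PG_SIZE - 1)) == 0;
}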

Jason