List: dri-devel
Subject: TTM API / functionality fixes (Was Re: Xorg 7.4 release plan)
From: Thomas Hellström <thomas () tungstengraphics ! com>
Date: 2008-02-29 10:08:53
Message-ID: 47C7D9B5.2050804 () tungstengraphics ! com
Eric Anholt wrote:
> On Thu, 2008-02-28 at 10:08 +0100, Thomas Hellström wrote:
>
>> Eric Anholt wrote:
>>
>>> On Thu, 2008-02-28 at 06:08 +1000, Dave Airlie wrote:
>>>
>>>
>>>>> I wasn't planning on a Mesa 7.1 (trunk code) release for a while, but I
>>>>> could finish up 7.0.3 at any moment. I have to admit that I haven't
>>>>> actually tested Mesa 7.0.3 with current X code in quite a while though.
>>>>>
>>>>> Before Mesa 7.1 I'd like to see a new, official DRM release. Otherwise,
>>>>> it's hard to identify a snapshot of DRM that works with Mesa. I know I
>>>>> always have trouble with DRM versioning otherwise.
>>>>>
>>>>> Is there any kind of roadmap for a new DRM release?
>>>>>
>>>>>
>>>> When TTM hits the kernel, I'll release a libdrm to work with that and
>>>> solidify the API,
>>>>
>>>> However, people keep finding apparently valid reasons to pick holes in
>>>> the TTM API, though I haven't seen the discussion brought up in the
>>>> few weeks since.
>>>>
>>>>
>>> http://cgit.freedesktop.org/~anholt/drm/log/?h=drm-ttm-cleanup-2
>>>
>>> has some I believe obvious cleanups to the API, removing many sharp
>>> edges. At that point the BO parts of the API are more or less tolerable
>>> to me. The fencing code I don't understand and am very scared by still,
>>> but most of it has left the user <-> kernel API at least.
>>>
>>>
>> Some important comments about the API changes, starting from below.
>> Remove DRM_BO_FLAG_FORCE_MAPPABLE: Yes, that can go away.
>>
>> Remove DRM_BO_HINT_WAIT_LAZY: No. This flag is intended for polling-only
>> hardware, and has no use at all in the intel driver once the sync
>> flushes are gone. The reason you ever saw a difference with this flag
>> is that there was a bug in the execbuf code that caused you to hit a
>> polling path in the fence wait mechanism.
>>
>> Ignore DRM_FENCE_FLAG_WAIT_LAZY: No, same as above.
>>
>
> OK. We should clarify this in the ioctl descriptions so that people
> with sane hardware know that the flags are ignored.
>
Indeed. The lack of documentation is disturbing and should be fixed asap.
>
>> Remove unused DRM_FENCE_FLAG_WAIT_IGNORE_SIGNALS: Yes, that's OK.
>>
>> Remove DRM_FENCE_FLAG_NO_USER: No. It's used by the Poulsbo X server
>> EXA implementation and is quite valuable for small composite operations.
>>
>> Remove DRM_BO_FLAG_CACHED_MAPPED and make that a default behaviour:
>> No!!! We can't do that!!!
>> DRM_BO_FLAG_CACHED_MAPPED creates an invalid physical page aliasing,
>> the details of which are thoroughly explained here:
>>
>
> I may have said it wrong: Make DRM_BO_FLAG_CACHED_MAPPED the default
> behavior if the platform can support it. The point is that it should
> not be userland interface -- if the kernel can manage it, then just do
> it. Otherwise, don't. I'd rather see us disable the performance hack
> for now than leave a go-faster switch in the interface.
>
> Going back over the commit, I didn't make the better behavior
> conditional on the platform being able to do it. Oops, I need to fix
> that.
>
>
Yes, hmm, as I see it there are three performance problems that
DRM_BO_FLAG_CACHED_MAPPED attempts to address:
1) The buffer creation latency due to global_flush_tlb(). This can be
worked around with buffer/page caching in a number of ways (below), and
once the wbinvd() is gone from the main kernel it won't be such a huge
problem anymore.
   a) A kernel pool of uncached / unmapped (highmem-like) pages. (Not
likely to happen anytime soon.)
   b) A pre-bound region of VRAM-like AGP memory for batch buffers and
friends. Easy to set up and avoids flushing issues altogether.
   c) User-space bo caching and reuse.
   d) User-space buffer pools.
TG is heading down the d) path since it also fixes the texture
granularity problem.
2) Relocation application. KeithP's presumed_offset work has to a great
extent fixed this problem. I think the kmap_atomic_prot_pfn() stuff just
added will take care of the rest, and I hope the mm kernel guys will
understand the problem and accept kmap_atomic_prot_pfn(). I'm
working on a patch that will do post-validation-only relocations this way.
3) Streaming reads from GPU to CPU. Use cache-coherent buffers if
available, otherwise SGDMA. I'm not sure (due to prefetching) that
DRM_BO_FLAG_CACHED_MAPPED addresses this issue correctly.
So from my perspective I'd like to keep the default behavior,
particularly as we're using d) to address problem 1), and if I
understand it correctly, Intel is heading down c).
In the long run I'd like to see DRM_BO_FLAG_CACHED_MAPPED disappear, and
us fix whatever's in the way for you to implement c). If we need to
address this before a kernel inclusion, is there a way we can have that
as a driver-specific flag? That would mean adding a driver-specific flag
preprocessing callback.
>> http://marc.info/?l=linux-kernel&m=102376926732464&w=2
>>
>> And this resulted in the change_page_attr() and the dreaded
>> global_flush_tlb() kernel calls. From what I understand it might be OK
>> for streaming writes to the GPU (like batch-buffers) but how would you
>> stop a CPU from prefetching invalid data from a buffer while you're
>> writing to it from the GPU? And even write it back, overwriting what the
>> GPU just wrote?
>> This would break anything trying to use TTM in a consistent way.
>>
>
> As far as we know, Intel CPUs are not affected by the AMD limitation
> that read-only speculation may result in later writeback, so what we do
> works out. It does look like we're not flushing CPU cache at map time
> (bo_map_ioctl -> buffer_object_map -> bo_wait, bo_evict_cached ->
> bo_evict -> move_mem), which is wrong.
>
> Note that in the current implementation, when we map the buffer again,
> we unmap it out of the hardware. It would also be nice to not unmap it
> from the hardware and leave the GART mapping as-is, and just flush the
> cache again when validating. The 3D driver basically never hits this
> path at the moment, but the X server certainly would (sadly), and we may
> have the 3D driver doing this if we do userland buffer reuse.
>
Yes, leaving the GART mapping as-is should probably work fine.
My concern is a case where you're doing rendering and then
need to do a software fallback.
You'll map the destination buffer but have no way of knowing whether the
CPU has already speculatively prefetched invalid data into the cached
kernel mapping. I guess, in that case, it'll be propagated into the
user-space mapping as well?
/Thomas
>
> ------------------------------------------------------------------------
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> ------------------------------------------------------------------------
>
> --
> _______________________________________________
> Dri-devel mailing list
> Dri-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dri-devel
>