
List:       kde-bugs-dist
Subject:    [kwin] [Bug 343551] Kwin hangs, stops drawing the screen and starts using 100% cpu inside nvidia-glc
From:       Fredrik Höglund <fredrik () kde ! org>
Date:       2015-02-17 23:37:28
Message-ID: bug-343551-17878-8qquWPwjqL () http ! bugs ! kde ! org/

https://bugs.kde.org/show_bug.cgi?id=343551

--- Comment #34 from Fredrik Höglund <fredrik@kde.org> ---
(In reply to Simeon Bird from comment #33)
> > Your patch is absolutely correct, but some of the comments in the code are
> > not.
> 
> Ok, I'll update the comments and post a new version. Incidentally, is this
> actually an nvidia bug? I.e., does the standard call for glDeleteSync not to
> block? If the answer is yes, should the patch be made conditional on the
> nvidia driver somehow?

I would say that the OpenGL specification strongly implies that glDeleteSync
should not block, but it doesn't explicitly say that it's not allowed to. My
guess is that there's some limitation that prevents the NVIDIA driver from
knowing when it's safe to delete the sync object without blocking on the fence.
Triggering the fence before deleting it is not a big deal though, so I wouldn't
bother with making it conditional on the NVIDIA driver. It's the only driver
that implements the GL_EXT_x11_sync_object extension anyway.
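
For illustration, a rough sketch of what "trigger before delete" could look
like with xcb and GL_EXT_x11_sync_object. The function and variable names are
made up for the example and this is not the actual KWin code; it assumes the
fence was created with xcb_sync_create_fence() and imported into GL with
glImportSyncEXT(GL_SYNC_X11_FENCE_EXT, fence, 0):

    #include <xcb/sync.h>
    #include <epoxy/gl.h>

    void destroyFence(xcb_connection_t *conn, xcb_sync_fence_t fence,
                      GLsync sync, bool triggered)
    {
        if (!triggered) {
            // Trigger the fence first so the driver never has to block
            // inside glDeleteSync() on a fence that would otherwise never
            // be signaled.
            xcb_sync_trigger_fence(conn, fence);
            xcb_flush(conn);
        }
        glDeleteSync(sync);
        xcb_sync_destroy_fence(conn, fence);
    }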

> > There is no need to call wait() before deleting the fence. The purpose of
> > wait() is to prevent the GPU from executing future draw commands before the
> > fence is signaled, and that's not relevant here. 
> 
> What I was worried about was a different case: if trigger() is called and
> then insertWait() is called immediately afterwards as part of the normal
> draw routines. That would be a classic race condition and would lead to an
> occasional, unreproducible hang. But maybe it isn't possible for this to
> happen without something equivalent to xcb_flush?

That's a good question. It shouldn't matter if the command buffer that signals
the fence is submitted after the command buffer that waits for it, as long as
both command buffers are able to execute concurrently. This is of course
hardware-dependent, but all current NVIDIA GPUs should have multiple hardware
contexts. The best way to test this is probably to call glWaitSync() and
glFlush(), and then tell the X server to trigger the fence. If that results in
a GPU hang, we need to make sure that the X server has processed the trigger
request before we call glWaitSync(). It might be a good idea to do that anyway
for the sake of robustness.
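
A rough sketch of that robust ordering (trigger, then a round trip, then the
GPU-side wait) could look like the following. The names are illustrative, not
the actual KWin code, and any round-trip request would do; xcb_sync_query_fence()
is just a convenient one:

    #include <xcb/sync.h>
    #include <epoxy/gl.h>
    #include <cstdlib>

    void triggerThenWait(xcb_connection_t *conn, xcb_sync_fence_t fence,
                         GLsync sync)
    {
        // Ask the X server to trigger the fence.
        xcb_sync_trigger_fence(conn, fence);

        // Round trip: once the reply arrives, the server is guaranteed to
        // have processed the trigger request sent before the query.
        xcb_sync_query_fence_reply_t *reply =
            xcb_sync_query_fence_reply(conn,
                                       xcb_sync_query_fence(conn, fence),
                                       nullptr);
        free(reply);

        // Only now make the GPU wait on the (already triggered) fence.
        glWaitSync(sync, 0, GL_TIMEOUT_IGNORED);
        glFlush();
    }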

-- 
You are receiving this mail because:
You are watching all bug changes.