[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-kernel
Subject:    Re: [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY
From:       Török_Edwin <edwintorok () gmail ! com>
Date:       2008-11-30 19:54:56
Message-ID: 4932EF90.9070601 () gmail ! com
[Download RAW message or body]

On 2008-11-29 01:02, Mike Waychison wrote:
> Nick Piggin wrote:
>> On Thu, Nov 27, 2008 at 11:03:40AM -0800, Mike Waychison wrote:
>>> Nick Piggin wrote:
>>>> On Thu, Nov 27, 2008 at 01:28:41AM -0800, Mike Waychison wrote:
>>>>> Török however identified mmap taking on the order of several
>>>>> milliseconds due to this exact problem:
>>>>>
>>>>> http://lkml.org/lkml/2008/9/12/185
>>>> Turns out to be a different problem.
>>>>
>>> What do you mean?
>>
>> His is just contending on the write side. The retry patch doesn't help.
>>
>
> I disagree.  How do you get 'write contention' from the following
> paragraph:
>
> "Just to confirm that the problem is with pagefaults and mmap, I dropped
> the mmap_sem in filemap_fault, and then
> I got same performance in my testprogram for mmap and read. Of course
> this is totally unsafe, because the mapping could change at any time."
>
> It reads to me that the writers were held off by the readers sleeping
> in IO.

It is true that I have a write/write contention too, but do_page_fault
shows up too on lock_stat.

This is my guess at what happens:
* filemap_fault used to sleep with mmap_sem held while waiting for the
page lock.
* the google patch avoids that, which is fine: if page lock can't be
taken, it drops mmap_sem, waits, then retries the fault once
* however after we acquired the page lock, mapping->a_ops->readpage is
invoked, mmap_sem is NOT dropped here:

    error = mapping->a_ops->readpage(file, page);
    if (!error) {
        wait_on_page_locked(page);

If my understanding is correct ->readpage does the actual disk I/O, and
it keeps the page locked, when the lock is released we know it has finished.
So wait_on_page_locked(page)  holds mmap_sem locked for read during the
disk I/O, preventing sys_mmap/sys_munmap from making progress.

I don't know how to prove/disprove my guess above, suggestions welcome.

Could the patch be changed to also release the mmap_sem after readpage,
and before wait_on_page_locked?

Best regards,
--Edwin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic