[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-aio
Subject:    Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
From:       Stephane Doyon <sdoyon () max-t ! com>
Date:       2006-07-11 13:40:49
Message-ID: Pine.LNX.4.64.0607110928010.3327 () madrid ! max-t ! internal
[Download RAW message or body]

On Tue, 11 Jul 2006, Nathan Scott wrote:

> On Mon, Jul 10, 2006 at 12:46:23PM -0400, Stephane Doyon wrote:
>> Hi,
>
> Hi Stephane,
>
>> A few months back, a fix was introduced for a mutex double unlock warning
>> related to direct I/O in XFS. I believe that fix has a lock ordering
>> problem that can cause a deadlock.
>>
>> The problem was that __blockdev_direct_IO() would unlock the i_mutex in
>> the READ and DIO_OWN_LOCKING case, and the mutex would be unlocked again
>> in xfs_read() soon after returning from __generic_file_aio_read(). Because
>> there are lots of execution paths down from __generic_file_aio_read() that
>> do not consistently release the i_mutex, the safest fix was deemed to be
>> to reacquire the i_mutex before returning from __blockdev_direct_IO(). At
>> this point however, the reader is holding an xfs ilock, and AFAICT the
>> i_mutex is usually taken BEFORE the xfs ilock.
>
> That is correct, yes.  Hmm.  Subtle.  Painful.  Thanks for the detailed
> report and your analysis.
>
>> We are seeing a deadlock between a process writing and another process
>> reading the same file, when the reader is using direct I/O. (The writer
>
> Is that a direct writer or a buffered writer?

Whichever, both cases trigger the deadlock.

>> must apparently be growing the file, using either direct or buffered
>> I/O.) Something like this, on XFS:
>> (dd if=/dev/zero of=f bs=128K count=5000 & ) ; sleep 1 ; dd of=/dev/null
>> if=f iflag=direct bs=128K count=5000
>>
>> Seen on kernels 2.6.16 and 2.6.17.
>>
>> The deadlock scenario appears to be this:
>> -The reader goes into xfs_read(), takes the i_mutex and then an xfs_ilock
>> XFS_IOLOCK_SHARED, then calls down to __generic_file_aio_read() which
>> eventually goes down to __blockdev_direct_IO(). In there it drops the
>> i_mutex.
>> -The writer goes into xfs_write() and obtains the i_mutex. It then tries
>> to get an xfs_ilock XFS_ILOCK_EXCL and must wait on it since it's held by
>> the reader.
>> -The reader, still in __blockdev_direct_IO(), executes directio_worker()
>> and then tries to reacquire the i_mutex, and must wait on it because the
>> writer has it.
>>
>> And so we have a deadlock.
>
> *nod*.  This will require some thought, I'm not sure I like the sound of
> your suggested workaround there a whole lot, unfortunately, but maybe it
> is all we can do at this stage.  Let me ponder further and get back to you

Thank you.

> (but if you want to prototype your fix further, that'd be most welcome, of
> course).

Sure, well it's not very subtle. The below patch is what I'm using for 
now. I haven't seen problems with it yet but it hasn't been seriously 
tested.

I'm assuming that it's OK to keep holding the i_mutex during the call to 
direct_io_worker(), because in the DIO_LOCKING case, direct_io_worker() 
is called with the i_mutex held, and XFS touches i_mutex only in 
xfs_read() and xfs_write(), so opefully nothing will conflict.

Signed-off-by: Stephane Doyon <sdoyon@max-t.com>
--- linux/fs/direct-io.c.orig	2006-07-11 09:23:20.000000000 -0400
+++ linux/fs/direct-io.c	2006-07-11 09:27:54.000000000 -0400
@@ -1191,7 +1191,6 @@ __blockdev_direct_IO(int rw, struct kioc
  	loff_t end = offset;
  	struct dio *dio;
  	int release_i_mutex = 0;
-	int acquire_i_mutex = 0;

  	if (rw & WRITE)
  		current->flags |= PF_SYNCWRITE;
@@ -1254,11 +1253,6 @@ __blockdev_direct_IO(int rw, struct kioc
  				goto out;
  			}

-			if (dio_lock_type == DIO_OWN_LOCKING) {
-				mutex_unlock(&inode->i_mutex);
-				acquire_i_mutex = 1;
-			}
-		}

  		if (dio_lock_type == DIO_LOCKING)
  			down_read(&inode->i_alloc_sem);
@@ -1282,8 +1276,6 @@ __blockdev_direct_IO(int rw, struct kioc
  out:
  	if (release_i_mutex)
  		mutex_unlock(&inode->i_mutex);
-	else if (acquire_i_mutex)
-		mutex_lock(&inode->i_mutex);
  	if (rw & WRITE)
  		current->flags &= ~PF_SYNCWRITE;
  	return retval;

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic