List: linux-xfs
Subject: Re: [PATCH 1/2] xfs: always drain dio before extending aio write submission
From: Eric Sandeen <sandeen@sandeen.net>
Date: 2015-09-29 22:25:02
Message-ID: 560B0FBE.5010205@sandeen.net
Looks good to me, and tested w/ a testcase I'll send in a bit.
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
On 9/28/15 1:38 PM, Brian Foster wrote:
> XFS supports and typically allows concurrent asynchronous direct I/O
> submission to a single file. One exception to the rule is that file
> extending dio writes that start beyond the current EOF (e.g.,
> potentially create a hole at EOF) require exclusive I/O access to the
> file. This is because such writes must zero any pre-existing blocks
> beyond EOF that are exposed by virtue of now residing within EOF as a
> result of the write about to be submitted.
>
> Before EOF zeroing can occur, the current file i_size must be stabilized
> to avoid data corruption. In this scenario, XFS upgrades the iolock to
> exclude any further I/O submission, waits on in-flight I/O to complete
> to ensure i_size is up to date (i_size is updated on dio write
> completion) and restarts the various checks against the state of the
> file. The problem is that this protection sequence is triggered only
> when the iolock is currently held shared. While this is true for async
> dio in most cases, the caller may upgrade the lock in advance based on
> arbitrary circumstances with respect to EOF zeroing. For example, the
> iolock is always acquired exclusively if the start offset is not block
> aligned. This means that even though the iolock is already held
> exclusive for such I/Os, pending I/O is not drained and thus EOF zeroing
> can occur based on an unstable i_size.
>
> This problem has been reproduced as guest data corruption in virtual
> machines with file-backed qcow2 virtual disks hosted on an XFS
> filesystem. The virtual disks must be configured with aio=native mode
> and must not be truncated out to the maximum file size (as some virt
> managers will do).
>
> Update xfs_file_aio_write_checks() to unconditionally drain in-flight
> dio before EOF zeroing can occur. Rather than trigger the wait based on
> iolock state, use a new flag and upgrade the iolock when necessary. Note
> that this results in a full restart of the inode checks even when the
> iolock was already held exclusive when technically it is only required
> to recheck i_size. This should be a rare enough occurrence that it is
> preferable to keep the code simple rather than create an alternate
> restart jump target.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> fs/xfs/xfs_file.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e78feb4..347b3e0 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -574,6 +574,7 @@ xfs_file_aio_write_checks(
> struct xfs_inode *ip = XFS_I(inode);
> ssize_t error = 0;
> size_t count = iov_iter_count(from);
> + bool drained_dio = false;
>
> restart:
> error = generic_write_checks(iocb, from);
> @@ -611,12 +612,13 @@ restart:
> bool zero = false;
>
> spin_unlock(&ip->i_flags_lock);
> - if (*iolock == XFS_IOLOCK_SHARED) {
> - xfs_rw_iunlock(ip, *iolock);
> - *iolock = XFS_IOLOCK_EXCL;
> - xfs_rw_ilock(ip, *iolock);
> - iov_iter_reexpand(from, count);
> -
> + if (!drained_dio) {
> + if (*iolock == XFS_IOLOCK_SHARED) {
> + xfs_rw_iunlock(ip, *iolock);
> + *iolock = XFS_IOLOCK_EXCL;
> + xfs_rw_ilock(ip, *iolock);
> + iov_iter_reexpand(from, count);
> + }
> /*
> * We now have an IO submission barrier in place, but
> * AIO can do EOF updates during IO completion and hence
> @@ -626,6 +628,7 @@ restart:
> * no-op.
> */
> inode_dio_wait(inode);
> + drained_dio = true;
> goto restart;
> }
> error = xfs_zero_eof(ip, iocb->ki_pos, i_size_read(inode), &zero);
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs