
List:       opensolaris-ufs-discuss
Subject:    Re: [ufs-discuss] Locking in directio codepath
From:       Frank Batschulat <Frank.Batschulat () Sun ! COM>
Date:       2006-08-23 12:51:11
Message-ID: op.teqlzlrz176b0c () blade

On Thu, 17 Aug 2006 20:13:35 +0200, Samarjeet Tomar <sst117@gmail.com>  
wrote:

> ufs_write() seems to be getting all the usual locks in the
> directio codepath too, which was introduced specifically
> for databases (Oracle?). My understanding was that the
> database expects the file system to not do any (or minimal)
> synchronization, as it itself takes care of the synchronization
> for access to database blocks. Is my understanding correct?

Not entirely correct; you're mixing up two things here:
Direct I/O and Concurrent Direct I/O.

What is UFS Direct I/O?

      Direct I/O is a UFS option advising that file system
      reads and writes should bypass the Solaris file system
      page cache and go straight to the disk device. Direct
      I/O uses a code path very similar to that of the raw
      disk device, but rather than bypassing the file system
      completely, it simply bypasses the file system cache.
      This retains the regular file system administration
      model while still providing an efficient code path
      close to that of raw disk.

What is Concurrent Direct I/O?

      Concurrent Direct I/O is an enhanced implementation of
      Direct I/O available from Solaris 8 Update 3 onwards.
      It provides significant performance improvements for
      databases over the standard Direct I/O implementation.

      Concurrent  Direct  I/O  allows  multiple   overlapping
      reads/writes  to  the  same  file. Without it, only one
      synchronous read or write can occur to a  file  at  any
      time.

However there are some quirks concerning Concurrent Direct I/O.

The approach taken modifies the current directio data path.
No new interfaces were introduced and no existing interfaces were
altered; all POSIX semantics have been preserved. To take advantage of
these changes, the file must be marked for directio using the directio()
library call, or the "forcedirectio" mount option must be applied to the
entire file system. Note that this is a write-only, advisory interface:
the application cannot reasonably determine whether a directio (or
concurrent directio) operation has actually occurred. Also, certain
optimizations could not be done while maintaining strict POSIX
compliance.

However, it is possible to allow concurrent writers to a single file in
one restricted situation, the so-called "re-write" case: concurrent
writers to pre-allocated ufs files using an unbuffered data path.
The current ufs directio data path was altered to special-case these
"re-write" operations; that is what ufs_write() checks for via
ufs_check_rewrite() before calling ufs_directio_write():

http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/fs/ufs/ufs_vnops.c#392

which has the following comment:

     /*
      * Filter to determine if this request is suitable as a
      * concurrent rewrite. This write must not allocate blocks
      * by extending the file or filling in holes. No use trying
      * through FSYNC descriptors as the inode will be synchronously
      * updated after the write. The uio structure has not yet been
      * checked for sanity, so assume nothing.
      */

If those criteria are met, the inode's i_rwlock is held as RW_READER,
allowing concurrent write access, instead of the usual RW_WRITER. In
addition, the inode's i_contents lock is also held as RW_READER.

Otherwise the "regular" directio code path is taken
via ufs_write()->wrip()->ufs_directio_write().

hth
-- 
frankB

(I'd rather be a forest than a street....)
_______________________________________________
ufs-discuss mailing list
ufs-discuss@opensolaris.org
