
List:       netbsd-tech-kern
Subject:    Re: RFC (reassign)buf and carvinf up buffers (was Re: SCSI MMC device abstraction and UDF patch for
From:       Bill Studenmund <wrstuden () NetBSD ! org>
Date:       2005-12-29 21:45:52
Message-ID: 20051229214552.GD14308 () netbsd ! org


On Thu, Dec 29, 2005 at 09:47:25PM +0100, Reinoud Zandijk wrote:
> On Thu, Dec 29, 2005 at 09:58:51AM -0800, Bill Studenmund wrote:
> > > That implies having a VOP_BMAP() figuring this out. Since UDF can't use a 
> > > VOP_BMAP this way (due to write shuffling) it would mean that VOP_BMAP 
> > > needs to distinguish between read and write requests and for read-request 
> > > try to figure out how much it can read in one go... quite expensive and 
> > > locking trouble prone.
> > 
> > This does not imply VOP_BMAP() figuring this out.
> > 
> > The file system decides what data goes into what buffers. The file system 
> > knows what blocks are where. Thus you don't have to figure all of this out 
> > in the middle of your strategy routine, you can figure it out when you 
> > make the buffers in the first place.
> > 
> > More directly, you SHOULD figure it out before your strategy routine.
> 
> Since UDF uses genfs, genfs decides the number of blocks to request from 
> the `runp' variable set by its VOP_BMAP() call to the file system. Since 
> UDF's bmap is a 1:1 translation, it always returns the maximum run length 
> with
> 
>    *runp = MAXPHYS / lb_size;
> 
> to make full use of long extents and reduce the number of transactions as 
> much as possible. Note that this isn't happening yet, but that's the idea 
> behind it. If I, on the other hand, return 0 or 1, I get lb_size or 
> 2*lb_size. So I'll probably have to subtract 1 from the *runp assignment :)

You should return the number of blocks that are contiguous. If the next 
MAXPHYS bytes are all contiguous on disk, return the runp above. If not, 
return less.

> > No, a VOP_STRATEGY() call does NOT represent a read/write that has nothing 
> > to do with disk mapping, it represents a read or write of a buffer. Said 
> > buffer represents an extent on disk. One extent. If you have multiple 
> > extents in your transfer, you are dealing with multiple buffers.
> 
> True, the read or write of a buffer that is created by genfs. So I always 
> have to return a run length of one then, losing all hope of multi-sector 
> reads?

Actually look at your metadata layout. You (the fs) know where the blocks 
are. You should know how many blocks are contiguous for the passed-in 
offset. If you don't have MAXPHYS / lb_size worth of data in a row, return 
the amount you have. If you do have (MAXPHYS / lb_size) worth, return 
MAXPHYS / lb_size.

The test should be a simple conditional. It shouldn't be that hard. :-)
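
A minimal userland sketch of that conditional, not actual UDF or genfs code. 
The struct extent_sketch type, the run_after() helper and the 64 KiB 
MAXPHYS_SKETCH value are all made up for illustration. It also assumes that 
*runp counts the blocks that follow the requested one, which is what the 
"return 0 and get lb_size" observation quoted above suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64 KiB stands in for MAXPHYS; the real value is per-port. */
#define MAXPHYS_SKETCH (64 * 1024)

/* Hypothetical description of one contiguous extent of a file. */
struct extent_sketch {
	uint64_t start_lb;	/* first file-relative logical block in the extent */
	uint64_t len_lb;	/* extent length in logical blocks */
};

/*
 * Return how many blocks after `lbn' are contiguous with it, capped so
 * that the whole transfer (lbn plus the run) stays within MAXPHYS_SKETCH.
 * Returns 0 when lbn lies outside the extent.
 */
static int
run_after(const struct extent_sketch *ext, uint64_t lbn, uint32_t lb_size)
{
	uint64_t max_blocks, left;

	assert(ext != NULL && lb_size > 0);

	max_blocks = MAXPHYS_SKETCH / lb_size;
	if (max_blocks == 0)
		return 0;
	if (lbn < ext->start_lb || lbn >= ext->start_lb + ext->len_lb)
		return 0;

	/* Blocks remaining in the extent, counting lbn itself. */
	left = ext->start_lb + ext->len_lb - lbn;

	/* The simple conditional: a full run if available, else what is left. */
	if (left > max_blocks)
		left = max_blocks;

	return (int)(left - 1);
}
```

With 2048-byte logical blocks and a 100-block extent, this gives a run of 31 
at the start of the extent and a shorter run near its end, so the short case 
only costs anything for fragmented files.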

For sane files, you're always going to find a lot of blocks in a row. So 
most of the time you are going to do (MAXPHYS / lb_size) blocks.

If you really have a file that is significantly fragmented, then you have 
a severe performance issue. The time it takes to figure all of this out in 
VOP_BMAP() will be next to nothing compared to disk access time. So while 
you need to handle it, you don't need to worry about it performing well.

Take care,

Bill

