[prev in list] [next in list] [prev in thread] [next in thread] 

List:       yaffs
Subject:    Re: [Yaffs] Slow yaffs_readdir(..)
From:       Luc Van Oostenryck <lkml () looxix ! net>
Date:       2005-07-30 23:08:57
Message-ID: 42EC0889.9080009 () looxix ! net
[Download RAW message or body]

Charles Manning wrote:
> On Friday 01 July 2005 10:41, Charles Manning wrote:
> 
> > On Thursday 30 June 2005 17:43, Beat Morf wrote:
> > 
> > > Hi
> > > 
> > > I am using YAFFS direct implementation and actually testing some
> > > basically requirements.
> > > 
> > > When I write hundreds of files into the root of my YAFFS-FS (let's say
> > > 600) and dump them afterwards ('yaffs_opendir()', 'yaffs_readdir()'), I
> > > saw, that the 'yaffs_readdir(...)' functions needs much more time for
> > > the last files than for the first one.
> > > 
> > > Each call to the 'yaffs_readdir(...)' function will start at the first
> > > object within the 'yaffs_DIR'. For knowing which object was allready
> > > returned, a separate list of allready showed objects is applied (the
> > > objects are identifiable with the ObjectID); Means long times for a lot
> > > of files!
> > > 
> > > What is exactly the reason for such a method if I don't need to give
> > > them out in an special ordered way?
> > > 
> > > Actually I am using following 'yaffs_xxxxdir(...)' functions with a good
> > > performance ('dsc->list' is used for pointing to the next
> > > yaffsfs_ObjectListEntry entry):
> > 
> > Thanx Beat
> > 
> > I have had this comment once before from someone using huge directories.
> > I believe they got around the problem by caching names in their
> > application.
> > 
> > There are some interesting issues associated with directory reading. For
> > example, what happens if another thread is writing new files to the
> > directory or deleting files from the directory?
> > 
> > The easiest way to do a directory traversal is to keep a pointer to the
> > next sibling in the list as we search, but that can go wrong if that object
> > gets deleted before you read it:
> > 
> > eg.
> > 
> > 1. Directory has files xx,yy zz.
> > 2. Read dir, get xx. Pointer points to yy
> > 3. Delete yy
> > 4. Now pointer points to a non-existant object!
> > 
> > Another way that does not work is to try remember the number where you are:
> > eg.
> > 1. Directory has files xx,yy zz.
> > 2. Read dir, get xx. Remember we're pointing to item 1  (starting at zero)
> > 3. Delete xx
> > 4. Now item 1 is zz and we skipped past yy.
> > 
> > 
> > The code, as is, covers for cuveballs like this but it isn't very
> > efficient. This is something that I should look at improving.
> > 
> > 
> > I have not looked at your code suggestion in detail yet, but I shall.
> > 
> > -- Charles
> 
> 
> Having looked at Beat's code I think it would fail under one of the delete 
> case I mention.
> 
> I spent some time this weekend trying to implement some code using a balanced 
> AVL tree instead of a linked list. This would possibly improve the speed but 
> things would still degrade with large directories.
> 
> I'm now thinking of a scheme that is like Beat'ssuggestion, but is delete 
> safe.
> 
> Essentially, this will use a pointer into the directory (same as Beat's), but 
> it would also add a check to any file deletion from a directory to check 
> whether the file is currently being pointed to, and if so move it on to the 
> next item in the directory.
> 
> Ideas anybody?
> 
> -- Charles
> 

I've I looked at the sysfs code where they have basicaly the same situation:
the directories are implemented as a linked list of entries.

The solution they have adopted is to associate to each *opened* directory a small
bit of data (a "cursor") which is essentialy a pointer to the current entry in the \
linked list, so a readdir of a whole directory is linear (in yaffs, for each \
invocation of the readdir method, which retrieve one entry at a time, we need to \
lookup the list from the beginning at each time and thus reading a whole directory \
have a quadratic cost).

I think this solution is simple and clean; one of these day I will try a proof of \
concept of this.

Note that in reality, the sysfs code is more complicated that I describe above, they \
do their own dirent allocation (I'm not sure why they need this), it need a little \
more code each time an entry is added/removed from the list and when seeking a \
directory.


Luc


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic