List: lustre-devel
Subject: [Lustre-devel] layout lock bug with 118k
From: jacques-charles.lafoucriere () cea ! fr (Jacques-Charles Lafoucriere)
Date: 2010-10-30 10:48:14
Message-ID: 4CCBF7EE.2070904 () cea ! fr
On 10/29/2010 05:26 PM, Andreas Dilger wrote:
> On 2010-10-27, at 21:18, Jacques-Charles Lafoucriere wrote:
>
> > I have found a bug in the layout lock (it was seen with test 118k, the last
> > known failing test).
> > A simpler reproducer is to run rm during a long file write.
> >
> > A lock timeout is triggered because, during the writes, the client holds the
> > layout lock, which is the same lock as a lookup lock (multiple inode bits in
> > the same lock). So when the MDS tries to get an LCK_EX on the object (before
> > calling mdo_unlink), the lock is not released because of its reference count.
> The client should only be holding a reference on the layout lock for 1MB chunks
> of IO. Between each IO the layout lock reference should be dropped, and if there
> was a blocking callback on the lock the client should also cancel the lock at
> that time.
>
The client holds the layout lock only around the IO, so between I/Os the
lock should be cancelled. The issue is that the same lock is also
referenced because of its other inode bits.
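The failure mode described above can be modelled with a toy refcounted lock. This is a minimal sketch, not Lustre code: `struct toy_lock`, `toy_lock_get()`, `toy_lock_put()`, `toy_blocking_ast()`, `toy_write()` and the `TOY_BIT_*` values are all hypothetical. It assumes that one client-side lock carries both the lookup and layout bits, and that a blocking callback can only cancel the lock once its reference count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not Lustre's constants. */
enum { TOY_BIT_LOOKUP = 1u << 0, TOY_BIT_LAYOUT = 1u << 1 };

/* Client-side view of ONE lock that carries several inode bits. */
struct toy_lock {
    uint64_t bits;       /* inode bits granted in this single lock */
    int      refcount;   /* active users, whichever bit they rely on */
    bool     cb_pending; /* the MDS has sent a blocking callback */
    bool     cancelled;  /* the lock has been cancelled */
};

static void toy_lock_get(struct toy_lock *lk)
{
    assert(!lk->cancelled);
    lk->refcount++;
}

/* Drop a reference; cancel only if a callback is pending and no
 * other user still references the lock. */
static void toy_lock_put(struct toy_lock *lk)
{
    assert(lk->refcount > 0);
    if (--lk->refcount == 0 && lk->cb_pending)
        lk->cancelled = true;
}

/* Blocking callback: cancel at once if unused, otherwise defer. */
static void toy_blocking_ast(struct toy_lock *lk)
{
    lk->cb_pending = true;
    if (lk->refcount == 0)
        lk->cancelled = true;
}

/* Write nchunks 1 MiB chunks, taking the layout reference only
 * around each chunk. The callback arrives during chunk ast_at.
 * If other_user is true, another user of the same lock (e.g. of the
 * lookup bit) holds a reference for the whole write and beyond.
 * Stops early once the lock is cancelled; returns whether it was. */
static bool toy_write(struct toy_lock *lk, int nchunks, int ast_at,
                      bool other_user)
{
    if (other_user)
        toy_lock_get(lk);
    for (int i = 0; i < nchunks && !lk->cancelled; i++) {
        toy_lock_get(lk);        /* layout reference for this chunk */
        /* ... 1 MiB of IO would be done here ... */
        if (i == ast_at)
            toy_blocking_ast(lk);
        toy_lock_put(lk);        /* dropped between chunks */
    }
    return lk->cancelled;
}
```

In the second case of the test the layout reference is still dropped between chunks, but the other reference keeps the count above zero, so the deferred cancel never runs and the MDS lock request eventually times out.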
> > A solution is to request an LCK_CR on the object before mdo_unlink (the
> > directory is still protected by a strong lock). Is this a good solution? Do
> > you have another one?
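Why requesting LCK_CR instead of LCK_EX would avoid the blocking callback can be checked against the classic DLM compatibility rules. This is a minimal model, not LDLM code: the `toy_*` names, the `M_*` modes and the bit values are hypothetical, and the test assumes the client's combined lookup/layout lock was granted in PR mode (a CR grant gives the same answer). It only shows which requests would conflict; whether CR is strong enough to make mdo_unlink safe is the open question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The six classic DLM modes, prefixed so they do not clash with the
 * real LCK_* constants. */
enum toy_mode { M_NL, M_CR, M_CW, M_PR, M_PW, M_EX, M_NMODES };

/* Hypothetical bit values for illustration only. */
enum { TOY_BIT_LOOKUP = 1u << 0, TOY_BIT_LAYOUT = 1u << 1 };

/* toy_compat[a][b] is true when modes a and b may be granted together
 * (standard DLM compatibility matrix; it is symmetric). */
static const bool toy_compat[M_NMODES][M_NMODES] = {
    /*          NL     CR     CW     PR     PW     EX   */
    [M_NL] = { true,  true,  true,  true,  true,  true  },
    [M_CR] = { true,  true,  true,  true,  true,  false },
    [M_CW] = { true,  true,  true,  false, false, false },
    [M_PR] = { true,  true,  false, true,  false, false },
    [M_PW] = { true,  true,  false, false, false, false },
    [M_EX] = { true,  false, false, false, false, false },
};

/* Inodebits locks conflict only when they share at least one bit
 * AND their modes are incompatible. */
static bool toy_ibits_conflict(enum toy_mode held, uint64_t held_bits,
                               enum toy_mode want, uint64_t want_bits)
{
    return (held_bits & want_bits) != 0 && !toy_compat[held][want];
}
```

An EX request on any bit of the client's lock forces a blocking callback, while a CR request on the same bit is granted alongside it; the last assertion checks that locks with disjoint bits never conflict, whatever their modes.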
> We discussed this issue recently, and the preferred solution is to release the
> layout lock as soon as the OST extent locks are referenced, since we don't
> actually require the layout lock once we hold the object extent lock(s).
> We discussed this before, and it is a bit tricky, because ll_layout_lock_get()
> and ll_layout_lock_put() currently wrap the IO function. One proposal is to
> refcount the lsm structure under the layout lock, and then drop the last lsm
> reference in the LOV code after the object lock is held, and that would release
> the lsm lock.
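One possible reading of the refcounted-lsm proposal is sketched here in C. Everything is hypothetical: `struct toy_lsm`, `struct toy_layout_lock` and the `toy_*` functions only stand in for the lsm and the layout lock, the OST extent lock is reduced to comments, and the sketch assumes the lsm owns exactly one layout lock reference that is dropped together with the last lsm reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real llite/LOV structures. */
struct toy_layout_lock {
    int refcount;                    /* references pinning the layout lock */
};

struct toy_lsm {
    int refcount;                    /* users of this striping description */
    struct toy_layout_lock *layout;  /* lock the layout was read under */
};

/* llite: take the first lsm reference while the layout lock is held;
 * the lsm now owns one layout lock reference. */
static void toy_lsm_init(struct toy_lsm *lsm, struct toy_layout_lock *ll)
{
    ll->refcount++;
    lsm->layout = ll;
    lsm->refcount = 1;
}

static void toy_lsm_get(struct toy_lsm *lsm)
{
    assert(lsm->refcount > 0);
    lsm->refcount++;
}

/* Dropping the LAST lsm reference also drops the layout lock reference,
 * so a pending blocking callback could then cancel the lock. */
static void toy_lsm_put(struct toy_lsm *lsm)
{
    assert(lsm->refcount > 0);
    if (--lsm->refcount == 0) {
        assert(lsm->layout->refcount > 0);
        lsm->layout->refcount--;
        lsm->layout = NULL;
    }
}

/* One IO. Returns the number of layout lock references still held
 * while the data transfer runs under the OST extent lock. */
static int toy_io(struct toy_layout_lock *ll)
{
    struct toy_lsm lsm;

    toy_lsm_init(&lsm, ll);   /* llite: layout read under the lock */
    toy_lsm_get(&lsm);        /* LOV: takes its own reference */
    toy_lsm_put(&lsm);        /* llite: hand-off done, not the last */
    assert(ll->refcount == 1);

    /* LOV: OST extent lock(s) acquired here ... */
    toy_lsm_put(&lsm);        /* ... then the last lsm reference goes */

    /* Data transfer proceeds under the extent lock only. */
    int refs_during_io = ll->refcount;
    /* The extent lock would be released when the IO completes. */
    return refs_during_io;
}
```

In this model the layout lock reference disappears as soon as LOV holds the object lock, so a blocking callback (for example from an unlink) could cancel it while the data transfer continues under the extent lock, rather than waiting for ll_layout_lock_put() at the end of the whole IO.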
I will look into how to do this.
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.