
List:       ocfs2-devel
Subject:    Re: [Ocfs2-devel] [PATCH] fs/ocfs2: fix race in ocfs2_dentry_attach_lock
From:       Changwei Ge <gechangwei () live ! cn>
Date:       2019-05-07 3:29:35
Message-ID: HK2PR02MB39247F75C88774D6AA6DCC04D5310 () HK2PR02MB3924 ! apcprd02 ! prod ! outlook ! com

Hi Wengang,


You have indeed shared a wonderful analysis with us. :)

I understand the race now, but I still have some thoughts on your draft
patch. Please check my comments inline.


On 2019/5/7 1:55 AM, Wengang Wang wrote:
> Hi Changwei,
>
> Actually I'd planned to share the details once I had confirmed that the
> patch fixes the problem (I don't have that confirmation yet).
> But since you are interested, I am putting it here.
>
> The in-core dlm_lock_resource and the dlm_locks on its granted list look
> like this:
>
> <struct dlm_lock_resource 0xffff9c7e9ad57cc0>
>     lockname                      N000000000064e69f009a7bb5 hex: 4e303030303030303030303634653639660000009a7bb500000
>     owner                         2
>     dlm                           34F9A5BE6F2A481B8DE25ABAE8BBC68D
>     state                         NOSTATE
>     *dlm_locks in granted list:
>         <dlm_lock 0xffff9c7e0429d700>
>           ml.node                       1
>           lockres                       0xffff9c7e9ad57cc0
>           refs                          2
>           lksb ffff9c7e05093720 / containing ocfs2_lock_res ffff9c7e050936a0
>           convert_pending               0
>           ml.list                       0
>           ml.flags                      0
>           type   3   convert_type   -1
>           ml.highest_blocked            -1
>         <dlm_lock 0xffff9c7e02f3ee80>
>           ml.node                       0
>           lockres                       0xffff9c7e9ad57cc0
>           refs                          2
>           lksb ffff9c7e30658a20 / containing ocfs2_lock_res ffff9c7e306589a0
>           convert_pending               0
>           ml.list                       0
>           ml.flags                      0
>           type   3   convert_type   -1
>           ml.highest_blocked            -1
>         <dlm_lock 0xffff9c7d3ef3fd80>
>           ml.node                       2
>           lockres                       0xffff9c7e9ad57cc0
>           refs                          2
>           lksb ffff9c7dbffe2898 / containing ocfs2_lock_res ffff9c7dbffe2818
>           convert_pending               0
>           ml.list                       0
>           ml.flags                      0
>           type   3   convert_type   -1
>           ml.highest_blocked            -1
>         <dlm_lock 0xffff9c7e41c97880>
>           ml.node                       2
>           lockres                       0xffff9c7e9ad57cc0
>           refs                          2
>           lksb ffff9c7dbffad298 / containing ocfs2_lock_res ffff9c7dbffad218
>           convert_pending               0
>           ml.list                       0
>           ml.flags                      0
>           type   3   convert_type   -1
>           ml.highest_blocked            -1
>
>         4 dlm_lock in granted list
>
> You can see there are two granted dlm_locks for node 2 (0xffff9c7d3ef3fd80
> and 0xffff9c7e41c97880).
> Normally the ocfs2 FS layer wouldn't request a duplicate new lock against
> the same lock resource; it would request converts after the first new-lock
> request.
> So having two dlm_locks for the same lock resource from the same node is a
> hint that the ocfs2 FS layer is racing.


I agree with your opinion that locking against the same lock resource
(shared inode, etc.) should go through a single, unique ocfs2 lock
resource object.

So do you think we should add some status checks in the dlm layer to
assert against insane cluster locking logic?  For instance, just check
whether a node asks to be granted twice, which is obviously unacceptable
and unnecessary.
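
Something along these lines, perhaps -- just a sketch against my reading
of the o2dlm structures (res->granted holding struct dlm_lock entries
linked via ->list, with the requesting node in ml.node), not a tested
patch:

/*
 * Sketch: with res->spinlock held, before queueing a brand-new lock
 * request for 'node', check whether that node already holds a granted
 * lock on this resource.  A hit would indicate the fs-layer race
 * described below rather than a legal convert request.
 */
static int dlm_node_already_granted(struct dlm_lock_resource *res, u8 node)
{
        struct dlm_lock *lock;

        assert_spin_locked(&res->spinlock);

        list_for_each_entry(lock, &res->granted, list) {
                if (lock->ml.node == node)
                        return 1;
        }
        return 0;
}

The caller could then mlog(ML_ERROR, ...) or even BUG() when this
returns 1 for a new-lock (as opposed to convert) request.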


>
> From lksb, we know the containing ocfs2_lock_res objects
> (ffff9c7dbffe2818 and ffff9c7dbffad218)
>
> crash> ocfs2 struct ocfs2_lock_res ffff9c7dbffe2818 |egrep "holder|flag|name"
>     l_name                        N000000000064e69f009a7bb5 hex: 4e303030303030303030303634653639660000009a7bb500000
>     l_flags |OCFS2_LOCK_ATTACHED|OCFS2_LOCK_INITIALIZED
>     l_ro_holders                  1
>     l_ex_holders                  0
> crash> ocfs2 struct ocfs2_lock_res ffff9c7dbffad218 |egrep "holder|flag|name"
>     l_name                        N000000000064e69f009a7bb5 hex: 4e303030303030303030303634653639660000009a7bb500000
>     l_flags |OCFS2_LOCK_ATTACHED|OCFS2_LOCK_INITIALIZED
>     l_ro_holders                  0
>     l_ex_holders                  0
>
> The one @ffff9c7dbffad218 is the victim, with l_ro_holders being 0.
>
> I think the race goes like this (steps listed in time order; the A steps
> run in thread A, the B steps in thread B):
>
> (A1) Thread A enters ocfs2_dentry_attach_lock(), sees that
>      dentry->d_fsdata is NULL and that no alias is found by
>      ocfs2_find_local_alias(), so it kmallocs a new ocfs2_dentry_lock
>      structure into its local variable "dl" -- call it dl1.
>
> (B1) Thread B enters ocfs2_dentry_attach_lock(), also sees
>      dentry->d_fsdata is NULL and no alias found by
>      ocfs2_find_local_alias(), so it kmallocs another ocfs2_dentry_lock
>      structure into its local variable "dl" -- call it dl2.
>
> (A2) Thread A sets dentry->d_fsdata to dl1.
>
> (A3) Thread A's ocfs2_dentry_lock() succeeds and
>      dl1->dl_lockres.l_ro_holders is increased to 1.
>
> (B2) Thread B sets dentry->d_fsdata to dl2 (overwriting dl1).
>
> (B3) Thread B's ocfs2_dentry_lock() succeeds and
>      dl2->dl_lockres.l_ro_holders is increased to 1.
>
> (A4) Thread A calls ocfs2_dentry_unlock(), which now finds dl2 through
>      d_fsdata and decreases dl2->dl_lockres.l_ro_holders to 0.
>
> (B4) Thread B calls ocfs2_dentry_unlock(), tries to decrease
>      dl2->dl_lockres.l_ro_holders, sees that it is already zero, and
>      panics.
>
>
> The point is that thread A increased ro_holders on dl1 at step A3 but
> decreased it on dl2 at step A4.  That leaves dl1 with ro_holders stuck
> at 1, and it causes an extra, unexpected decrease on dl2, bringing its
> ro_holders to 0.  Step B2 also causes a memory leak (dl1 is orphaned),
> but that is not the topic here.  Step B4 then hits the panic when it
> sees that ro_holders on dl2 is already zero.
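
To make the window concrete, here is a much-simplified sketch of the
pre-patch flow (error handling, the alias path and the negative-dentry
handling are all trimmed, and the _sketch suffix is mine; the real code
is in fs/ocfs2/dcache.c):

/*
 * Much-simplified shape of ocfs2_dentry_attach_lock() before the patch.
 * Nothing serializes the "do we already have a dl?" check against the
 * d_fsdata store, so two threads can both allocate and the second store
 * silently overwrites the first.
 */
static int ocfs2_dentry_attach_lock_sketch(struct dentry *dentry,
                                           struct inode *inode,
                                           u64 parent_blkno)
{
        struct ocfs2_dentry_lock *dl = dentry->d_fsdata; /* A1/B1: both see NULL */
        int ret;

        if (!dl) {
                /* ocfs2_find_local_alias() finds nothing in either thread,
                 * so both fall through and allocate their own dl. */
                dl = kmalloc(sizeof(*dl), GFP_NOFS);     /* dl1 in A, dl2 in B */
                if (!dl)
                        return -ENOMEM;
                dl->dl_count = 0;
                dl->dl_inode = igrab(inode);
                dl->dl_parent_blkno = parent_blkno;
                ocfs2_dentry_lock_res_init(dl, parent_blkno, inode);
        }

        spin_lock(&dentry_attach_lock);
        dentry->d_fsdata = dl;                           /* A2 stores dl1;         */
        dl->dl_count++;                                  /* B2 overwrites with dl2 */
        spin_unlock(&dentry_attach_lock);

        ret = ocfs2_dentry_lock(dentry, 0);              /* A3 bumps dl1, B3 dl2   */
        if (!ret)
                ocfs2_dentry_unlock(dentry, 0);          /* A4 and B4 both reach   */
                                                         /* d_fsdata == dl2        */
        return ret;
}

Both threads can pass the initial checks before either publishes its
allocation, which is exactly the A1/B1 overlap in the timeline above.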
>
> Go back to the vmcore, let's find out dl1 and dl2:
>
> crash> kmem ffff9c7dbffe2818
> CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
> ffff9c7e97c03080      512      36007     36128   1129    16k kmalloc-512
>   SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
>   ffffeeaec2fff800  ffff9c7dbffe0000     0     32         25     7
>   FREE / [ALLOCATED]
>   [ffff9c7dbffe2800]
>
>       PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
> ffffeeaec2fff880  bffe2000 dead000000000400        0  0 fffffc0000000
> crash> kmem ffff9c7dbffad218
> CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
> ffff9c7e97c03080      512      36007     36128   1129    16k kmalloc-512
>   SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
>   ffffeeaec2ffeb00  ffff9c7dbffac000     0     32         23     9
>   FREE / [ALLOCATED]
>   [ffff9c7dbffad200]
>
>       PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
> ffffeeaec2ffeb40  bffad000 dead0000ffffffff        0  0 fffffc0000000
>
> crash> struct ocfs2_dentry_lock -x ffff9c7dbffe2800 |egrep "parent|inode"
>   dl_parent_blkno = 0x64e69f,
>   dl_inode = 0xffff9c7d13c624c0,
> crash> struct ocfs2_dentry_lock -x ffff9c7dbffad200 |egrep "parent|inode"
>   dl_parent_blkno = 0x64e69f,
>   dl_inode = 0xffff9c7d13c624c0,
>
> ffff9c7dbffe2800 is dl1 and ffff9c7dbffad200 is dl2; they match in both
> the dl_parent_blkno and dl_inode fields.
>
> Verifying dentry:
> crash> struct dentry ffff9c7d8b659780 |grep 24c0
>   d_inode = 0xffff9c7d13c624c0,
> crash> struct dentry ffff9c7d8b659780 |grep fsdata
>   d_fsdata = 0xffff9c7dbffad200,
>
> d_fsdata matches dl2 well.
>
> That is the analysis (hoping the formatting still looks good after the
> email is sent). Given the above, the fix should be self-explanatory.
>
> Thanks,
> Wengang
>
> On 2019/5/5 0:15, Changwei Ge wrote:
>> Hi Wengang,
>>
>>
>> We have actually met some strange problems around dentry cluster lock
>> attachment before, so I suppose solidifying the logic around it makes sense.
>>
>> Can you make your change log richer with a little more analysis?
>>
>> If you can paste a dlm lock resource and an ocfs2 lock resource snapshot
>> from the coredump, that would be great, and I can help figure out what
>> went wrong.
>>
>>
>> Thanks,
>>
>> Changwei
>>
>>
>>
>> On 2019/5/4 1:01 AM, Wengang Wang wrote:
>>> ocfs2_dentry_attach_lock() can be executed in parallel threads 
>>> against the
>>> same dentry. Make that race safe.
>>>
>>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>>> ---
>>>    fs/ocfs2/dcache.c | 27 +++++++++++++++++++++------
>>>    1 file changed, 21 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/dcache.c b/fs/ocfs2/dcache.c
>>> index 290373024d9d..980e571c163c 100644
>>> --- a/fs/ocfs2/dcache.c
>>> +++ b/fs/ocfs2/dcache.c
>>> @@ -227,12 +227,13 @@ int ocfs2_dentry_attach_lock(struct dentry *dentry,
>>>                               struct inode *inode,
>>>                               u64 parent_blkno)
>>>  {
>>> -        int ret;
>>> +        int ret, new_dl = 0;
>>>          struct dentry *alias;
>>> -        struct ocfs2_dentry_lock *dl = dentry->d_fsdata;
>>> +        struct ocfs2_dentry_lock *dl, *dl_free_on_race = NULL;
>>>
>>>          trace_ocfs2_dentry_attach_lock(dentry->d_name.len, dentry->d_name.name,
>>> -                                       (unsigned long long)parent_blkno, dl);
>>> +                                       (unsigned long long)parent_blkno,
>>> +                                       dentry->d_fsdata);
>>>
>>>          /*
>>>           * Negative dentry. We ignore these for now.
>>> @@ -243,11 +244,15 @@ int ocfs2_dentry_attach_lock(struct dentry *dentry,
>>>          if (!inode)
>>>                  return 0;
>>>
>>> +        spin_lock(&dentry_attach_lock);
>>>          if (d_really_is_negative(dentry) && dentry->d_fsdata) {
>>>                  /* Converting a negative dentry to positive
>>>                     Clear dentry->d_fsdata */
>>>                  dentry->d_fsdata = dl = NULL;
>>> +        } else {
>>> +                dl = dentry->d_fsdata;
>>>          }
>>> +        spin_unlock(&dentry_attach_lock);
>>>

I don't understand why we need to acquire dentry_attach_lock here. Is
there any race you want to close at this point?



>>>          if (dl) {
>>>                  mlog_bug_on_msg(dl->dl_parent_blkno != parent_blkno,
>>> @@ -310,10 +315,20 @@ int ocfs2_dentry_attach_lock(struct dentry *dentry,
>>>
>>>  out_attach:
>>>          spin_lock(&dentry_attach_lock);
>>> -        dentry->d_fsdata = dl;
>>> -        dl->dl_count++;
>>> +        /* d_fsdata could be set by parallel thread */
>>> +        if (unlikely(dentry->d_fsdata)) {
>>> +                pr_err("race found in ocfs2_dentry_attach_lock dentry=%p\n", dentry);
>>> +                if (!alias)
>>> +                        dl_free_on_race = dl;
>>> +        } else {
>>> +                dentry->d_fsdata = dl;
>>> +                dl->dl_count++;
>>> +                if (!alias)
>>> +                        new_dl = 1;
>>> +        }
>>>          spin_unlock(&dentry_attach_lock);
>>>
>>> +        kfree(dl_free_on_race);


Do we also have to do something about 'dl->dl_inode = igrab(inode)'?

And if we didn't have to allocate a new lock resource for the racing
thread at all, that would be better.

How about just making the racing thread wait somehow?
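
As a rough illustration of the "don't keep a duplicate allocation"
direction, the out_attach: path could adopt the dl that is already
published in d_fsdata and drop everything the loser allocated, including
the igrab() reference.  This is only a sketch, not a tested patch:
'stale_dl' would be a new local initialised to NULL, and whether
ocfs2_lock_res_free() is safe on a lockres that was initialised but
never locked would need to be checked.

        spin_lock(&dentry_attach_lock);
        if (unlikely(dentry->d_fsdata)) {
                /* Lost the race: another thread attached its dl first. */
                if (!alias)
                        stale_dl = dl;          /* our unused allocation */
                dl = dentry->d_fsdata;          /* adopt the winner's dl */
        } else {
                dentry->d_fsdata = dl;
                dl->dl_count++;
        }
        spin_unlock(&dentry_attach_lock);

        if (stale_dl) {
                iput(stale_dl->dl_inode);       /* undo the earlier igrab() */
                ocfs2_lock_res_free(&stale_dl->dl_lockres);
                kfree(stale_dl);
        }

That way the racing thread ends up taking the cluster lock through the
same ocfs2_dentry_lock as the winner, and nothing is left half-attached.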


Thanks,

Changwei


>>>          /*
>>>           * This actually gets us our PRMODE level lock. From now on,
>>>           * we'll have a notification if one of these names is
>>> @@ -330,7 +345,7 @@ int ocfs2_dentry_attach_lock(struct dentry *dentry,
>>>           * We need to do this because error here means no d_instantiate(),
>>>           * which means iput() will not be called during dput(dentry).
>>>           */
>>> -        if (ret < 0 && !alias) {
>>> +        if (ret < 0 && new_dl) {
>>>                  ocfs2_lock_res_free(&dl->dl_lockres);
>>>                  BUG_ON(dl->dl_count != 1);
>>>                  spin_lock(&dentry_attach_lock);
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
