List:       scst-devel
Subject:    Re: [Scst-devel] 3.4.x Hung Tasks
From:       Bart Van Assche <bvanassche@acm.org>
Date:       2020-11-12 3:45:33
Message-ID: 6c9edbce-6a24-1cf0-a5a9-5f048bf688f7@acm.org

On 11/11/20 7:36 AM, Marc Smith wrote:
> On Sun, Nov 8, 2020 at 10:06 PM Marc Smith <msmith626@gmail.com> wrote:
>> On Thu, Nov 5, 2020 at 11:19 PM Bart Van Assche <bvanassche@acm.org> wrote:
>>> On 11/4/20 11:35 AM, Marc Smith wrote:
[ ... ]
>>>> [ 4597.468834] Call Trace:
>>>> [ 4597.468837]  __schedule+0x46e/0x4b5
>>>> [ 4597.468839]  ? __switch_to_asm+0x40/0x70
>>>> [ 4597.468841]  ? __switch_to_asm+0x34/0x70
>>>> [ 4597.468844]  schedule+0x67/0x81
>>>> [ 4597.468846]  rwsem_down_read_slowpath+0x292/0x2f1
>>>> [ 4597.468848]  ? __switch_to_asm+0x34/0x70
>>>> [ 4597.468852]  ? __switch_to+0x2a7/0x354
>>>> [ 4597.468855]  dlm_lock+0x82/0x183
> 
> (gdb) list *(dlm_lock+0x82)
> 0xffffffff8123cb1f is in dlm_lock (fs/dlm/lock.c:3432).
> 
>         dlm_lock_recovery(ls);
> 
> static inline void dlm_lock_recovery(struct dlm_ls *ls)
> {
>         down_read(&ls->ls_in_recovery);
> }
> 
> This is 'static inline', so we don't see it in the call trace, right?
> I see rwsem_down_read_slowpath() above it, so I assume that if I
> followed down_read() I would find it there.
> 
> 
>>>> [ 4597.468866]  ? scst_dlm_post_ast+0x1/0x1 [scst]
>>>> [ 4597.468868]  ? usleep_range+0x7a/0x7a
>>>> [ 4597.468871]  ? schedule+0x67/0x81
>>>> [ 4597.468872]  ? schedule_timeout+0x2c/0xe5
>>>> [ 4597.468882]  scst_dlm_lock_wait+0x72/0x10a [scst]
> 
> (gdb) list *(scst_dlm_lock_wait+0x72)
> 0x22ade is in scst_dlm_lock_wait
> (/sources/scst-3.4.x_r9170/scst/src/scst_dlm.c:95).
> 
> So this indicates we are here:
>         res = dlm_lock(ls, mode, &lksb->lksb, flags,
>                             (void *)name, name ? strlen(name) : 0, 0,
>                             scst_dlm_ast, lksb, bast);
> 
> Is it possibly getting stuck in dlm_lock() itself? I understand it's
> asynchronous and supposed to return immediately, but perhaps something
> is wrong in dlm_lock()? Or is something seriously wrong on my machine
> when this happens? =)

Hi Marc,

That might be what is going on. dlm_lock_recovery() is indeed not
visible in the call trace because it has been inlined. But
rwsem_down_read_slowpath() is visible in the call trace, which means
that dlm_lock() is blocked in down_read(&ls->ls_in_recovery). Most
likely the recovery path is holding that rwsem for write and not
releasing it, so every new lock request blocks at the same point.
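
To make the relationship explicit, here is a simplified sketch of how
the two sides of ls_in_recovery interact (not the actual fs/dlm code;
see fs/dlm/lock.c and fs/dlm/recoverd.c for the real thing):

/* Request side (fs/dlm/lock.c): every lock request takes the
 * recovery rwsem for read before doing anything else. */
static inline void dlm_lock_recovery(struct dlm_ls *ls)
{
        down_read(&ls->ls_in_recovery);  /* <- rwsem_down_read_slowpath */
}

/* dlm_lock() calls dlm_lock_recovery() on entry, so it can sleep
 * here even though the lock grant itself is asynchronous. */

/* Recovery side (fs/dlm/recoverd.c), simplified: recovery holds the
 * same rwsem for write for as long as recovery is in progress. */
down_write(&ls->ls_in_recovery);   /* recovery starts */
/* ... rebuild lockspace state ... */
up_write(&ls->ls_in_recovery);     /* recovery done; readers proceed */
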
Is it possible to reproduce this hang with lockdep enabled? If so, does
lockdep provide more information about the context that has been
holding the lock for too long?
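
For reference, a minimal lockdep configuration looks like this (option
names as in recent mainline kernels; please double-check against the
tree you are building):

CONFIG_PROVE_LOCKING=y      # lock dependency engine + deadlock detection
CONFIG_DEBUG_LOCK_ALLOC=y   # selected automatically by PROVE_LOCKING
CONFIG_LOCK_STAT=y          # optional: contention data in /proc/lock_stat

While the hang is in progress, "echo d > /proc/sysrq-trigger" (show all
held locks; needs lockdep) and "echo w > /proc/sysrq-trigger" (show
blocked tasks) may also reveal what the lock holder is doing.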

I have checked the v5.9 DLM source code with a static analyzer, but
that did not yield any interesting results:

make M=fs/dlm W=1 C=2 CHECK="smatch -p=kernel"
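
(For anyone unfamiliar with these kbuild flags: M=fs/dlm restricts the
build to that directory, W=1 enables extra compiler warnings, C=2 runs
the source checker over all source files rather than only the ones
being recompiled, and CHECK= substitutes smatch for the default
checker, sparse.)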

Thanks,

Bart.

