'Re: Strange panic on ppc64'


List:       freebsd-ppc
Subject:    Re: Strange panic on ppc64
From:       Justin Hibbits <jhibbits () freebsd ! org>
Date:       2013-11-12 21:51:51
Message-ID: CAHSQbTA8vCgiYwxDQ7J-w4K_vXEUr2a20tLYOBQXLq=cb=OggQ () mail ! gmail ! com

On Nov 12, 2013 1:47 PM, "Konstantin Belousov" <kostikbel@gmail.com> wrote:
>
> On Tue, Nov 12, 2013 at 01:13:28PM -0800, Justin Hibbits wrote:
> > On Tue, Nov 12, 2013 at 12:51 PM, Konstantin Belousov
> > <kostikbel@gmail.com> wrote:
> >
> > > On Tue, Nov 12, 2013 at 08:32:31AM -0800, Justin Hibbits wrote:
> > > > The log is attached.  I'm not sure what exactly is going on here.  The
> > > > conditions were: building something on zfs, while also accessing files
> > > > over NFS.  It seems each of those individually is fine, but doing both
> > > > brings my system down.  I _think_ the actual panic message (recursed on
> > > > non-recursive mutex) is a red herring, since it already trapped in the
> > > > kernel, twice.  Any clues?  It's 100% reproducible by me.
> > > >
> > > This does not seem related to NFS or ZFS proper.  What happens is
> > > that tc_windup() executing in the interrupt context decided to enter
> > > a debugger.  I am not sure why the debugger is entered.
> > >
> > > Apart from this, the situation is clear:
> > > the interrupt happened while the referenced mutex was owned. The debugger
> > > is entered, and tries to read a char from the keyboard, which is USB. For
> > > USB to function, it has to access a lot of the kernel services, in
> > > particular, busdma, which, in turn, requires some pmap calls, and you
> > > end up accessing the same mutex.
> > >
> > > The bug there is that code executed from interrupt or debugger context
> > > must not lock mutexes or, more generally, call into the top half of the
> > > kernel (and nowadays the top half is essentially the whole kernel).  I am
> > > not sure USB could ever work in such a mode.
> > >
> >
> > I discussed this with Nathan on IRC earlier.  You're right that it's not
> > related to NFS or ZFS, at least not directly.  It's actually most likely
> > a stack overflow, since currently there are only 4 pages of kernel stack,
> > so when it takes the DECR (decrementer) trap it ends up blowing the stack.
> > This is only made evident because ZFS is very stack hungry.  I'm upping
> > the stack to 8 pages, and testing tonight.
> >
> > As for your assessment of the situation, you're spot on, and I have no idea
> > how to properly fix it.
>
> For a stack overflow, I would not expect to see the frames I talked about.
> The panic clearly states that you get a recursion on a mutex, and a sleepable
> mutex must not be locked from the interrupt or debugger context.

Right, there are two issues: it entered the debugger on a stack overflow,
then panicked on a mutex recursion. I'm addressing the stack overflow.
_______________________________________________
freebsd-ppc@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-ppc
To unsubscribe, send any mail to "freebsd-ppc-unsubscribe@freebsd.org"