List:       freebsd-hackers
Subject:    Help with determining a system hang
From:       Patrick Mahan <mahan () mahan ! org>
Date:       2010-11-28 16:17:52
Message-ID: 4CF280B0.6000100 () mahan ! org

Good day,

I am running a FreeBSD 8.0 kernel with my own code in the kernel that does
some deep packet diving.  This is mostly working, but I am having occasional
system hangs.  During a hang the system does not respond to the console
keyboard, stops receiving packets, and so on.

I have enabled INVARIANTS, WITNESS and WATCHDOG.  The watchdog fires, though
not always after the 20-second wait; sometimes it fires immediately.  I also
have DDB and KDB enabled in the kernel.
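
As a rough sketch, a kernel configuration for this kind of debugging setup
would look something like the fragment below.  The option names are the stock
FreeBSD ones and are an assumption here: in particular, SW_WATCHDOG (the
software watchdog driven by watchdogd) is assumed to be what "WATCHDOG" refers
to, and INVARIANT_SUPPORT is listed because INVARIANTS requires it.

    # Assumed option names; the message gives only the short names.
    options         KDB                 # kernel debugger framework
    options         DDB                 # interactive in-kernel debugger
    options         INVARIANTS          # extra run-time consistency checks
    options         INVARIANT_SUPPORT   # required by INVARIANTS
    options         WITNESS             # lock-order verification
    options         SW_WATCHDOG         # software watchdog (assumed)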

What puzzles me is that when the watchdog fires and I get the DDB prompt, the
first thing I do is list all CPUs with 'show allpcpu'.  I would expect to see
at least one CPU doing something, but most of the time every CPU is idle.  The
couple of times this was not true, the CPU was in "em_handle_que" in
dev/e1000/if_em.c.  That code is pretty straightforward, though I could see it
blocking while reading its registers.

Can anyone give me a suggestion on possible causes?  At first I thought that
maybe I was having a deadlock issue with my code, but while WITNESS does report
a few lock-order reversals, they are not in my code and seem to be false
positives.  I next looked for some type of resource wait, but cannot find
one (or I don't know how to find it).

'show locks' does not show any locks being held.

'show threads' shows almost every thread sitting in an idle state.
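
The DDB session described above can be sketched as follows.  The first three
commands correspond to the ones mentioned in this message; 'show alllocks',
'ps' and 'alltrace' are additional stock DDB commands, included here as an
assumption about what else may be worth capturing, since 'show locks' with no
argument reports only the locks held by the current thread.

    db> show allpcpu
    db> show locks
    db> show threads
    db> show alllocks
    db> ps
    db> alltrace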

I am at a loss to explain it.  I know it is probably my code that is causing
this behavior in some way, because I have never seen the hang when my code is
bypassed.

When I do the packet diving, I am getting called in either ip_input() or
ip_output() directly.  In ip_input() I get called either in the forwarding path
or just before calling the upper protocol layer via the protosw.

In ip_output(), I get called just before ip_output() deals with IP fragmentation.

This is an Intel Xeon that FreeBSD reports as 8 CPUs (dual core + 4
threads/core).  However, I am more experienced with MIPS hardware than
Intel.  I have not yet dug into the interrupt handling for Intel in
FreeBSD, but it is one of my suspects since the system is not even responding
to the console keyboard.

This is going to be a learning experience for me :-)

Thanks for any and all help,

Patrick Mahan