
List:       freebsd-hackers
Subject:    Re: NMI watchdog functionality on Freebsd
From:       John Baldwin <jhb () freebsd ! org>
Date:       2013-01-24 16:11:01
Message-ID: 201301241111.01629.jhb () freebsd ! org

On Wednesday, January 23, 2013 11:57:33 am Ian Lepore wrote:
> On Wed, 2013-01-23 at 08:47 -0800, Matthew Jacob wrote:
> > On 1/23/2013 7:25 AM, John Baldwin wrote:
> > > On Tuesday, January 22, 2013 5:40:55 pm Sushanth Rai wrote:
> > >> Hi,
> > >>
> > >> Does FreeBSD have some functionality similar to Linux's NMI watchdog? I'm
> > >> aware of the ichwd driver, but that depends on a WDT being available in the
> > >> hardware. Even when it is available, the BIOS needs to support a mechanism to
> > >> trigger an OS-level recovery to get any useful information when the system is
> > >> really wedged (with interrupts disabled).
> > The principal purpose of a watchdog is to keep the system from hanging. 
> > Information is secondary. The ichwd driver can use the LPC part of ICH 
> > hardware that's been there since ICH version 4. I implemented this more 
> > fully at Panasas. The first importance is to keep the system from being 
> > hung. The next piece of information is to detect, on reboot, that a 
> > watchdog event occurred. Finally, trying to isolate why is good.
> > 
> > This is equivalent to the tco_WDT stuff on Linux. It's not interrupt 
> > driven (it drives the reset line on the processor).
> > 
> 
> I think there's value in the NMI watchdog idea, but unless you back it
> up with a real hardware watchdog you don't really have full watchdog
> functionality.  If the NMI can get the OS to produce some extra info,
> that's great, and using an NMI gives you a good chance of doing that
> even if it is normal interrupt processing that has wedged the machine.
> But calling panic() invokes plenty of processing that can get wedged in
> other ways, so even an NMI-based watchdog isn't g'teed to get the
> machine running again.
> 
> But adding a real hardware watchdog that fires on a slightly longer
> timeout than the NMI watchdog gives you the best of everything: you get
> information if it's possible to produce it, and you get a real hardware
> reset shortly thereafter if producing the info fails.

The IPMI watchdog facility has support for a pre-timeout interrupt that fires before 
the real watchdog.  I have coded up support for it in a branch but haven't 
found any hardware that supports it that I could use to test it.  However, 
you could use an NMI pre-timer via the local APIC timer as a generic pre-timer 
for other hardware watchdogs.
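
To make the two-stage idea concrete, here is a rough model of the timing only.
It is a plain Python sketch, not kernel code and not taken from the branch
mentioned above. The class name, field names, and state strings are all made
up for illustration. It assumes a single "pat" resets both stages, with the
NMI-style pre-timer set shorter than the hardware reset timeout.

```python
from dataclasses import dataclass


@dataclass
class TwoStageWatchdog:
    """Hypothetical model: a pre-timer stage backed by a hard reset stage."""

    pre_timeout: float    # seconds of silence before the pre-timer (NMI) fires
    reset_timeout: float  # seconds of silence before the hardware reset fires
    last_pat: float = 0.0

    def __post_init__(self):
        # The pre-timer is only useful if it fires before the hard reset.
        if not 0 < self.pre_timeout < self.reset_timeout:
            raise ValueError("need 0 < pre_timeout < reset_timeout")

    def pat(self, now):
        """Record that the system checked in at time `now`; rearms both stages."""
        self.last_pat = now

    def state(self, now):
        """Return which stage would have fired by time `now`."""
        idle = now - self.last_pat
        if idle >= self.reset_timeout:
            return "reset"        # hardware watchdog resets the machine
        if idle >= self.pre_timeout:
            return "pre-timeout"  # NMI handler gets a chance to dump state
        return "ok"
```

With pre_timeout=8 and reset_timeout=10, a wedged system gets two seconds in
the "pre-timeout" state to produce diagnostics before the reset stage takes
over, whether or not the diagnostics succeeded.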

-- 
John Baldwin
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"