Date:      Thu, 14 Nov 2019 11:13:41 -0700
From:      Ian Lepore <ian@freebsd.org>
To:        Daniel Braniss <danny@cs.huji.ac.il>
Cc:        freebsd-hackers <freebsd-hackers@freebsd.org>
Subject:   Re: can the hardware watchdog reboot a hung kernel?
Message-ID:  <8814791e9634980810a41b9cc229612e225a40ee.camel@freebsd.org>
In-Reply-To: <BEC1714A-2361-4B62-BEB9-82808920C269@cs.huji.ac.il>
References:  <EC4DB495-55D0-44BB-8D6A-0301785FADC7@cs.huji.ac.il> <9cded04a-9ae1-881e-3962-7ef0322e96ed@grosbein.net> <2AD912BF-97B0-421D-B561-722D74864DC9@cs.huji.ac.il> <828605fef472e04311c83a7de0d1f4df429ae717.camel@freebsd.org> <BEC1714A-2361-4B62-BEB9-82808920C269@cs.huji.ac.il>

On Thu, 2019-11-14 at 20:10 +0200, Daniel Braniss wrote:
> > On 14 Nov 2019, at 18:02, Ian Lepore <ian@freebsd.org> wrote:
> > 
> > On Thu, 2019-11-14 at 17:35 +0200, Daniel Braniss wrote:
> > > > On 14 Nov 2019, at 17:28, Eugene Grosbein <eugen@grosbein.net>
> > > > wrote:
> > > > 
> > > > 14.11.2019 21:52, Daniel Braniss wrote:
> > > > 
> > > > > hi,
> > > > > I have several hundred Nano-pi NEO running, and sometimes
> > > > > they
> > > > > hang, since there is no console
> > > > > available, the only solution is to do a power cycle - not so
> > > > > easy
> > > > > since they are distributed in three buildings :-)
> > > > > 
> > > > > I am looking at the watchdog stuff, but it seems that what I
> > > > > want
> > > > > is not supported, i.e.
> > > > > 	reboot the kernel when hung 
> > > > > 
> > > > > wishful thinking?
> > > > 
> > > > It's possible if the hardware has such a watchdog and kernel
> > > > subsystem watchdog(4) supports it.
> > > > rc.conf(5) manual page describes watchdogd_enable option.
> > > > 
> > > 
> > > yes, but it relies on userland; what if the kernel is hung?
> > > 
> > 
> > It relies on the userland daemon to issue the ioctl() calls to pet
> > the
> > dog.  If the kernel is hung, then userland code isn't going to run
> > either, and the watchdog petting won't happen, and eventually the
> > hardware reboots.
> > 
> > We use this at $work specifically to reboot if the kernel hangs,
> > using
> > this config:
> > 
> > watchdogd_enable=YES
> > watchdogd_flags="-s 16 -t 64 -x 64"
> > 
> > That says the daemon should pet the dog every 16 seconds, and the
> > hardware is programmed to reboot if 64 seconds elapse without
> > petting.
> > In addition, when watchdogd is shut down normally (like during a
> > normal
> > system reboot) it doesn't disable the watchdog hardware, it sets
> > the
> > timeout to 64s to protect against any kind of hang during the
> > reboot. 
> > The -t and -x times can be different, 64s just happens to work well
> > for
> > us in both cases.
> > 
> > -- Ian
> > 
> 
> ok, that is very encouraging, now a last question
> how can I hang the kernel to test that the watchdog kicks in? apart
> from writing a kernel module :-)
>  

Drop into the kernel debugger and just let it sit there until it
reboots (or fails to, I guess).  Do "sysctl debug.kdb.enter=1".

-- Ian
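
A minimal sketch of the configuration quoted above, assuming a
hypothetical helper named watchdog_rc_lines (it is not part of FreeBSD).
It prints the two rc.conf lines for a chosen pat interval (-s), run-time
timeout (-t) and exit timeout (-x), all in seconds, and refuses a pat
interval that is not shorter than the timeout, since the daemon would
then never pet the dog in time:

```shell
#!/bin/sh
# Sketch only: watchdog_rc_lines is a hypothetical helper, not a FreeBSD tool.
# Arguments: pat interval (-s), run-time timeout (-t), exit timeout (-x).
watchdog_rc_lines() {
    pat=$1
    timeout=$2
    exit_timeout=$3

    # The daemon has to pet the dog more often than the hardware times out.
    if [ "$pat" -ge "$timeout" ]; then
        echo "pat interval ${pat}s must be shorter than timeout ${timeout}s" >&2
        return 1
    fi

    # Same two lines as in the quoted message, with the values filled in.
    echo "watchdogd_enable=YES"
    echo "watchdogd_flags=\"-s ${pat} -t ${timeout} -x ${exit_timeout}\""
}

# Values from the quoted message: pet every 16s, reboot after 64s of silence.
watchdog_rc_lines 16 64 64
```

With the printed lines in /etc/rc.conf and watchdogd running, the
"sysctl debug.kdb.enter=1" test should end in a hardware reset once the
-t timeout expires, assuming the kernel has a debugger backend to drop
into and the board's watchdog driver is attached.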




