Date: Wed, 27 Dec 2017 15:18:53 +0200
From: Andriy Gapon <avg@FreeBSD.org>
To: karels@FreeBSD.org, freebsd-arch@freebsd.org
Subject: Re: making SW_WATCHDOG dynamic
Message-ID: <a522c434-27ec-3d20-86c7-957bb5016bdb@FreeBSD.org>
In-Reply-To: <201712261425.vBQEPMmQ007578@mail.karels.net>
References: <201712261425.vBQEPMmQ007578@mail.karels.net>

On 26/12/2017 16:25, Mike Karels wrote:
> There is a kernel option, SW_WATCHDOG, which adds a low-level software
> watchdog in hardclock. By default, the kernel and watchdogd support
> only hardware-based watchdogs. There is also a callout-based software
> watchdog that can be enabled by watchdogd with an ioctl if --softwatchdog
> is specified, but watchdogd doesn't switch on its own. The SW_WATCHDOG
> option adds a lower-level software watchdog to the hardware-based mechanism,
> but it adds it unconditionally. I propose to include the SW_WATCHDOG
> facility by default, but enable it only if there is no hardware watchdog.

I think that this is a good idea, although I would not necessarily tie the
software watchdog to the absence of a hardware watchdog. That is probably a
good default policy, but I would also allow the software watchdog to be
enabled or disabled explicitly (e.g. via a sysctl).

I also think that we should support enabling several watchdog timers with
different timeouts. Each of them can serve a different purpose. E.g., a
software or hardware NMI-sending watchdog can be used to get diagnostic data
out of a hung system, while a resetting watchdog can be used to ensure
fail-safe operation.

> I'm interested in any comments, suggestions, or background; feel free to
> mail me off the list. If there are multiple people interested, I'll
> forward messages to that group.
>
> I want to make the change because I have found SW_WATCHDOG quite useful
> at $JOB, and it's annoying to have to build a custom kernel just for this
> (not just once, but every time there is a kernel patch).

Makes sense.

> Also, I'm curious why we have two software watchdog facilities. The
> --softwatchdog facility has various options on expiration, such as
> printf/log/panic; I don't know why anything other than panic/reboot
> would be desirable, though. I already contacted some of the people who
> have left fingerprints on watchdog. Also, if anyone wants to review
> the code, let me know.

I guess that the second software watchdog was added to achieve what I
suggested above. Of course, it would have been nicer to re-use SW_WATCHDOG
for that purpose and to add more generic support for configuring multiple
watchdog timers with different timeouts. But I guess that adding a new
single-purpose software watchdog was much easier to do.

P.S. Maybe just using the second software watchdog would be good enough for
what you are doing?

-- 
Andriy Gapon
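
The suggestion above of several watchdog timers with different timeouts and
different expiry actions could be sketched roughly as follows. This is only a
userspace illustration under stated assumptions: struct wd_timer, wd_pat(),
wd_tick() and the WD_ACT_* names are all hypothetical and do not correspond to
the existing FreeBSD watchdog code, and the "enabled" field merely stands in
for an explicit switch such as a sysctl.

```c
#include <stddef.h>

/* Hypothetical expiry actions; not the real FreeBSD watchdog flags. */
enum wd_action { WD_ACT_NMI, WD_ACT_RESET };

/* One independently configured watchdog timer (hypothetical layout). */
struct wd_timer {
	const char	*name;
	unsigned	 timeout_ticks;	/* ticks allowed between pats */
	unsigned	 remaining;	/* ticks left; 0 means not armed */
	int		 enabled;	/* explicit on/off switch */
	enum wd_action	 action;	/* what to do on expiry */
};

/* Re-arm every enabled timer, as a pat from userland might. */
static void
wd_pat(struct wd_timer *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].enabled)
			t[i].remaining = t[i].timeout_ticks;
}

/*
 * Advance all armed timers by one tick, as a hardclock-style hook might.
 * Returns the index of the first timer that expires on this tick, or -1
 * if none does. The caller would then carry out t[index].action.
 */
static int
wd_tick(struct wd_timer *t, size_t n)
{
	int fired = -1;

	for (size_t i = 0; i < n; i++) {
		if (!t[i].enabled || t[i].remaining == 0)
			continue;
		if (--t[i].remaining == 0 && fired < 0)
			fired = (int)i;
	}
	return (fired);
}
```

With a short-timeout timer set to WD_ACT_NMI and a longer one set to
WD_ACT_RESET, a hung system would first hit the diagnostic timer and only
later the resetting one, which is the split of purposes described above.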