Date:      Wed, 27 Dec 2017 15:18:53 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        karels@FreeBSD.org, freebsd-arch@freebsd.org
Subject:   Re: making SW_WATCHDOG dynamic
Message-ID:  <a522c434-27ec-3d20-86c7-957bb5016bdb@FreeBSD.org>
In-Reply-To: <201712261425.vBQEPMmQ007578@mail.karels.net>
References:  <201712261425.vBQEPMmQ007578@mail.karels.net>

On 26/12/2017 16:25, Mike Karels wrote:
> There is a kernel option, SW_WATCHDOG, which adds a low-level software
> watchdog in hardclock.  By default, the kernel and watchdogd support
> only hardware-based watchdogs.  There is also a callout-based software
> watchdog that can be enabled by watchdogd with an ioctl if --softwatchdog
> is specified, but watchdogd doesn't switch on its own.  The SW_WATCHDOG
> option adds a lower-level software watchdog to the hardware-based mechanism,
> but it adds it unconditionally.  I propose to include the SW_WATCHDOG
> facility by default, but enable it only if there is no hardware watchdog.

I think that this is a good idea.  However, I would not necessarily tie the
software watchdog to the absence of a hardware watchdog.  This is probably a good
default policy, but I would allow enabling / disabling the software watchdog
explicitly (e.g. via a sysctl).
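The default-plus-override policy above can be sketched as a small decision function. This is only an illustration, assuming a hypothetical tri-state setting (auto/off/on) of the kind a sysctl could expose; the enum and function names are invented here and are not an existing FreeBSD interface.

```c
#include <stdbool.h>

/*
 * Hypothetical tri-state setting for the software watchdog, e.g. the
 * value a sysctl could expose.  Names are illustrative only.
 */
enum sw_wd_mode {
	SW_WD_AUTO,	/* default: enable only when no hardware watchdog */
	SW_WD_OFF,	/* administrator explicitly disabled it */
	SW_WD_ON	/* administrator explicitly enabled it */
};

/*
 * Decide whether the hardclock-driven software watchdog should be armed.
 * An explicit setting always wins; otherwise apply the default policy.
 */
static bool
sw_watchdog_should_run(enum sw_wd_mode mode, bool have_hw_watchdog)
{
	switch (mode) {
	case SW_WD_ON:
		return (true);
	case SW_WD_OFF:
		return (false);
	case SW_WD_AUTO:
	default:
		/* Fallback only: run when no hardware watchdog is present. */
		return (!have_hw_watchdog);
	}
}
```

With this shape, `SW_WD_AUTO` gives the behaviour Mike proposed, while `SW_WD_ON` and `SW_WD_OFF` let an administrator override it in either direction regardless of the hardware.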

I also think that we should support enabling several watchdog timers with
different timeouts.  Each of them can serve a different purpose.  E.g., a
software or hardware NMI-sending watchdog can be used to get diagnostic data out
of a hung system while a resetting watchdog can be used to ensure fail-safe
operation.
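A minimal sketch of several independent watchdog timers, each with its own timeout and action, assuming a tick-driven check (conceptually from hardclock) and a single "pat" from watchdogd that restarts all of them. All names here (`struct wd_timer`, `wd_pat_all`, `wd_tick`, the `WD_ACT_*` actions) are hypothetical and do not correspond to existing kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action taken when a watchdog timer expires. */
enum wd_action {
	WD_ACT_NMI,	/* e.g. enter the debugger / collect diagnostics */
	WD_ACT_RESET	/* reset the machine for fail-safe operation */
};

/* One independently configured watchdog timer (illustrative only). */
struct wd_timer {
	unsigned int	timeout_ticks;	/* ticks allowed between pats */
	unsigned int	remaining;	/* ticks left; 0 and !fired = unarmed */
	enum wd_action	action;		/* what the caller does on expiry */
	bool		fired;		/* set once the timer has expired */
};

/* A pat from userland restarts every timer's countdown. */
static void
wd_pat_all(struct wd_timer *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		t[i].remaining = t[i].timeout_ticks;
		t[i].fired = false;
	}
}

/*
 * Called once per clock tick.  Each armed timer counts down on its own;
 * a timer that reaches zero is marked fired so the caller can perform
 * its action.  Returns the number of timers that expired on this tick.
 */
static size_t
wd_tick(struct wd_timer *t, size_t n)
{
	size_t expired = 0;

	for (size_t i = 0; i < n; i++) {
		if (t[i].fired || t[i].remaining == 0)
			continue;	/* already expired or never armed */
		if (--t[i].remaining == 0) {
			t[i].fired = true;
			expired++;
		}
	}
	return (expired);
}
```

Configured with a short NMI timer and a longer reset timer, a hang first triggers the diagnostic action and, if the system is still stuck, the reset follows later, which matches the two roles described above.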

> I'm interested in any comments, suggestions, or background; feel free to
> mail me off the list.  If there are multiple people interested, I'll
> forward messages to that group.
> 
> I want to make the change because I have found SW_WATCHDOG quite useful
> at $JOB, and it's annoying to have to build a custom kernel just for this
> (not just once, but every time there is a kernel patch).

Makes sense.

> Also, I'm curious why we have two software watchdog facilities.  The
> --softwatchdog facility has various options on expiration, such as
> printf/log/panic; I don't know why anything other than panic/reboot
> would be desirable, though.  I already contacted some of the people who
> have left fingerprints on watchdog.  Also, if anyone wants to review
> the code, let me know.

I guess that the second software watchdog was added to achieve what I suggested
above.  Of course, it would have been nicer to reuse SW_WATCHDOG for that
purpose and to add more generic support for configuring multiple watchdog
timers with different timeouts.  But I guess that adding a new single-purpose
software watchdog was much easier to do.

P.S.
And maybe just using the second software watchdog would be good enough for what
you are doing?

-- 
Andriy Gapon


