Date:      Sun, 20 Jan 2013 23:22:37 -0800
From:      Alfred Perlstein <bright@mu.org>
To:        Ian Lepore <ian@FreeBSD.org>
Cc:        "arch@freebsd.org" <arch@FreeBSD.org>
Subject:   Re: RFC: enhanced watchdog.
Message-ID:  <50FCECBD.9090002@mu.org>
In-Reply-To: <1358743064.32417.409.camel@revolution.hippie.lan>
References:  <201301190604.r0J64RbW009298@svn.freebsd.org> <50FA3D36.4080709@mu.org> <1358743064.32417.409.camel@revolution.hippie.lan>

On 1/20/13 8:37 PM, Ian Lepore wrote:
> On Fri, 2013-01-18 at 22:29 -0800, Alfred Perlstein wrote:
>> We at iX are trying to enhance the watchdog and we think some of the
>> changes may benefit the community as a whole.
>>
>> Basically we want to make it easy for developers to prototype watchdog
>> scripts in a "test-only" mode that basically logs if the watchdog had
>> failed.
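The proposed test-only mode can be thought of as a single decision made after each run of the health-check script: pat normally on success, and on failure either stop patting (real mode) or keep patting and log that the watchdog would have fired (test-only mode). The following is a minimal sketch of that decision under those assumptions. The enum and function names are hypothetical and are not taken from the iX patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical actions a watchdogd-style loop could take after running
 * its health-check script; names are illustrative only. */
enum wd_action {
	WD_PAT,          /* check passed: pat the hardware watchdog */
	WD_PAT_AND_LOG,  /* check failed in test-only mode: log, keep patting */
	WD_STOP_PATTING  /* check failed for real: let the watchdog fire */
};

/* Decide what to do from the check result and the test-only flag. */
static enum wd_action
wd_decide(bool check_ok, bool test_only)
{
	if (check_ok)
		return WD_PAT;
	return test_only ? WD_PAT_AND_LOG : WD_STOP_PATTING;
}

/* In test-only mode, record that a real watchdog would have fired. */
static void
wd_report(enum wd_action a, FILE *log)
{
	assert(log != NULL);
	if (a == WD_PAT_AND_LOG)
		fprintf(log, "watchdog: check failed; would have fired "
		    "(test-only mode)\n");
}
```

The useful property is that test-only mode never changes whether the hardware gets patted when the check passes; it only turns a would-be reset into a log line.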
>>
>> I have most of the code done, but could really use help on three things:
>>
>> 1) review
>> 2) suggestions for inserting the warning messages from the userland
>> watchdogd into the kernel message buffer.
>> 3) suggestions for logging/warning of pending death.
>>
>> In detail:
>> 1) The reason for review should be obvious, we want to make sure that
>> this works for everyone.
>> 2) The reason for inserting messages into the kernel log is because that
>> is the easiest place for us to recover the diagnostics when we do have a
>> crash due to watchdog.  Maybe there is a smarter thing to do?
> I've recently wished for a way that a sufficiently-credentialed userland
> process could, in effect, kernel-printf.  I've been burned a number of
> times by init(8) failing to start up for various reasons such as
> no /dev, and it has no way to say what's wrong.  It's surprisingly hard
> to figure out what the problem is.
>
> For your need, a possibility I guess would be to have the watchdog device
> do it for you, since you're already talking to it.  Who knows, maybe
> some special watchdog hardware would be able to do something useful with
> a short message.  I've worked with hardware that has a few registers
> designed to survive a reboot, for communicating with your reincarnated
> self; nothing big enough for arbitrary strings yet, but hardware just
> keeps getting cooler all the time.
I'm almost wondering if there's some kind of /dev/klog we should/could have?


>
>> 3) What is a good way to warn of impending death?  I was thinking of just
>> another thread in the process that would be signalled before the
>> watchdog script was run and would log when the timer is about to expire
>> or based on a configurable threshold.
>>
> SIGALRM that fires shortly before death?
That sounds great.  I'll look into that.
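The SIGALRM idea could look roughly like this in a watchdogd-style process: after every successful pat, re-arm alarm() so that SIGALRM arrives `lead` seconds before the configured timeout would expire. Because a new alarm() call replaces the previous one, the signal only ever fires if the pats stop. This is a minimal sketch assuming a single-threaded daemon; `wd_arm_warning`, `wd_warn_delay` and `wd_warned` are hypothetical names. The handler only uses write(2), which is async-signal-safe, and sets a flag so the main loop can do the real logging (syslog(3) is not async-signal-safe).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; the main loop can check it and log properly. */
static volatile sig_atomic_t wd_warned;

static void
wd_sigalrm(int sig)
{
	static const char msg[] = "watchdog: timeout imminent\n";
	ssize_t r;

	(void)sig;
	wd_warned = 1;
	/* write(2) is async-signal-safe; printf(3) and syslog(3) are not. */
	r = write(STDERR_FILENO, msg, sizeof(msg) - 1);
	(void)r;
}

/* Seconds until the warning should fire, 'lead' seconds before timeout. */
static unsigned int
wd_warn_delay(unsigned int timeout, unsigned int lead)
{
	assert(timeout > 0);
	if (lead >= timeout)
		return 1;	/* alarm(0) would cancel, so warn after 1 s */
	return timeout - lead;
}

/* Install the handler and (re)arm the alarm; call after every pat. */
static int
wd_arm_warning(unsigned int timeout, unsigned int lead)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = wd_sigalrm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) == -1)
		return -1;
	alarm(wd_warn_delay(timeout, lead));
	return 0;
}
```

For example, with a 128-second timeout and a 10-second lead, each pat pushes the warning out to 118 seconds from now. In a multithreaded daemon the signal could land on any thread, so a dedicated thread waiting in sigwait(2) would be the safer variant.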

>
>> Finally, there is some thought about adding a kernel daemon to the
>> watchdog facility that would allow us to strobe watchdogs with low max
>> values while our userland watchdog was polling the system.
>>
>> Why??? Well because the ICH driver has a max timeout of ~2 minutes.  We
>> really want to be able to leverage this watchdog, but also go higher
>> than this.  The way to do this is to drive the system almost like a step-up
>> electrical relay.
>>
> I very much like this.  A new ARM SoC I'm about to start working with
> has a max 16 second watchdog, and I'm afraid things like firmware
> updaters might lock out userland for longer than that on such a wimpy
> chip.
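The "step-up relay" arrangement can be modeled as a cascade: a kernel-side daemon pats the short-timeout hardware watchdog every `pat_interval` seconds for as long as userland has patted within its own, longer `sw_timeout`; once userland has been silent past that, the daemon stops and the hardware fires within `hw_timeout`. The sketch below is a simulation under those assumptions, with hypothetical names; the 120-second value only mirrors the ~2 minute ICH limit mentioned above, and the 16-second value the ARM SoC case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cascade parameters, all in seconds. */
struct wd_cascade {
	unsigned int hw_timeout;   /* hardware maximum (e.g. ICH ~120 s) */
	unsigned int pat_interval; /* how often the kernel daemon pats hw */
	unsigned int sw_timeout;   /* longer timeout requested by userland */
};

/* Should the kernel daemon pat the hardware at 'now', given the last
 * userland pat?  Assumes now >= last_user_pat. */
static bool
wd_daemon_should_pat(const struct wd_cascade *c, unsigned int now,
    unsigned int last_user_pat)
{
	return now - last_user_pat < c->sw_timeout;
}

/* Simulate userland patting once at t = 0 and then hanging; return the
 * time at which the hardware watchdog would reset the machine. */
static unsigned int
wd_fire_time(const struct wd_cascade *c)
{
	unsigned int last_hw_pat = 0;

	/* The daemon must pat faster than the hardware can expire. */
	assert(c->pat_interval > 0 && c->pat_interval < c->hw_timeout);
	for (unsigned int t = 0; wd_daemon_should_pat(c, t, 0);
	    t += c->pat_interval)
		last_hw_pat = t;
	return last_hw_pat + c->hw_timeout;
}
```

With `hw_timeout = 120`, `pat_interval = 60` and `sw_timeout = 600`, the last hardware pat happens at t = 540 and the reset at t = 660. In general the reset lands somewhere in [sw_timeout - pat_interval + hw_timeout, sw_timeout + hw_timeout), so the effective timeout overshoots the requested one by up to one hardware period.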
>
>> [... code ...]
> I skimmed through the code, but it's been a long day of reading code for
> me, so I'm not gonna pretend it was a thorough review.  The main thing
> that popped out at me was 'carp'.  Shouldn't a watchdog bark? :)
ha!


>
> I'm also curious why you chose CLOCK_UPTIME_FAST, which I'm not familiar
> with (gonna be reading a manpage in a minute).  Not knowing about some
> of the newer choices, I probably would've used CLOCK_MONOTONIC.
I unfortunately am a generalist and my clock-fu is weak.  I can look 
into switching to that.

What would be the difference between the two in general?
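Both clocks are monotonic, so neither jumps when the wall clock is stepped. CLOCK_MONOTONIC is the portable POSIX clock; CLOCK_UPTIME_FAST is FreeBSD-specific, and per FreeBSD's clock_gettime(2) the `_FAST` variants skip the full timecounter query, so they are cheaper but only accurate to about one timer tick. For second-scale watchdog timeouts that accuracy is likely ample. A sketch assuming that trade-off is what matters here, with hypothetical helper names, that picks the FreeBSD clock when the macro exists and falls back to CLOCK_MONOTONIC elsewhere:

```c
/* glibc: expose clock_gettime() even in strict ISO C modes. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <time.h>

/* Prefer FreeBSD's cheaper tick-resolution clock where it exists. */
#ifdef CLOCK_UPTIME_FAST
#define WD_CLOCK CLOCK_UPTIME_FAST
#else
#define WD_CLOCK CLOCK_MONOTONIC
#endif

/* Whole milliseconds from a to b (b assumed not earlier than a). */
static long long
ts_diff_ms(const struct timespec *a, const struct timespec *b)
{
	long long ns = (long long)(b->tv_sec - a->tv_sec) * 1000000000LL +
	    (b->tv_nsec - a->tv_nsec);

	return ns / 1000000;
}

/* Milliseconds elapsed on WD_CLOCK since 'since'; -1 on error. */
static long long
wd_elapsed_ms(const struct timespec *since)
{
	struct timespec now;

	if (clock_gettime(WD_CLOCK, &now) == -1)
		return -1;
	return ts_diff_ms(since, &now);
}
```

Computing the difference in nanoseconds first avoids the off-by-one that truncating each field separately would introduce when tv_nsec borrows.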

-Alfred


