Date:      Fri, 15 Nov 2002 19:21:08 -0500 (EST)
From:      Jeff Roberson <jroberson@chesapeake.net>
To:        arch@freebsd.org
Subject:   Software Watchdog
Message-ID:  <20021115191632.U22491-100000@mail.chesapeake.net>

Sean Kelly has implemented a software watchdog based on input from Peter
and me.  It works through a simple watchdog daemon that checks in with
the kernel periodically.  If the daemon fails to check in before the
watchdog times out, the kernel complains from hardclock().  This will be
very useful for debugging hard lockups because hardclock() runs from a
fast interrupt, so very few things can stop it from firing.
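
To make the mechanism concrete, here is a minimal sketch of the kernel-side
check, assuming a hook that hardclock() calls once per tick.  The names
wd_state, wd_reset() and wd_hardclock_check() are hypothetical and are not
taken from Sean's patch.  The tick counters are unsigned so that the
elapsed-time subtraction stays correct when the counter wraps.

```c
#include <stdbool.h>

/* Hypothetical watchdog state; not the patch's actual data structures. */
struct wd_state {
    bool enabled;             /* mirrors debug.watchdog.enabled */
    unsigned int timeout_sec; /* mirrors debug.watchdog.timeout */
    unsigned int hz;          /* clock interrupts per second */
    unsigned int last_reset;  /* tick count at the daemon's last check-in */
};

/* Called when the daemon checks in (e.g. through debug.watchdog.reset). */
void wd_reset(struct wd_state *wd, unsigned int now)
{
    wd->last_reset = now;
}

/*
 * Called from the clock interrupt with the current tick count.
 * Returns true when the watchdog has expired and the kernel should complain.
 * Unsigned subtraction gives the right elapsed count across wraparound.
 */
bool wd_hardclock_check(const struct wd_state *wd, unsigned int now)
{
    if (!wd->enabled)
        return false;
    unsigned int elapsed = now - wd->last_reset;
    return elapsed > wd->timeout_sec * wd->hz;
}
```

Because the check does nothing but arithmetic on a few integers, it is cheap
enough to run on every tick from interrupt context.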

Below I have included some snippets from an email Sean sent me.

Here's what I've got so far:

1. Kernel watchdog
   a. Three sysctls
      i.   debug.watchdog.timeout: Number of seconds allowed to go without
           a reset
      ii.  debug.watchdog.reset: Upon read or write, resets the watchdog
           timer
      iii. debug.watchdog.enabled: When >0, perform watchdog checks.
   b. 'options WATCHDOG' or 'options INVARIANTS' to compile with watchdog
      code
   c. watchdog(4) manpage
2. Userland support
   a. /usr/sbin/watchdogd
      i.   Performs stat("/etc") test
      ii.  Awakens periodically and resets watchdog via d.w.reset sysctl
      iii. Sets d.w.enabled=1 on start and d.w.enabled=0 on exit.
      iv.  Proper signal handling.
      v.   Writes pidfile in /var/run/watchdogd.pid
   b. watchdogd(8) manpage
   c. /etc/rc check for watchdogd_enabled="YES"
   d. /etc/rc.d/watchdogd rcNG script
   e. Addition of 'watchdogd_enabled="NO"' to /etc/defaults/rc.conf
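
A rough sketch of one watchdogd iteration, assuming the sysctl names listed
above: run the stat("/etc") test and only write debug.watchdog.reset if it
passes.  The helper names wdd_health_ok(), wdd_set_int() and wdd_iteration()
are hypothetical.  sysctlbyname(3) exists only on FreeBSD, so elsewhere the
write is stubbed out to keep the sketch compilable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#ifdef __FreeBSD__
#include <sys/sysctl.h>
#endif

/* Health test: the path must be stat()-able and must be a directory. */
bool wdd_health_ok(const char *path)
{
    struct stat sb;
    return stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}

/*
 * Write an integer to a sysctl by name.  Only FreeBSD provides
 * sysctlbyname(); on other systems this is a no-op stub (hypothetical).
 */
int wdd_set_int(const char *name, int value)
{
#ifdef __FreeBSD__
    return sysctlbyname(name, NULL, NULL, &value, sizeof(value));
#else
    (void)name;
    (void)value;
    return 0;
#endif
}

/* One daemon iteration: reset the watchdog only if the health test passes. */
bool wdd_iteration(const char *path)
{
    if (!wdd_health_ok(path))
        return false;   /* skip the reset and let the kernel watchdog expire */
    return wdd_set_int("debug.watchdog.reset", 1) == 0;
}
```

A daemon built on this would call wdd_set_int("debug.watchdog.enabled", 1)
at startup, loop calling wdd_iteration("/etc") followed by sleep(), and write
0 to debug.watchdog.enabled again from its signal and exit paths.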


I have a short TODO list as well:
 * Handle overflow of ticks (this will be pretty easy).
 * Print the interrupt and backtrace output several times, a few seconds
   apart (this will also be pretty easy).
 * Flesh out the watchdogd daemon to perform more checks once I know which
   checks people would like it to do.  By checks, I mean "tests a, b, and c
   must not fail or I won't reset the watchdog."
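
For the first two TODO items, one possible approach (a sketch with
hypothetical names, not code from the patch) is to keep tick counts unsigned
so differences survive wraparound, and to remember when the last report was
printed so the output repeats every few seconds while the watchdog stays
expired.  It assumes the daemon checks in far more often than the tick
counter wraps.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for repeated reports. */
struct wd_report {
    bool fired;               /* has the watchdog already expired? */
    unsigned int last_report; /* tick count of the most recent report */
};

/* Elapsed ticks between two samples; correct modulo 2^N, so wraparound is harmless. */
unsigned int wd_elapsed(unsigned int now, unsigned int then)
{
    return now - then;
}

/*
 * Decide whether to print the interrupt/backtrace output on this tick.
 * Reports once at expiry, then again every repeat_ticks while still expired.
 */
bool wd_should_report(struct wd_report *r, unsigned int now,
                      unsigned int last_reset, unsigned int timeout_ticks,
                      unsigned int repeat_ticks)
{
    if (wd_elapsed(now, last_reset) <= timeout_ticks) {
        r->fired = false;     /* daemon checked in: rearm */
        return false;
    }
    if (!r->fired || wd_elapsed(now, r->last_report) >= repeat_ticks) {
        r->fired = true;
        r->last_report = now;
        return true;
    }
    return false;
}
```

With hz = 100, passing repeat_ticks = 300 would repeat the output every
three seconds until the daemon checks in again.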

What I have so far is available for viewing at
http://www.zombie.org/watchdog.diff



I believe this functionality will be invaluable for debugging 5.0.  I'd
like to have this included as soon as the todo list is covered
and it gets a proper review.  Comments?

Cheers,
Jeff

