From: Maksim Yevmenkin
To: Andriy Gapon
Cc: "current@freebsd.org"
Date: Tue, 17 Dec 2013 14:15:01 -0800
Subject: Re: [rfc] [patch] do not stop watchdog on shutdown
In-Reply-To: <52B0C988.5010507@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

On Tue, Dec 17, 2013 at 2:00 PM, Andriy Gapon wrote:
> on 17/12/2013 20:53 Maksim Yevmenkin said the following:
>> hello,
>>
>> would anyone object to this patch?
>>
>> max
>>
>> Index: src/etc/rc.d/watchdogd
>> ===================================================================
>> --- src/etc/rc.d/watchdogd (revision 2999)
>> +++ src/etc/rc.d/watchdogd (working copy)
>> @@ -39,4 +39,7 @@
>> pidfile="/var/run/${name}.pid"
>>
>> load_rc_config $name
>> +
>> +sig_stop="${watchdogd_sig_stop:-TERM}"
>> +
>> run_rc_command "$1"
>
> I wonder if anyone could object to this rather generic (and NOP by default) change.
> I see your intent, but a few words about it would not hurt :-)

well, when watchdogd is asked to exit nicely (via SIGTERM) it will stop
the timer. since the watchdogd rc.d script is marked as 'shutdown', it
will exit (on shutdown) and stop the timer. if the system happens to
hang after this, a manual reset is required. when one operates in
"lights-out" type environments without readily available "remote hands",
this can be a problem.

default behavior is preserved, i.e. watchdogd will still be killed via
SIGTERM and the timer will be stopped. in order to activate the new
feature, one needs to put

watchdogd_sig_stop="KILL"

into /etc/rc.conf (SIGKILL cannot be caught, so watchdogd exits without
stopping the timer) and also make sure the watchdogd timeout is set to a
long enough value that the system comes back online before the timeout
fires.

> BTW, for a while now we have some support for interacting with the watchdog(9)
> from within the kernel. I have the following local patch / hack that makes use
> of that support:
>
> commit b64c5e855420f2d905a04f69fad5de116e8ffae5
> Author: Andriy Gapon
> Date: Fri Nov 25 10:00:59 2011 +0200
>
> [test] arm the watchdog before going into the final shutdown/reboot step
>
> ... to preclude hanging on that step.
> Note: halt assumes the limbo, so no watchdog for that case.
>
> diff --git a/sys/kern/kern_shutdown.c b/sys/kern/kern_shutdown.c
> index eaa78b8e..88afaa9 100644
> --- a/sys/kern/kern_shutdown.c
> +++ b/sys/kern/kern_shutdown.c
> @@ -444,6 +444,11 @@ kern_reboot(int howto)
>      if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping)
>          doadump(TRUE);
>
> +    if ((howto & RB_HALT) != 0)
> +        wdog_kern_pat(0);
> +    else
> +        wdog_kern_pat(WD_TO_32SEC + 1);
> +
>      /* Now that we're going to really halt the system... */
>      EVENTHANDLER_INVOKE(shutdown_final, howto);
>
>
> Admittedly, there is a gap between userland watchdog being stopped and kernel
> watchdog taking over. I wish that we had 'proper' integration between them,
> with proper hand-off, etc.

a fixed timeout of 32 sec (if i'm understanding this correctly) might
not be enough for all use cases. it's definitely not enough for our use
case. at the very least, the timeout value should be configurable to be
useful for us.

thanks,
max
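
For illustration only, here is a minimal sketch of how the proposed rc.d/watchdogd knob would be used. It is not part of either patch. The rc.conf lines are shown as comments, the `-t 120` timeout is an assumed value that would have to be tuned per site, and `pick_sig_stop` is a hypothetical helper that exists only to show how the `${watchdogd_sig_stop:-TERM}` default expansion behaves.

```shell
#!/bin/sh
# Sketch of the /etc/rc.conf settings described in the message above.
# The timeout value is an assumption, not taken from the thread:
#
#   watchdogd_enable="YES"
#   watchdogd_flags="-t 120"       # assumed; must outlast a full reboot
#   watchdogd_sig_stop="KILL"      # knob proposed by the patch
#
# pick_sig_stop is a hypothetical helper that mirrors the line added to
# the rc.d script: fall back to TERM when the rc.conf variable is unset
# or empty, otherwise use the configured signal.
pick_sig_stop() {
    watchdogd_sig_stop="$1"
    echo "${watchdogd_sig_stop:-TERM}"
}

pick_sig_stop ""        # nothing set in rc.conf: default signal
pick_sig_stop "KILL"    # rc.conf opts in to the new behaviour
```

With nothing set, the script still stops watchdogd with SIGTERM, so the change is a no-op by default as noted in the quoted reply.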