From: John-Mark Gurney
To: Ian Lepore
Cc: Warner Losh, mike tancsa, freebsd-embedded
Date: Mon, 30 Sep 2019 15:53:01 -0700
Subject: Re: watchdogd stat location
Message-ID: <20190930225301.GA96402@funkthat.com>

Ian Lepore wrote this message on Sat, Sep 28, 2019 at 13:30 -0600:
> On Fri, 2019-09-27 at 15:31 -0600, Warner Losh wrote:
> > On Fri, Sep 27, 2019 at 2:30 PM mike tancsa wrote:
> >
> > > On 9/27/2019 3:53 PM, Warner Losh wrote:
> > >
> > > >
> > > > I am all for that too. Just something other than /etc or /var which
> > > > are often mounted on ramdisk.
> > > >
> > > > I think that / is too special to cause disk IO to ever happen. Other
> > > > dirs will sometimes not be in the cache.... The notion here, perhaps
> > > > bogus, is that we want to check the root FS is sane. The stat(2) is a
> > > > cheap way to do this that will eventually fail if / goes wonky enough.
> > > > It's weak.
> > > >
> > >
> > > Would something like this buy any extra sanity ? or not worth it. I
> > > guess fancier checks belong in a passed program
> > >
> > >
> > > # diff -u watchdogd.c.orig watchdogd.c
> > > --- watchdogd.c.orig    2019-09-27 16:27:14.456973000 -0400
> > > +++ watchdogd.c 2019-09-27 16:27:18.904885000 -0400
> > > @@ -364,9 +364,23 @@
> > >
> > >                 if (test_cmd != NULL)
> > >                         failed = system(test_cmd);
> > > -               else
> > > -                       failed = stat("/etc", &sb);
> > > -
> > > +               else {
> > > +
> > > +                       srand(time(NULL));
> > > +                       switch(rand() % 4) {
> > > +                               case 0:
> > > +                                       failed = stat("/", &sb);
> > > +                                       break;
> > > +                               case 1:
> > > +                                       failed = stat("/bin", &sb);
> > > +                                       break;
> > > +                               case 2:
> > > +                                       failed = stat("/sbin", &sb);
> > > +                                       break;
> > > +                               default:
> > > +                                       failed = stat("/usr", &sb);
> > > +                       }
> > > +               }
> > >                 error = watchdog_getuptime(&ts_end);
> > >                 if (error) {
> > >                         end_program = 1;
> > >
> >
> > I don't think the rand helps at all. I think you'd really rather do
> > things sequentially. And this introduces more assumptions about the
> > underlying filesystem(s).
> >
> > Warner
> >
>
> If we want to be sure to force physical IO, how about dd if=/
> of=/dev/null count=1 ?
>
> But I question the premise of forcing physical IO as being somehow a
> better indicator of a non-hung system. I think it's just a better
> indicator of the sdcard problem that Mike is experiencing. For anyone
> else, forcing periodic physical IO is going to do annoying things like
> spin up idle drives.

Even better is to pull the device from df (zfs is a bit more complex,
but code for it exists in rc.d/growfs) and do IO directly to the
device. Then you are bypassing the disk cache entirely. But I agree
that spinning up idle drives isn't a good option...

Looks like maybe the trivial file system check should be documented
better: say that it stat's /etc, and mention that anyone who wants a
different directory stat'd can just use -e 'stat /dir' instead?

-- 
  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
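
For reference, Warner's point in the quoted thread (check the directories sequentially rather than picking one with rand()) could be sketched roughly as below. This is only an illustration: stat_next_path(), its parameters, and the idea of keeping a rotating index are hypothetical and are not part of watchdogd.

```c
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of watchdogd: stat(2) one path per call,
 * walking through `paths` in order and wrapping around at the end.
 * `next` holds the index of the path to check on this call.
 * Returns what stat(2) returns (0 on success, -1 on failure), or -1 if
 * the list is empty.
 */
static int
stat_next_path(const char *const *paths, size_t npaths, size_t *next)
{
	struct stat sb;
	int failed;

	if (npaths == 0)
		return (-1);

	/* Check the current path, then advance the index for the next call. */
	failed = stat(paths[*next % npaths], &sb);
	*next = (*next + 1) % npaths;

	return (failed);
}
```

In the else branch of the patch quoted above, the srand()/rand() switch might then become a single call such as failed = stat_next_path(dirs, 4, &next), with dirs listing "/", "/bin", "/sbin" and "/usr" and next kept across loop iterations. The objection about assuming a particular filesystem layout still applies to any such list.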