Date:      Mon, 30 Sep 2019 15:53:01 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Ian Lepore <ian@freebsd.org>
Cc:        Warner Losh <imp@bsdimp.com>, mike tancsa <mike@sentex.net>, freebsd-embedded <freebsd-embedded@freebsd.org>
Subject:   Re: watchdogd stat location
Message-ID:  <20190930225301.GA96402@funkthat.com>
In-Reply-To: <817c7ed712d6b7da3015b7312be485a9044b14e1.camel@freebsd.org>
References:  <5eba25eb-9ba4-0c93-27c8-e834491298ad@sentex.net> <CANCZdfp6bym5b6eFXFH0MxjYsAX%2B1Bi9fGXgp7sFM206zmsveQ@mail.gmail.com> <CAJ1Oi8FsG=nEBXdd0CS3U2zZSgh=SMcBO0ieY-KT5b1iDVFmJg@mail.gmail.com> <83831ae6-9275-4f0c-a23d-c9cca3dc28f4@sentex.net> <CANCZdfrRh7Ssf9vSSJ4Hopec1q7abLi9AdUqoPqZm4hPok6QUQ@mail.gmail.com> <fcdd9659-d7e4-c554-b501-6b8cd178f6d7@sentex.net> <CANCZdfpq424LcV04dBJfoid_KSSdYWGfq2StDCToxDzZXnAvfg@mail.gmail.com> <817c7ed712d6b7da3015b7312be485a9044b14e1.camel@freebsd.org>

Ian Lepore wrote this message on Sat, Sep 28, 2019 at 13:30 -0600:
> On Fri, 2019-09-27 at 15:31 -0600, Warner Losh wrote:
> > On Fri, Sep 27, 2019 at 2:30 PM mike tancsa <mike@sentex.net> wrote:
> > 
> > > On 9/27/2019 3:53 PM, Warner Losh wrote:
> > > > > 
> > > > 
> > > >     I am all for that too. Just something other than /etc or /var
> > > >     which are often mounted on ramdisk.
> > > > 
> > > > 
> > > > I think that / is too special to cause disk IO to ever happen.
> > > > Other dirs will sometimes not be in the cache.... The notion here,
> > > > perhaps bogus, is that we want to check the root FS is sane. The
> > > > stat(2) is a cheap way to do this that will eventually fail if /
> > > > goes wonky enough. It's weak.
> > > > 
> > > > 
> > > 
> > > Would something like this buy any extra sanity ? or not worth it. I
> > > guess fancier checks belong in a passed program
> > > 
> > > 
> > > # diff -u watchdogd.c.orig watchdogd.c
> > > --- watchdogd.c.orig    2019-09-27 16:27:14.456973000 -0400
> > > +++ watchdogd.c 2019-09-27 16:27:18.904885000 -0400
> > > @@ -364,9 +364,23 @@
> > > 
> > >                 if (test_cmd != NULL)
> > >                         failed = system(test_cmd);
> > > -               else
> > > -                       failed = stat("/etc", &sb);
> > > -
> > > +               else {
> > > +
> > > +                       srand(time(NULL));
> > > +                       switch(rand() % 4) {
> > > +                               case 0:
> > > +                                       failed = stat("/", &sb);
> > > +                                       break;
> > > +                               case 1:
> > > +                                       failed = stat("/bin", &sb);
> > > +                                       break;
> > > +                               case 2:
> > > +                                       failed = stat("/sbin", &sb);
> > > +                                       break;
> > > +                               default:
> > > +                                       failed = stat("/usr", &sb);
> > > +                       }
> > > +               }
> > >                 error = watchdog_getuptime(&ts_end);
> > >                 if (error) {
> > >                         end_program = 1;
> > > 
> > 
> > I don't think the rand helps at all. I think you'd really rather do
> > things sequentially. And this introduces more assumptions about the
> > underlying filesystem(s).
> > 
> > Warner
> > 
> 
> If we want to be sure to force physical IO, how about
> dd if=/ of=/dev/null count=1 ?
> 
> But I question the premise of forcing physical IO as being somehow a
> better indicator of a non-hung system.  I think it's just a better
> indicator of the sdcard problem that Mike is experiencing.  For anyone
> else, forcing periodic physical IO is going to do annoying things like
> spin up idle drives.
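Warner's suggestion to walk the paths in order rather than pick one at random might look like the minimal C sketch below. The path list and the name check_next_path() are hypothetical, not part of watchdogd, and like the quoted diff it still assumes those directories exist. A fixed rotation needs no srand()/rand() at all and makes it predictable which path was checked on a given pass.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical list of directories to probe, one per watchdog pass. */
static const char *const check_paths[] = { "/", "/bin", "/sbin", "/usr" };
#define NCHECK_PATHS (sizeof(check_paths) / sizeof(check_paths[0]))

/*
 * Stat the next path in the list, wrapping back to the first entry
 * after the last one. Returns stat(2)'s result: 0 on success, -1 on
 * failure, so it can stand in for "failed = stat(...)".
 */
static int
check_next_path(void)
{
	static size_t next;	/* index of the path to check on this pass */
	struct stat sb;
	const char *path = check_paths[next];

	next = (next + 1) % NCHECK_PATHS;
	return (stat(path, &sb));
}
```

In the main loop this would simply replace the switch: failed = check_next_path();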

Even better is to pull the device from df (ZFS is a bit more complex,
but code for it exists in rc.d/growfs) and do IO directly to the device.
Then you are bypassing the disk cache entirely.
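A minimal sketch of the direct-to-device probe, assuming the caller has already found the device node backing / (on FreeBSD, statfs(2) reports it in f_mntfromname, which is what df prints; for ZFS that field holds a dataset name rather than a device, hence the extra work mentioned above). The name probe_device() and the 4096-byte read size are assumptions for illustration. Disk device nodes on FreeBSD are character devices with no buffer cache in front of them, so the read should reach the media; pointed at a regular file, the same code would be served from the cache.

```c
#define _POSIX_C_SOURCE 200809L	/* for pread(2) under strict compilers */

#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one block from the start of a device (or
 * file) and report whether the read succeeded. 4096 bytes is assumed
 * to be a multiple of the sector size (true for 512 and 4096-byte
 * sectors), since raw disk reads must be sector-sized.
 * Returns 0 on success, -1 on failure, like stat(2).
 */
static int
probe_device(const char *devpath)
{
	static char buf[4096];
	ssize_t n;
	int fd;

	fd = open(devpath, O_RDONLY);
	if (fd == -1)
		return (-1);
	n = pread(fd, buf, sizeof(buf), 0);	/* always offset 0 */
	close(fd);
	return (n == -1 ? -1 : 0);
}
```

Usage would be something like failed = probe_device("/dev/ada0p2"), where the device name is only an example. Note this is exactly the kind of periodic physical IO that will spin up an idle drive.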

But I agree that spinning up idle drives isn't a good option...  It looks
like maybe the trivial file system check should be documented better:
say that it stats /etc, and mention that anyone who wants to change
which directory is stat'd can just use -e 'stat /dir' instead?
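To make the behavior to be documented concrete, here is a rough sketch of the per-pass check as the quoted watchdogd.c excerpt does it; run_check() is a hypothetical name, not a function in the real source. With -e, the command's system(3) status decides; otherwise the check is a stat(2) of /etc. Nonzero means the check failed.

```c
#include <stdlib.h>
#include <sys/stat.h>

/*
 * Sketch modelled on the quoted loop body: a user-supplied test
 * command (from -e) wins, otherwise fall back to stat("/etc").
 * Returns 0 if the system looks healthy, nonzero otherwise.
 */
static int
run_check(const char *test_cmd)
{
	struct stat sb;

	if (test_cmd != NULL)
		return (system(test_cmd));	/* wait status, 0 = success */
	return (stat("/etc", &sb));		/* default trivial FS check */
}
```

So running watchdogd -e 'stat /usr' (the path is only an example) would replace the /etc check with a stat of /usr.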

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


