Date:      Thu, 23 Sep 1999 23:13:02 -0700 (PDT)
From:      "Rodney W. Grimes" <freebsd@gndrsh.dnsmgr.net>
To:        john@nlc.net.au (John Saunders)
Cc:        freebsd-current@FreeBSD.ORG (FreeBSD current)
Subject:   Re: Automating filesystem check at boot time
Message-ID:  <199909240613.XAA01586@gndrsh.dnsmgr.net>
In-Reply-To: <011701bf064e$d0f7f4f0$6cb611cb@scitec.com.au> from John Saunders at "Sep 24, 1999 03:36:56 pm"

> I administer a number of remote FreeBSD boxes and starting with 3.x
> they have been unreliable at rebooting. We all know FreeBSD wants to
> keep running forever, however it seems to be at the expense of
> reboot stability. I have found the following problems occurring.
> 
> 1)  After a power failure the filesystem is inconsistent such that
>     a manual fsck is required. Actually this can also occur following
>     a crash or failed shutdown. However I must admit that FreeBSD does
>     this less than Linux, but it still does it.

Unattended, remotely administered systems shall have a UPS; power failure
is not an acceptable event for 99% of these types of systems.  There
is no good reason to have to hack them up to deal with this situation.
Get a good UPS, and install one of the UPS monitor daemons from ports
to shut down properly before total battery failure.

> 2)  After running "shutdown -r now" FreeBSD will kill off all processes
>     but complain that it is unable to kill everything. It then says Syncing
>     disks...done. Then hangs until the reset button is pressed. I think
>     that amd is causing this. The time this happened was following a
>     reboot to clear an amd problem when the NFS server was isolated
>     from the network for some time.

You need to find and fix whatever it is that is not dying when being
told to die.  Your workaround is a band-aid that only hides the real
problem, which is probably a bug somewhere in something.  amd and NFS
are good first candidates.  Just what process does it complain that it
is unable to kill, or does it just say it could not kill something?

> My previous hacks at Linux have led me to the following patch to /etc/rc
> which I have been using for a while on FreeBSD to solve point 1. It has
> saved me a lot of driving on 2 occasions. The program "waitkey" is one
> I wrote that sleeps for the specified time and returns TRUE (0), unless a
> key is pressed in which case it returns the ASCII code for the key.

Not a bad hack, but I will never ever run a fsck -y without at least
a log file some place, and then only as a last resort after imaging
the disk and doing everything else I can to try to recover it if
it has anything important at all on it.

> 
> +++ rc  Wed Aug 18 13:59:59 1999
> @@ -69,6 +69,12 @@
>                 ;;
>         8)
>                 echo "Automatic file system check failed... help!"
> +               if waitkey 30; then
> +                       exit 1
> +               fi
> +               fsck -y

Redirect the fsck -y output to some place writable.  That is hard to
find at this stage, but if you run a separate /tmp like we do you can
always change this to:
	newfs /dev/rawtmpdev
	mount -u /tmp
	fsck -y >/tmp/fsck-y.OUT 2>&1

So you have some clue as to what got destroyed during the fsck.

> +               reboot
> +               echo "reboot failed... help!"
Have you actually ever seen this echo execute??

>                 exit 1
>                 ;;
>         12)
> 
> Anyway I am proposing a method where FreeBSD can be configured through an
> rc.conf knob to be more friendly in an unattended situation. I propose
> as a first step that a knob called "unattended_operation" be added with
> a default value of "NO". Enabling this knob can be used to allow code
> like the above to be executed. It can also be used to force the sysctl
> variable "debug.debugger_on_panic" to 0 in the rc file.

You might want to get input from Julian and friends at Whistle, they
are experts at unattended operations...
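
As a rough illustration of the knob proposed above, an rc script could
test it along the following lines. This is only a sketch: the knob name
"unattended_operation" and its default of "NO" come from the proposal,
while the helper names unattended_enabled and apply_unattended_settings
are made up here and exist in no rc file.

```shell
# Hypothetical knob from the proposal; rc.conf would normally set it.
unattended_operation="${unattended_operation:-NO}"

# Hypothetical helper: returns 0 when the knob is YES (any case),
# 1 for any other value.
unattended_enabled() {
    case "$unattended_operation" in
    [Yy][Ee][Ss]) return 0 ;;
    *)            return 1 ;;
    esac
}

# Hypothetical helper meant to be called from the rc script.
# It only touches the sysctl when the knob is enabled.
apply_unattended_settings() {
    if unattended_enabled; then
        # Do not drop into the kernel debugger on panic.
        sysctl -w debug.debugger_on_panic=0
    fi
}
```

Nothing runs until the rc script calls apply_unattended_settings, so a
machine with the knob left at "NO" behaves exactly as before.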

> 
> I can also contribute the waitkey.c program. It may even be useful for
> other stuff with some changes to the command syntax.

This can actually be done from /bin/sh using a background process:
just run a read from the console in the background, sleep for X, then
check the child's status.  If the child is still alive, no keypress
occurred, so kill the child.
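
A minimal sketch of that approach follows. It is not the waitkey.c
program from the original message: the function name, the optional
device argument, and the /dev/console default are assumptions made for
illustration. It returns 0 on timeout and 1 when input arrived, rather
than the ASCII code of the key, and because it uses the shell's read it
needs a whole line (Enter) unless the terminal is in raw mode.

```shell
# waitkey SECONDS [DEVICE]
# Returns 0 if no input arrived within SECONDS, 1 if a line was read.
# DEVICE defaults to /dev/console (an assumption; adjust as needed).
waitkey() {
    wk_timeout=$1
    wk_device=${2:-/dev/console}

    # Background child: blocks until a line can be read from the device.
    ( read -r wk_key <"$wk_device" ) &
    wk_reader=$!

    sleep "$wk_timeout"

    if kill -0 "$wk_reader" 2>/dev/null; then
        # Child still alive: nothing was typed, so clean it up.
        kill "$wk_reader" 2>/dev/null || :
        wait "$wk_reader" 2>/dev/null || :
        return 0
    fi

    # Child already exited: input arrived during the sleep.
    wait "$wk_reader" 2>/dev/null || :
    return 1
}
```

Note that this version always sleeps for the full timeout, even when
input arrives early; it only reports afterwards whether the reader had
finished.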

> 
> Does anybody have any strong opinions on this, either way? I have this
> running on my machine at present so I'm not too fussed either way, just
> thought it might be useful for other people as well. I can supply code
> and patches, but I would like somebody with commit privs to look over
> the code, make suggestions and eventually commit the work.

I'll strongly object to any automated running of a fsck -y; it is far
too dangerous for way too many folks.

-- 
Rod Grimes - KD7CAX - (RWG25)                    rgrimes@gndrsh.dnsmgr.net

