Date:      Mon, 25 Nov 2002 15:26:04 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        Robert Watson <rwatson@FreeBSD.ORG>, Mikhail Teterin <mi+mx@aldan.algebra.com>, current@FreeBSD.ORG
Subject:   Re: -current unusable after a crash
Message-ID:  <3DE2B18C.AA33F92F@mindspring.com>
References:  <200211250959.39594.mi%2Bmx@aldan.algebra.com> <Pine.NEB.3.96L.1021125102358.33619A-100000@fledge.watson.org> <20021125172445.GA8953@rot13.obsecurity.org> <3DE29DE6.CDD96F3F@mindspring.com> <20021125221748.GA11747@rot13.obsecurity.org>

Kris Kennaway wrote:
> On Mon, Nov 25, 2002 at 02:02:14PM -0800, Terry Lambert wrote:
> > I don't think this is really possible.
> 
> Yeah :(
> 
> > If you made system dumps mandatory (or marked swap with a non-dump
> > header in case of panic), this still would not handle the "silent
> > reboot", "double panic", or "single panic with disk I/O trashed"
> > cases.  8-(.
> 
> And the panics that affect the disk/filesystem are likely to not give
> a crashdump, but at the same time are likely to cause FS problems for
> bgfsck :-(

Actually, the worst problems come when the corruption does not
result in a crash subsequently.

If you just crashed again, you could simply set in the superblock
a flag that said "background fsck in progress", and if that flag
was set at boot time, then do a full fsck (knowing you died during
a background fsck).
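
A minimal sketch of that boot-time decision, in Python for brevity.
The flag names and bit values here are hypothetical, made up for
illustration; they are not the real superblock fields.  The sketch
also assumes the flag reaches the disk before the background fsck
starts and is cleared only after it completes.

```python
# Hypothetical superblock flag bits (illustrative only, not real UFS fields).
FS_UNCLEAN = 0x01        # filesystem was not cleanly unmounted
FS_BGFSCK_ACTIVE = 0x02  # "background fsck in progress"


def begin_background_fsck(sb_flags: int) -> int:
    """Mark the superblock before the background fsck starts."""
    return sb_flags | FS_BGFSCK_ACTIVE


def end_background_fsck(sb_flags: int) -> int:
    """Clear both markers once the background fsck has finished."""
    return sb_flags & ~(FS_BGFSCK_ACTIVE | FS_UNCLEAN)


def boot_fsck_mode(sb_flags: int) -> str:
    """Decide at boot time which kind of fsck to run."""
    if sb_flags & FS_BGFSCK_ACTIVE:
        # We died during a background fsck: do not trust it again.
        return "full"
    if sb_flags & FS_UNCLEAN:
        return "background"
    return "none"
```

This only catches the case where the machine goes down while the
flag is still set.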

If you don't get a second crash, and you reboot, you're screwed.

You could add another utility to say "force full fsck" -- basically,
to set the flag manually.  This is a pain because you have to do it
through an fcntl() or ioctl(): there are no block devices to do the
work through, and you can't open a mounted device for writing.  Even
if you know what you are doing, the OS enforces this as if it were
smarter than you.

We ran into exactly this same problem in the InterJet, when we first
paid Kirk to have soft updates ported to FreeBSD (I actually did the
preliminary "make it compile" work, and Julian did most of the
debugging; I helped some after that, but my boss didn't like me
doing it).  The point was to get rid of the need for a UPS in the
InterJet.

A log structured FS doesn't actually have this problem, but is a
real pain because of the need for a "cleaner" to run constantly,
to garbage collect, which makes things that used to take
deterministic time take variable time.  Not very good for multimedia
or streaming content serving.

The InterJet handled this by having a DC holdup time following AC
failure notification, which was enough to throw a stick into the
spokes, to prevent the wheels from turning, and the bicycle falling
over the cliff.

Another way to handle it would be CMOS, with a BIOS initialization
(e.g. set bit 1 of the "crash state") that didn't affect the bits
that indicated the failure mode.
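
As a rough sketch of what "didn't affect the bits" means, assume a
single crash-state byte.  The layout below is invented for
illustration (bit 1 as the BIOS-set marker, the upper nibble as the
failure mode); no real BIOS or CMOS map is implied.

```python
# Hypothetical layout of a one-byte "crash state" in CMOS.
BOOT_MARKER = 0x02        # bit 1, set by the BIOS at initialization
FAILURE_MODE_MASK = 0xF0  # assumed: upper nibble holds the failure mode


def record_failure(state: int, mode: int) -> int:
    """Store a 4-bit failure mode, leaving the other bits alone."""
    cleared = state & ~FAILURE_MODE_MASK & 0xFF
    return cleared | ((mode << 4) & FAILURE_MODE_MASK)


def bios_init(state: int) -> int:
    """Set bit 1 without touching the failure-mode bits."""
    return (state | BOOT_MARKER) & 0xFF


def failure_mode(state: int) -> int:
    """Read back the failure mode recorded before the reboot."""
    return (state & FAILURE_MODE_MASK) >> 4
```

The point is only that the BIOS write is an OR of one bit rather
than a store of the whole byte, so whatever was recorded at panic
time survives the reboot.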

Unfortunately, the computer manufacturers have not really agreed
on a standard for this sort of thing, nor do they think anyone in
OS space or userland should be able to own a section of CMOS
memory (no OS allocation policy, tagging, etc.).

-- Terry
