Date:      Sat, 19 Sep 1998 08:02:00 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        Don.Lewis@tsc.tdk.com (Don Lewis)
Cc:        current@FreeBSD.ORG
Subject:   Re: softupdates & fsck
Message-ID:  <199809190802.BAA19430@usr08.primenet.com>
In-Reply-To: <199809190520.WAA09949@salsa.gv.tsc.tdk.com> from "Don Lewis" at Sep 18, 98 10:20:40 pm

> While doing various evil things with a machine running current, I've managed
> to get it to panic a number of times.  That and getting some filesystem
> damage as a result don't bother me, but I am bothered by the type of
> filesystem damage that I'm seeing.  When fsck runs at boot time, it finds
> a number of orphaned directories, which it reconnects in lost+found.
> For some reason, they end up with their link count being too low.  If
> I try to "rm -r" them, they are emptied of their contents, except for
> "." and "..", at which point they are unremovable because their link
> count is 1 instead of 2.  I've also seen directories elsewhere in the
> tree end up with a link count that's too low.  If I unmount the filesystem
> and run fsck again, fsck notices the problem, reports "UNEXPECTED SOFTDEP
> INCONSISTENCY", and fixes the problem.  There may be files with the
> wrong link count as well.
> 
> My suspicion is that the first fsck run is getting the link counts wrong
> when it repairs the filesystem.  I've taken a look at the fsck code, but
> haven't gotten too far, mostly because the code is so well commented -- NOT!

The theory behind soft updates is that things will be atomically
committed in dependency order.

For our purposes here, what matters is that no on-disk structure
that requires an atomicity guarantee spans a 512-byte boundary.

This is significant because no known modern hardware fails to
guarantee that a 512-byte region (one atomic disk block) is written
either completely or not at all.

As a result, if soft updates is working correctly, the *only* type
of error that can occur, and that needs to be corrected by fsck, is
a cylinder group bitmap inconsistency; specifically, one that leaves
a bit set (marked allocated) when, in fact, the block is not
allocated.
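
As a rough illustration of that one repair, here is a minimal Python
sketch.  It models a cylinder group bitmap as a list of booleans and
the blocks actually claimed by inodes as a set.  The model and the
name reclaim_leaked_blocks are assumptions made for illustration;
nothing here is taken from fsck or the FFS code.

```python
def reclaim_leaked_blocks(bitmap, referenced):
    """Clear bits that are marked allocated but that nothing references.

    bitmap     -- list of bools, True meaning "marked allocated"
    referenced -- set of block numbers actually claimed by some inode
    Returns the list of block numbers that were reclaimed.
    """
    # The opposite error (a block in use but marked free) is the kind
    # that correctly ordered writes are supposed to rule out, so treat
    # it as damage this simple pass cannot repair.
    for blk in referenced:
        if not bitmap[blk]:
            raise ValueError("block %d is in use but marked free" % blk)

    leaked = [blk for blk, used in enumerate(bitmap)
              if used and blk not in referenced]
    for blk in leaked:
        bitmap[blk] = False
    return leaked


# Blocks 0-3 are marked allocated, but only 0 and 2 are really in use.
bitmap = [True, True, True, True, False, False]
print(reclaim_leaked_blocks(bitmap, {0, 2}))   # [1, 3]
print(bitmap)   # [True, False, True, False, False, False]
```

The worst outcome of skipping this pass is wasted space, since a
leaked block is merely unavailable, not shared between two files.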

This means that if you could lock down access to a cylinder group
in the FS code for the duration of a bitmap consistency check, you
could do the necessary repairs following a power failure *while the
FS was online*.
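
The locking idea can be sketched as follows, assuming one lock per
cylinder group that both the allocator and the checker must hold.
The CylinderGroup class and check_online function are hypothetical
names for this toy model, not existing kernel interfaces.

```python
import threading


class CylinderGroup:
    """Toy cylinder group: an allocation bitmap guarded by its own lock."""

    def __init__(self, nblocks):
        self.lock = threading.Lock()
        self.bitmap = [False] * nblocks   # True means "marked allocated"
        self.referenced = set()           # blocks really in use

    def allocate(self):
        """Normal FS activity: claim the first free block, or None if full."""
        with self.lock:
            for blk, used in enumerate(self.bitmap):
                if not used:
                    self.bitmap[blk] = True
                    self.referenced.add(blk)
                    return blk
            return None


def check_online(groups):
    """Repair one group at a time; only the group being checked is locked."""
    reclaimed = 0
    for cg in groups:
        with cg.lock:
            for blk, used in enumerate(cg.bitmap):
                if used and blk not in cg.referenced:
                    cg.bitmap[blk] = False
                    reclaimed += 1
    return reclaimed


groups = [CylinderGroup(4), CylinderGroup(4)]
groups[0].allocate()            # block 0 is legitimately in use
groups[1].bitmap[2] = True      # simulate a bit left set by a crash
print(check_online(groups))     # 1
```

Allocations in every other cylinder group proceed while one group is
being checked, which is what would let the repair run on a mounted
filesystem.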

In other words, fsck is unnecessary, except to deal with the fact
that a cleaner daemon has not been written, and the possibility of
physical hardware failure.


That you are seeing these problems implies that the bwrite ordering
guarantees that the driver must provide (i.e., that the blocks will
be written in the order requested, and that the writes will not
return as completed until the data has been committed to the disk)
are not being honored.
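
To see why ordering matters, consider a toy model in which creating
a file takes two writes: the inode, then the directory entry that
names it.  The sketch below enumerates every state a crash could
leave on disk for a given completion order.  The two-write model and
the function names are assumptions chosen for illustration.

```python
def consistent(on_disk):
    """A directory entry must never be on disk without the inode it names."""
    return "dirent" not in on_disk or "inode" in on_disk


def crash_states(completion_order):
    """Every on-disk state a crash could leave behind, given the order in
    which the driver actually commits the writes."""
    return [set(completion_order[:k])
            for k in range(len(completion_order) + 1)]


requested = ["inode", "dirent"]      # dependency order asked for by the FS
reordered = ["dirent", "inode"]      # what a reordering driver might do

print(all(consistent(s) for s in crash_states(requested)))   # True
print(all(consistent(s) for s in crash_states(reordered)))   # False
```

With the requested order, every crash point is consistent.  With the
writes swapped, a crash between them leaves a directory entry
pointing at an inode that never reached the disk, which is damage of
a kind that a bitmap-only repair cannot fix.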

From recent postings, it seems that CAM is not honoring the ordering
guarantees that the previous driver code honored.


If your problem is occurring on a non-CAM system, you should contact
Julian Elischer with a detailed list of the errors you are getting; if
it is occurring with a CAM system, you need to contact Justin Gibbs
and the other CAM authors about *not* reordering blocking write
requests.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
