Date: Fri, 17 Jan 2003 14:24:10 -0800
From: David Schultz <dschultz@uclink.berkeley.edu>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Jason Schoonover <jason_jks@yahoo.com>, freebsd-fs@FreeBSD.ORG
Subject: Re: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling filesystem support)
Message-ID: <20030117222410.GA5449@HAL9000.homeunix.com>
Mail-Followup-To: Terry Lambert <tlambert2@mindspring.com>,
	Jason Schoonover <jason_jks@yahoo.com>, freebsd-fs@FreeBSD.ORG
References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3E27DA7F.D5DBEFB@mindspring.com>

Thus spake Terry Lambert <tlambert2@mindspring.com>:
> > FreeBSD uses softupdates, which achieves similar efficiency and
> > reliability goals to journaling. With softupdates, you don't need
> > to fsck at all at boot time following a power failure or crash
> > because the worst case scenario (hardware failure aside) is that
> > some disk space that is really free is marked as allocated.
> 
> No, the worst case following a power failure is a screwed disk
> track.

Yes, I'm familiar with this failure mode; it has been discussed on
the lists before.  I grouped it under the ``hardware failure''
category I mentioned to keep my post concise, so as not to fall
asleep in the middle of it.  ;-)

> Soft updates optimizes for sector writing, not track writing,
> while journalling can journal on the basis of track-sized
> extents.
> 
> If it is written correctly (there are a number of technical
> challenges to writing this correctly, and SGI, IBM, and Linux
> haven't done it, but it's theoretically possible, though very
> hard on IDE -- much easier on SCSI because the physical geometry
> can be accessed via mode page 2).

Even if you know the size of each physical track and manage to
write a journalling filesystem that takes that into account, I
would think that you'd wind up wasting memory or paying for
read-modify-write cycles to commit entire tracks.  Nevertheless, I
suppose it could be done.  The LFS was very nearly a solution to
this problem, but it didn't take the disk geometry into account.
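
To put a rough number on the read-modify-write cost, here is a
minimal sketch in Python.  It assumes a fixed number of sectors per
track, which real disks with zoned recording do not have, and the
function name and the figure of 63 sectors per track are made up
purely for illustration.

```python
def track_commit_cost(first_dirty, ndirty, sectors_per_track):
    """Return (start, total, extra) for committing whole tracks.

    start: first sector of the track-aligned extent
    total: number of sectors written when whole tracks are committed
    extra: clean sectors that must be cached or re-read first
    """
    spt = sectors_per_track
    # Round the start down and the end up to track boundaries.
    start = (first_dirty // spt) * spt
    end = -(-(first_dirty + ndirty) // spt) * spt
    total = end - start
    return start, total, total - ndirty


# Hypothetical geometry: 63 sectors per track.
small = track_commit_cost(100, 8, 63)
straddling = track_commit_cost(120, 10, 63)
print(small, straddling)
```

With these made-up numbers, dirtying 8 sectors forces a 63-sector
commit, so 55 clean sectors have to be either held in memory or read
back from disk first.  A dirty range that straddles a track boundary
doubles the size of the commit.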

If you are going to assume that the hardware is going to do
something stupid (a good assumption), then the problem is actually
much worse than you imply.  RAID controllers and disk firmware,
like operating systems, have race conditions and other bugs.
Neither softupdates nor journalling alone will save you from a
misdirected or phantom write, a misdirected read, or an interface
error.  Hardware checksums will not fix the problem either.  In
the cases of misdirected reads and writes, the checksums match,
since a per-sector checksum typically covers only the sector's
contents and not the address it was meant for.  For an interface
error, there isn't even a checksum to verify, because the disk has
already verified and discarded it.  You need far more than just
DC holdup if you want to detect and possibly correct these
problems.
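
A toy illustration of why the checksums still match.  This assumes a
made-up ToyDisk class in which each sector carries a CRC over its own
payload, as a stand-in for the drive's per-sector checksum; it is not
a model of any real firmware, and the misdirect_to parameter exists
only to simulate the bug.

```python
import zlib


class ToyDisk:
    """Toy disk: each sector stores (payload, CRC of that payload)."""

    def __init__(self, nsectors):
        self.sectors = [self._seal(b"\0" * 16) for _ in range(nsectors)]

    @staticmethod
    def _seal(payload):
        return (payload, zlib.crc32(payload))

    def write(self, lba, payload, misdirect_to=None):
        # misdirect_to simulates a firmware bug that sends the write
        # to the wrong sector; the payload and its CRC are still valid.
        target = lba if misdirect_to is None else misdirect_to
        self.sectors[target] = self._seal(payload)

    def read(self, lba):
        payload, crc = self.sectors[lba]
        if zlib.crc32(payload) != crc:
            raise OSError("checksum mismatch on sector %d" % lba)
        return payload


disk = ToyDisk(8)
old = b"old inode block!"
new = b"new inode block!"

disk.write(5, old)                  # normal write
disk.write(5, new, misdirect_to=7)  # update lands on sector 7 instead

stale = disk.read(5)      # passes the checksum, returns stale data
clobbered = disk.read(7)  # passes the checksum, holds the wrong block
```

Both reads succeed: sector 5 hands back stale data and sector 7 holds
data that was never meant for it, and neither checksum can tell.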

In light of that, I do group softupdates and journalling in the
same category, since neither provides filesystem integrity in the
face of hardware errors.  I agree with you that journalling could
solve one particular problem associated with full track writes,
but as you mentioned, nobody actually does journalling that way.
But the idea that you can take a UFS-like filesystem and fix all
of its metadata integrity problems by adding journalling to it is
nonsense.

There is some ongoing work on a commercial filesystem that can
verify metadata integrity and usually recover from errors on the
fly.  Think of an LFS structured around a Merkle tree.  The
ultimate goal is to be able to swap to it for a while, and still
be able to mount it afterwards without running a filesystem
checker.  People who really need that kind of reliability should
be using that kind of filesystem, and paying the associated
performance penalty.  The rest of us can use softupdates or
journalling and have protection against what is by far the most
common case: filesystem corruption as a result of unordered
metadata updates.
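
The internals of that filesystem aren't described here, so the
following is only a generic sketch of the Merkle-tree idea, with
hypothetical names (digest, make_pointer, read_verified).  The parent
stores a hash of each child next to the child's address, so a block
that is stale or in the wrong place fails verification when it is
read through its parent.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).digest()


def make_pointer(addr, data):
    """A parent's reference to a child: address plus hash of contents."""
    return (addr, digest(data))


def read_verified(blocks, pointer):
    """Fetch a child block and check it against the parent's hash."""
    addr, expected = pointer
    data = blocks[addr]
    if digest(data) != expected:
        raise ValueError("block %d does not match its parent's hash" % addr)
    return data


# Toy "disk": block address -> contents.
blocks = {5: b"old inode block", 7: b"unrelated data"}

# The filesystem updates block 5 and records the new hash in the
# parent, but the write is lost (a phantom write), so the disk still
# holds the old contents.
ptr = make_pointer(5, b"new inode block")

try:
    read_verified(blocks, ptr)
    detected = False
except ValueError:
    detected = True

# A pointer whose hash matches what is actually on disk verifies fine.
intact = read_verified(blocks, make_pointer(7, b"unrelated data"))
print(detected, intact)
```

In a full tree each parent block, pointers included, is in turn
hashed by its own parent, so trusting the root is enough to verify
everything below it.  The hashing on every read and the rewrite of
the path to the root on every update are where the performance
penalty comes from.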
