Date: Fri, 17 Jan 2003 14:24:10 -0800
From: David Schultz
To: Terry Lambert
Cc: Jason Schoonover, freebsd-fs@FreeBSD.ORG
Subject: Re: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling filesystem support)
Message-ID: <20030117222410.GA5449@HAL9000.homeunix.com>
In-Reply-To: <3E27DA7F.D5DBEFB@mindspring.com>

Thus spake Terry Lambert:
> > FreeBSD uses softupdates, which achieves similar efficiency and
> > reliability goals to journaling.
> > With softupdates, you don't need
> > to fsck at all at boot time following a power failure or crash
> > because the worst case scenario (hardware failure aside) is that
> > some disk space that is really free is marked as allocated.
>
> No, the worst case following a power failure is a screwed disk
> track.

Yes, I'm familiar with this failure mode; it has been discussed on
the lists before. I was grouping it in the ``hardware failure''
category I mentioned so I could make my post concise, so as to not
fall asleep in the middle of it. ;-)

> Soft updates optimizes for sector writing, not track writing,
> while journalling can journal on the basis of track-sized
> extents.
>
> If it is written correctly (there are a number of technical
> challenges to writing this correctly, and SGI, IBM, and Linux
> haven't done it, but it's theoretically possible, though very
> hard on IDE -- much easier on SCSI because the physical geometry
> can be accessed via mode page 2).

Even if you know the size of each physical track and manage to
write a journalling filesystem that takes that into account, I
would think that you'd wind up wasting memory or paying for
read-modify-write cycles to commit entire tracks. Nevertheless, I
suppose it could be done. The LFS was very nearly a solution to
this problem, but it didn't take the disk geometry into account.

If you are going to assume that the hardware is going to do
something stupid (a good assumption), then the problem is actually
much worse than you imply. RAID controllers and disk firmware,
like operating systems, have race conditions and other bugs.
Neither softupdates nor journalling alone will save you from a
misdirected or phantom write, a misdirected read, or an interface
error. Hardware checksums will not fix the problem either. In the
cases of misdirected reads and writes, the checksums match. For an
interface error, there isn't even a checksum to verify, because
it's already been verified and discarded by the disk.
You need far more than just DC holdup if you want to detect and
possibly correct these problems.

In light of that, I do group softupdates and journalling in the
same category, since neither provides filesystem integrity in the
face of hardware errors. I agree with you that journalling could
solve one particular problem associated with full track writes, but
as you mentioned, nobody actually does journalling that way. But
the idea that you can take a UFS-like filesystem and fix all of its
metadata integrity problems by adding journalling to it is
nonsense.

There is some ongoing work on a commercial filesystem that can
verify metadata integrity and usually recover from errors on the
fly. Think of an LFS structured around a Merkle tree. The
ultimate goal is to be able to swap to it for a while, and still be
able to mount it afterwards without running a filesystem checker.
People who really need that kind of reliability should be using
that kind of filesystem, and paying the associated performance
penalty. The rest of us can use softupdates or journalling and
have protection against what is by far the most common case:
filesystem corruption as a result of unordered metadata updates.
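
To make the checksum point concrete, here is a minimal sketch in
Python. It is a toy model only, not how any real disk or the
filesystem mentioned above works: ToyDisk, lands_at, parent and
demo are hypothetical names invented for the illustration, and
SHA-256 merely stands in for whatever checksum is used. It shows a
misdirected write that passes every per-block checksum but is
caught once the expected hash is held by a parent record, which is
the basic idea behind a Merkle tree.

```python
import hashlib


def digest(data):
    """SHA-256 of a block payload; stands in for any strong checksum."""
    return hashlib.sha256(data).digest()


class ToyDisk:
    """Hypothetical block store. Each block carries its own checksum,
    loosely modelling a per-sector hardware checksum."""

    def __init__(self, nblocks):
        self.blocks = {}
        for addr in range(nblocks):
            self.write(addr, b"old-%d" % addr)

    def write(self, addr, payload, lands_at=None):
        # lands_at models a misdirected write: the block is stored,
        # checksum and all, at an address the caller did not ask for.
        target = addr if lands_at is None else lands_at
        self.blocks[target] = (payload, digest(payload))

    def read(self, addr):
        # Returns the payload and whether its own checksum verifies.
        payload, stored = self.blocks[addr]
        return payload, digest(payload) == stored


def demo():
    disk = ToyDisk(8)
    # Parent record: the hash the filesystem expects at each address
    # (one level of a Merkle tree, kept apart from the blocks).
    parent = {addr: digest(disk.read(addr)[0]) for addr in range(8)}

    # The filesystem updates block 5, but the write lands on block 7.
    parent[5] = digest(b"new-5")
    disk.write(5, b"new-5", lands_at=7)

    report = {}
    for addr in (5, 7):
        payload, self_ok = disk.read(addr)
        report[addr] = {
            "self_checksum_ok": self_ok,
            "parent_hash_ok": digest(payload) == parent[addr],
        }
    return report


if __name__ == "__main__":
    for addr, result in demo().items():
        print(addr, result)
```

In this model block 5 still holds stale data and block 7 holds data
meant for block 5. Each verifies against its own checksum, and
neither matches the hash recorded in the parent. The sketch only
covers detection; recovering the lost data would need redundancy
that is not modelled here.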