Date: Wed, 18 Feb 2004 01:29:33 +0100
From: Matthias Andree
To: freebsd-stable@FreeBSD.org
Subject: Re: ahc and massive ffs+softupdates corruption
Message-ID: <20040218002933.GB21639@merlin.emma.line.org>
In-Reply-To: <200402172335.i1HNZB7E051322@gw.catspoiler.org>

On Tue, 17 Feb 2004, Don Lewis wrote:

> > This machine had a SCSI timeout problem on Friday Feb 6th and went down
> > hard, suffering massive file system corruption on /var. At that time,
> > the machine was running portupgrade -a. /var is using softupdates and
> > uses default mount options. As said before, the drive's FWC enable was
> > set to 0 in both the current and saved editions of mode page 8, and I
> > wonder how such massive corruption can happen. I was under the
> > impression that softupdates prevented any on-disk corruptions that
> > require user intervention at fsck time. Given that the write cache was
> > off, I am wondering if there are any ffs+softupdates or tagged command
> > queueing bugs left (that might reorder writes - ordered tag forgotten or
> > something).
>
> The UNKNOWN FILE TYPE complains are a pretty good clue that a block
> containing inodes got overwritten by garbage. I've seen this sort of
> thing happen if power to a drive fails. It could also be caused by a
> driver or firmware bug that causes data to get written to the wrong
> place, or a cabling or termination problem that causes the drive to see
> the wrong command.

Ah, that makes some sense.

It's unlikely to be a termination, cabling, or power problem: the machine
is otherwise rock solid and has been stable since the incident, too. If
there had been a serious power outage, the other machine wouldn't have
been able to log properly or would have logged a reboot.

I won't rule out firmware or hardware bugs, given that the drive simply
disappears from the bus when it is sent an INQUIRY too soon after
power-up or reset. Setting a reset-to-inquiry delay of 10 s in Tekram
controllers fixed this. Adaptec's 2940 UW Pro does something different
and works in its default configuration.

Final question for now: does one disk block contain multiple inodes? If
so, how many at most?

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95
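
On the closing question, the arithmetic can be sketched roughly as follows. The only firm input is the on-disk inode size: 128 bytes for UFS1 and 256 bytes for UFS2. The block sizes below (8192 and 16384 bytes) are assumed examples, not values reported in this thread; the real figure depends on how /var was created with newfs. The function name inodes_per_block is made up for illustration; in the FFS sources the corresponding value is the INOPB() macro.

```python
# Rough sketch: how many on-disk inodes share one block of a given size.
# Inode sizes are the on-disk dinode sizes; block sizes are ASSUMED examples.

UFS1_DINODE_SIZE = 128  # bytes per on-disk inode, UFS1
UFS2_DINODE_SIZE = 256  # bytes per on-disk inode, UFS2


def inodes_per_block(block_size, dinode_size=UFS1_DINODE_SIZE):
    """Return how many on-disk inodes fit in block_size bytes.

    Hypothetical helper name; it mirrors the idea behind INOPB().
    """
    if block_size <= 0 or block_size % dinode_size != 0:
        raise ValueError("block size must be a positive multiple of the inode size")
    return block_size // dinode_size


if __name__ == "__main__":
    # 512 is a single disk sector; 8192 and 16384 are assumed block sizes.
    for size in (512, 8192, 16384):
        print(size, "bytes hold", inodes_per_block(size), "UFS1 inodes")
```

Under these assumptions, an 8192-byte block holds 64 UFS1 inodes and a 16384-byte block holds 128, so a single misdirected block write could plausibly damage dozens of inodes at once. Even one stray 512-byte sector would cover 4 of them.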