From owner-freebsd-stable@FreeBSD.ORG  Tue Feb 17 16:29:36 2004
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3163A16A4D0
	for <freebsd-stable@FreeBSD.org>;
	Tue, 17 Feb 2004 16:29:36 -0800 (PST)
Received: from mail.dt.e-technik.uni-dortmund.de
	(krusty.dt.E-Technik.Uni-Dortmund.DE [129.217.163.1])
	by mx1.FreeBSD.org (Postfix) with ESMTP id F3E1F43D2D
	for <freebsd-stable@FreeBSD.org>;
	Tue, 17 Feb 2004 16:29:35 -0800 (PST)
	(envelope-from matthias.andree@gmx.de)
Received: from m2a2.dyndns.org (krusty.dt.e-technik.uni-dortmund.de
	[129.217.163.1])EA81E1E2E9	for <freebsd-stable@FreeBSD.org>;
	Wed, 18 Feb 2004 01:29:34 +0100 (CET)
Received: by merlin.emma.line.org (Postfix, from userid 500)
	id 7763993E; Wed, 18 Feb 2004 01:29:33 +0100 (CET)
Date: Wed, 18 Feb 2004 01:29:33 +0100
From: Matthias Andree <matthias.andree@gmx.de>
To: freebsd-stable@FreeBSD.org
Message-ID: <20040218002933.GB21639@merlin.emma.line.org>
Mail-Followup-To: freebsd-stable@FreeBSD.org
References: <m38yj15m59.fsf@merlin.emma.line.org>
	<200402172335.i1HNZB7E051322@gw.catspoiler.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200402172335.i1HNZB7E051322@gw.catspoiler.org>
User-Agent: Mutt/1.5.5.1i
Subject: Re: ahc and massive ffs+softupdates corruption
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
X-List-Received-Date: Wed, 18 Feb 2004 00:29:36 -0000

On Tue, 17 Feb 2004, Don Lewis wrote:

> > This machine had a SCSI timeout problem on Friday Feb 6th and went down
> > hard, suffering massive file system corruption on /var. At that time,
> > the machine was running portupgrade -a. /var is using softupdates and
> > uses default mount options. As said before, the drive's FWC enable was
> > set to 0 in both the current and saved editions of mode page 8, and I
> > wonder how such massive corruption can happen. I was under the
> > impression that softupdates prevented any on-disk corruptions that
> > require user intervention at fsck time. Given that the write cache was
> > off, I am wondering if there are any ffs+softupdates or tagged command
> > queueing bugs left (that might reorder writes - ordered tag forgotten or
> > something).
> 
> The UNKNOWN FILE TYPE complaints are a pretty good clue that a block
> containing inodes got overwritten by garbage.  I've seen this sort of
> thing happen if power to a drive fails.  It could also be caused by a
> driver or firmware bug that causes data to get written to the wrong
> place, or a cabling or termination problem that causes the drive to see
> the wrong command.

Ah, that makes some sense.

It's unlikely to be a termination/cabling/power problem: the machine is
otherwise rock solid and has been stable after the incident, too. If
there had been a serious power outage, the other machine wouldn't have
been able to log properly or would have logged a reboot.

I won't rule out firmware/hardware bugs, given that the drive just
disappears from the bus when it is sent an INQUIRY too soon after power
up/reset - a reset-to-inquiry delay of 10 s in Tekram controllers fixed
this. Adaptec's 2940 UW Pro does something different and works in its
default configuration.

Final question for now: Does one disk block contain multiple inodes? If
so, how many at most?

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95