Date:      Wed, 18 Feb 2004 01:29:33 +0100
From:      Matthias Andree <matthias.andree@gmx.de>
To:        freebsd-stable@FreeBSD.org
Subject:   Re: ahc and massive ffs+softupdates corruption
Message-ID:  <20040218002933.GB21639@merlin.emma.line.org>
In-Reply-To: <200402172335.i1HNZB7E051322@gw.catspoiler.org>
References:  <m38yj15m59.fsf@merlin.emma.line.org> <200402172335.i1HNZB7E051322@gw.catspoiler.org>

On Tue, 17 Feb 2004, Don Lewis wrote:

> > This machine had a SCSI timeout problem on Friday Feb 6th and went down
> > hard, suffering massive file system corruption on /var. At that time,
> > the machine was running portupgrade -a. /var is using softupdates and
> > uses default mount options. As said before, the drive's FWC enable was
> > set to 0 in both the current and saved editions of mode page 8, and I
> > wonder how such massive corruption can happen. I was under the
> > impression that softupdates prevented any on-disk corruptions that
> > require user intervention at fsck time. Given that the write cache was
> > off, I am wondering if there are any ffs+softupdates or tagged command
> > queueing bugs left (that might reorder writes - ordered tag forgotten or
> > something).
> 
> The UNKNOWN FILE TYPE complaints are a pretty good clue that a block
> containing inodes got overwritten by garbage.  I've seen this sort of
> thing happen if power to a drive fails.  It could also be caused by a
> driver or firmware bug that causes data to get written to the wrong
> place, or a cabling or termination problem that causes the drive to see
> the wrong command.

Ah, that makes some sense.

It's unlikely to be a termination/cabling/power problem: the machine is
otherwise rock solid and has been stable since the incident, too. If
there had been a serious power outage, the other machine wouldn't have
been able to log properly or would have logged a reboot.

I won't rule out firmware/hardware bugs, given that the drive just
disappears from the bus when it receives an INQUIRY too early after
power-up/reset - a reset-to-inquiry delay of 10 s in Tekram controllers
fixed this. Adaptec's 2940 UW Pro does something different and works in
its default configuration.

Final question for now: Does one disk block contain multiple inodes? If
so, what is the maximum number?
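
For scale, a rough sketch of the arithmetic behind this question. It
assumes on-disk inode sizes of 128 bytes (UFS1) and 256 bytes (UFS2),
and that inodes are packed back to back into file system blocks. Those
sizes, the block sizes tried, and the helper name inodes_per_block are
assumptions for illustration, not taken from this thread or measured on
the affected machine.

```python
# Assumed on-disk inode sizes in bytes (not stated in this thread).
DINODE_SIZE = {"ufs1": 128, "ufs2": 256}


def inodes_per_block(block_size, fs_type="ufs1"):
    """Return how many on-disk inodes fit into one file system block."""
    return block_size // DINODE_SIZE[fs_type]


# Try two plausible file system block sizes (assumed, in bytes).
for bsize in (8192, 16384):
    print(bsize, inodes_per_block(bsize, "ufs1"))
```

If those assumptions hold, a single garbage block written into the
inode area would damage dozens of inodes at once, which would fit the
number of UNKNOWN FILE TYPE reports.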

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


