From owner-freebsd-hackers@FreeBSD.ORG  Tue Jul 15 09:20:10 2003
Date: Tue, 15 Jul 2003 17:20:06 +0100
From: David Malone <dwmalone@maths.tcd.ie>
To: Sumit Shah <shah@ucla.edu>
Message-ID: <20030715162006.GA47687@walton.maths.tcd.ie>
References: <EEA0280E-B6C7-11D7-9819-000393DB86CA@ucla.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <EEA0280E-B6C7-11D7-9819-000393DB86CA@ucla.edu>
User-Agent: Mutt/1.5.3i
Sender: dwmalone@maths.tcd.ie
cc: freebsd-hackers@freebsd.org
Subject: Re: RAID and NFS exports (Possible Data Corruption)

On Tue, Jul 15, 2003 at 06:26:24AM -0700, Sumit Shah wrote:
> Here is a message I sent to freebsd-questions and I was hoping I could 
> get some help debugging this.

It seems very unlikely that restarting mountd could cause an error
like:

>ad4: hard error reading fsbn  242727552

The error means that the disk reported a failure when asked to read
this block. You say that when you rebooted, the controller said a
disk had gone bad, which would tend to confirm this. (I could believe
that restarting mountd might upset the RAID code if there were a
kernel bug, but it seems very unlikely that it could cause a disk to
go bad.)

My best guess would be that you have a bad batch of disks that
happen to have failed in similar ways. It is possible that restarting
mountd uncovered the errors, 'cos I think mountd internally does
a remount of the filesystem in question, and that might cause a chunk
of buffered data to be flushed out to the disk, highlighting an error.
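
One way to check that the errors come from the disk rather than from
mountd is to read the disk end to end and see whether the same blocks
fail with mountd left alone. A minimal sketch in Python, assuming the
device node (ad4 here) can be opened read-only and that a failed read
shows up as EIO; the function name and its parameters are made up for
illustration:

```python
import errno
import os


def scan_for_read_errors(path, block_size=64 * 1024, max_errors=16):
    """Read `path` sequentially; return byte offsets of blocks that failed.

    Hypothetical helper: stops at end of data, or after `max_errors`
    failed blocks so that a dead disk cannot keep it looping forever.
    """
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while len(bad_offsets) < max_errors:
            try:
                data = os.pread(fd, block_size, offset)
            except OSError as exc:
                # Only treat I/O errors as bad blocks; re-raise anything else.
                if exc.errno != errno.EIO:
                    raise
                bad_offsets.append(offset)
                offset += block_size  # skip the failed block and carry on
                continue
            if not data:
                break  # end of file or device
            offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets
```

Running something like scan_for_read_errors("/dev/ad4") as root would
list the offsets that fail. If offsets turn up without mountd having
been touched, that points at the disks.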

(I had a bunch of the IBM "deathstar" disks fail on me within the
space of a week or so, after they'd been in use for about six
months.)

	David.