From owner-freebsd-fs@FreeBSD.ORG  Fri Aug  7 12:44:43 2009
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
From: John Baldwin <jhb@freebsd.org>
To: freebsd-fs@freebsd.org
Date: Fri, 7 Aug 2009 08:29:54 -0400
User-Agent: KMail/1.9.7
References: <8E9591D8BCB72D4C8DE0884D9A2932DC35BD34C3@ITS-HCWNEM03.ds.Vanderbilt.edu>
In-Reply-To: <8E9591D8BCB72D4C8DE0884D9A2932DC35BD34C3@ITS-HCWNEM03.ds.Vanderbilt.edu>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200908070829.54571.jhb@freebsd.org>
Subject: Re: UFS Filesystem issues, and the loss of my hair...

On Thursday 06 August 2009 9:51:04 am Hearn, Trevor wrote:
> First off, let me state that I love FreeBSD. I've used it for years, and
> have not had any major problems with it... Until now.
> 
> As you can tell, I work for a major university. I setup a large storage
> array to hold data for a project they have here. No great shakes, just some
> standard files and such. The fun started when I started loading users onto
> the system, and they started using it... Isn't that always the case? Now, I
> get ufs_dirbad errors, and the system hard locks. This isn't the worst thing
> that could happen, but when you're talking about file partitions the size
> that I am using, the fsck takes FOREVER. Somewhere on the order of 1.5 hours.
> During that time, I am bringing the individual shares/partitions online, but
> the users suffer. I've asked about this before, in a different forum, but got
> no usable information that I could see. So, here goes...
> 
> The system is as such. A dell 2950 1U server, with a Qlogic Fibre Channel
> card. It is connected to two Promise Array chassis, 610 series, each with 16
> drives. Each chassis is running RAID 6, which gives me about 12.73tb of
> storage per chassis. From there, the logical drives are sliced up into
> smaller partitions. At most, I have a 3.6tb partition. The smallest is a
> 100gig partition.
> 
> Filesystem       Size    Used   Avail Capacity  Mounted on
> /dev/mfid0s1a    197G     10G    170G     6%    /
> devfs            1.0K    1.0K      0B   100%    /dev
> /dev/da0p1       1.8T    1.5T    130G    92%    /slice1
> /dev/da0p5       2.7T    1.8T    661G    74%    /slice2
> /dev/da0p9       250G     21G    209G     9%    /slice3
> /dev/da1p3       103G     12G     83G    12%    /slice4
> /dev/da1p4       205G     54G    135G    29%    /slice5
> /dev/da1p5       103G    7.3G     87G     8%    /slice6
> /dev/da1p6       103G     22G     72G    23%    /slice7
> etc...
> 
> I had to use GPT to setup the partitions, and they are using UFS2 for the
> filesystem. Now... If that's not fun enough... I have TWO of these creatures,
> which RSYNC every 4 hours. The secondary system is across campus, and sits
> idle 99% of the time. Every 4 hours, in a stepped schedule, the primary array
> syncs to the secondary array. If the primary goes down, I FSCK, and any files
> that are fried, I bring back across from the secondary and replace them. This
> has worked OK for a while, but now I am getting Kernel Panics on a regular
> basis. I've been told to migrate to a different filesystem, but my options
> are ZFS and using GJOURNAL with UFS, from what I can tell. I need something
> repeatable, simple, and I need something robust. I have NO idea why I keep
> getting errors like this, but I imagine it's a cascading effect of other
> hangs that have caused more corruption.
> 
> I'd buy a fella, or gal, a cup of coffee and a pop-tart if they could help a
> brother out. I have checked out this link:
> http://phaq.phunsites.net/2007/07/01/ufs_dirbad-panic-with-mangled-entries-in-ufs/
> and decided that I need to give this a shot after hours, but being the kinda
> guy I am, I need to make sure I am covering all of my bases.

Are you seeing ufs_dirbad panics?  Specifically, can you capture the messages 
on the console when the machine panics?
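
If the console output is gone before anyone can copy it down, one possible
way to capture it is to enable kernel crash dumps. The fragment below is only
a sketch of /etc/rc.conf settings, and it assumes a swap device large enough
to hold the dump and the default crash directory; adjust both for your setup.

```shell
# /etc/rc.conf fragment (a sketch, not a tested configuration for this box)

# Let the system pick the dump device automatically; this assumes a swap
# device is configured in /etc/fstab and is big enough to hold the dump.
dumpdev="AUTO"

# Directory where savecore(8) stores the dump on the next boot.
# /var/crash is the usual default; it needs enough free space.
dumpdir="/var/crash"
```

After the next panic and reboot, savecore(8) should leave info.N and vmcore.N
files in that directory, and the "Panic String" line in info.N is the message
I am after.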

-- 
John Baldwin