From: "Hearn, Trevor" <trevor.hearn@Vanderbilt.Edu>
To: freebsd-fs@freebsd.org
Date: Thu, 6 Aug 2009 08:51:04 -0500
Subject: UFS Filesystem issues, and the loss of my hair...
Message-ID: <8E9591D8BCB72D4C8DE0884D9A2932DC35BD34C3@ITS-HCWNEM03.ds.Vanderbilt.edu>

First off, let me state that I love FreeBSD. I've used it for years, and have not had any major problems with it... Until now.

As you can tell, I work for a major university. I set up a large storage array to hold data for a project they have here. No great shakes, just some standard files and such. The fun started when I started loading users onto the system, and they started using it... Isn't that always the case? Now, I get ufs_dirbad errors, and the system hard locks. This isn't the worst thing that could happen, but when you're talking about partitions the size that I'm using, the fsck takes FOREVER. Somewhere on the order of 1.5 hours. During that time I bring the individual shares/partitions online as their checks finish, but the users suffer. I've asked about this before, in a different forum, but got no usable information that I could see. So, here goes...

The system is as follows: a Dell 2950 1U server with a QLogic Fibre Channel card, connected to two Promise 610-series array chassis, each with 16 drives. Each chassis is running RAID 6, which gives me about 12.73 TB of storage per chassis. From there, the logical drives are sliced up into smaller partitions. The largest is a 3.6 TB partition; the smallest is a 100 GB partition.

Filesystem       Size    Used   Avail  Capacity  Mounted on
/dev/mfid0s1a    197G     10G    170G      6%    /
devfs            1.0K    1.0K      0B    100%    /dev
/dev/da0p1       1.8T    1.5T    130G     92%    /slice1
/dev/da0p5       2.7T    1.8T    661G     74%    /slice2
/dev/da0p9       250G     21G    209G      9%    /slice3
/dev/da1p3       103G     12G     83G     12%    /slice4
/dev/da1p4       205G     54G    135G     29%    /slice5
/dev/da1p5       103G    7.3G     87G      8%    /slice6
/dev/da1p6       103G     22G     72G     23%    /slice7
etc...
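For what it's worth, the after-panic drill looks roughly like the following. This is only a sketch: the device names come from the df output above, the real run covers every slice, and the rc.conf lines at the end are knobs I've been reading about rather than anything I've actually enabled (as I understand it, background fsck only cleans up soft-updates-style inconsistencies, not real ufs_dirbad damage).

  # see whether soft updates are enabled on a given slice
  tunefs -p /dev/da1p3

  # after a crash: fsck and remount the slices one at a time, smallest first,
  # so at least some shares come back while the big ones are still checking
  fsck -y -t ufs /dev/da1p3 && mount /dev/da1p3 /slice4
  fsck -y -t ufs /dev/da0p5 && mount /dev/da0p5 /slice2
  fsck -y -t ufs /dev/da0p1 && mount /dev/da0p1 /slice1

  # /etc/rc.conf knobs I've only read about so far
  background_fsck="YES"
  background_fsck_delay="60"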
I had to use GPT to set up the partitions, and they are using UFS2 for the filesystem. Now... If that's not fun enough... I have TWO of these creatures, which rsync every 4 hours. The secondary system is across campus, and sits idle 99% of the time. Every 4 hours, on a stepped schedule, the primary array syncs to the secondary array. If the primary goes down, I fsck, and any files that are fried, I bring back across from the secondary and replace them. This has worked OK for a while, but now I am getting kernel panics on a regular basis. I've been told to migrate to a different filesystem, but my options are ZFS and UFS with gjournal, from what I can tell. I need something repeatable, simple, and robust. I have NO idea why I keep getting errors like this, but I imagine it's a cascading effect of other hangs that have caused more corruption.

I'd buy a fella, or gal, a cup of coffee and a pop-tart if they could help a brother out. I have checked out this link:

http://phaq.phunsites.net/2007/07/01/ufs_dirbad-panic-with-mangled-entries-in-ufs/

and decided that I need to give this a shot after hours, but being the kinda guy I am, I need to make sure I am covering all of my bases.

Anyone got any ideas?

Thanks!

-T
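P.S. In case the details matter, the sync between the two arrays is plain rsync out of cron, shaped roughly like this. The hostname and options shown here are made up for illustration; the real job has one line per slice, stepped across the 4-hour window:

  # /etc/crontab on the primary array (illustrative only)
  0   */4  *  *  *  root  rsync -aH --delete /slice1/ backup-array:/slice1/
  30  */4  *  *  *  root  rsync -aH --delete /slice2/ backup-array:/slice2/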
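P.P.S. As far as I can tell from gjournal(8) and the handbook, moving one slice onto gjournal would go something like the sketch below; please correct me if I have this wrong. The catch I see is that the newfs wipes the slice, so each one would have to be refilled from the secondary array afterwards.

  # load the journaling class now, and again on every boot
  gjournal load
  echo 'geom_journal_load="YES"' >> /boot/loader.conf

  # put a journal on the provider, build a fresh UFS2 on top of it,
  # and mount it async (gjournal takes care of the write ordering)
  gjournal label da0p1
  newfs -O 2 -J /dev/da0p1.journal
  mount -o async /dev/da0p1.journal /slice1

(The ZFS route would presumably mean building a pool straight on da0 and da1 and carving datasets instead of GPT partitions, but I haven't sketched that far yet.)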