Date:      Thu, 6 Aug 2009 08:51:04 -0500
From:      "Hearn, Trevor" <trevor.hearn@Vanderbilt.Edu>
To:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   UFS Filesystem issues, and the loss of my hair...
Message-ID:  <8E9591D8BCB72D4C8DE0884D9A2932DC35BD34C3@ITS-HCWNEM03.ds.Vanderbilt.edu>

First off, let me state that I love FreeBSD. I've used it for years, and
have not had any major problems with it... Until now.

As you can tell, I work for a major university. I set up a large storage
array to hold data for a project they have here. No great shakes, just some
standard files and such. The fun started when I started loading users onto
the system, and they started using it... Isn't that always the case? Now, I
get ufs_dirbad errors, and the system hard locks. This isn't the worst thing
that could happen, but when you're talking about partitions the size that I
am using, the fsck takes FOREVER. Somewhere on the order of 1.5 hours.
During that time, I am bringing the individual shares/partitions online, but
the users suffer. I've asked about this before, in a different forum, but
got no usable information that I could see. So, here goes...

The system is as follows. A Dell 2950 1U server, with a QLogic Fibre Channel
card. It is connected to two Promise array chassis, 610 series, each with 16
drives. Each chassis is running RAID 6, which gives me about 12.73 TB of
storage per chassis. From there, the logical drives are sliced up into
smaller partitions. At most, I have a 3.6 TB partition. The smallest is a
100 GB partition.
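
As a rough sanity check on the 12.73 figure: RAID 6 gives up two drives' worth of space to parity, so 16 drives leave 14 for data. The sketch below is only an illustration. It assumes 1 TB (decimal) drives, which the message does not actually state, and it ignores any controller or formatting overhead.

```python
# Rough RAID 6 capacity check. The 1 TB (decimal) drive size is an assumption;
# the message only gives the drive count and the usable figure.
DRIVES_PER_CHASSIS = 16
PARITY_DRIVES = 2             # RAID 6 reserves two drives' worth of parity
ASSUMED_DRIVE_BYTES = 10**12  # hypothetical: 1 TB as marketed by vendors


def raid6_usable_tib(drives, drive_bytes, parity=PARITY_DRIVES):
    """Usable capacity in TiB (2**40 bytes), ignoring controller overhead."""
    return (drives - parity) * drive_bytes / 2**40


usable = raid6_usable_tib(DRIVES_PER_CHASSIS, ASSUMED_DRIVE_BYTES)
print(f"{usable:.2f} TiB usable per chassis")
```

Under that assumption the result lines up with the reported figure once it is read in binary units (TiB), which is how most array firmware and df report sizes.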

Filesystem       Size    Used   Avail Capacity  Mounted on
/dev/mfid0s1a    197G     10G    170G     6%    /
devfs            1.0K    1.0K      0B   100%    /dev
/dev/da0p1       1.8T    1.5T    130G    92%    /slice1
/dev/da0p5       2.7T    1.8T    661G    74%    /slice2
/dev/da0p9       250G     21G    209G     9%    /slice3
/dev/da1p3       103G     12G     83G    12%    /slice4
/dev/da1p4       205G     54G    135G    29%    /slice5
/dev/da1p5       103G    7.3G     87G     8%    /slice6
/dev/da1p6       103G     22G     72G    23%    /slice7
etc...

I had to use GPT to set up the partitions, and they are using UFS2 for the
filesystem. Now... If that's not fun enough... I have TWO of these
creatures, which rsync every 4 hours. The secondary system is across campus,
and sits idle 99% of the time. Every 4 hours, in a stepped schedule, the
primary array syncs to the secondary array. If the primary goes down, I
fsck, and any files that are fried, I bring back across from the secondary
and replace them. This has worked OK for a while, but now I am getting
kernel panics on a regular basis. I've been told to migrate to a different
filesystem, but my options are ZFS and using gjournal with UFS, from what I
can tell. I need something repeatable, simple, and robust. I have NO idea
why I keep getting errors like this, but I imagine it's a cascading effect
of other hangs that have caused more corruption.
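
For illustration only, the "stepped schedule" mentioned above could look something like the following sketch, which gives each slice its own start offset inside the 4-hour cycle so the syncs do not all start at once. The even spacing is an assumption, not the actual schedule in use here, and the mount points are just the ones visible in the df listing (which is truncated).

```python
# Hypothetical staggered rsync schedule: one start time per slice, spread
# evenly across the 4-hour cycle. The even spacing is an assumption.
CYCLE_MINUTES = 4 * 60
SLICES = ["/slice1", "/slice2", "/slice3", "/slice4",
          "/slice5", "/slice6", "/slice7"]  # mount points from the df output


def stepped_offsets(slices, cycle_minutes=CYCLE_MINUTES):
    """Map each slice to its start offset (in minutes) within one cycle."""
    step = cycle_minutes // len(slices)
    return {name: i * step for i, name in enumerate(slices)}


offsets = stepped_offsets(SLICES)
for name, minutes in offsets.items():
    print(f"{name}: starts {minutes // 60}h{minutes % 60:02d}m into each cycle")
```

Each offset could then be turned into a cron entry or a sleep before the corresponding rsync run; larger slices may need wider gaps than an even split provides.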

I'd buy a fella, or gal, a cup of coffee and a pop-tart if they could help a
brother out. I have checked out this link:
http://phaq.phunsites.net/2007/07/01/ufs_dirbad-panic-with-mangled-entries-in-ufs/
and decided that I need to give this a shot after hours, but being the kinda
guy I am, I need to make sure I am covering all of my bases.

Anyone got any ideas?

Thanks!

-T



