From: "Hearn, Trevor" <trevor.hearn@Vanderbilt.Edu>
To: freebsd-fs@freebsd.org
Date: Thu, 6 Aug 2009 08:51:04 -0500
Subject: UFS Filesystem issues, and the loss of my hair...
Message-ID: <8E9591D8BCB72D4C8DE0884D9A2932DC35BD34C3@ITS-HCWNEM03.ds.Vanderbilt.edu>

First off, let me state that I love FreeBSD. I've used it for years, and have not had any major problems with it... Until now.

As you can tell, I work for a major university. I set up a large storage array to hold data for a project they have here. No great shakes, just some standard files and such. The fun started when I started loading users onto the system, and they started using it... Isn't that always the case? Now, I get ufs_dirbad errors, and the system hard locks. This isn't the worst thing that could happen, but when you're talking about partitions the size that I'm using, the fsck takes FOREVER. Somewhere on the order of 1.5 hours. During that time I bring the individual shares/partitions online as their checks finish, but the users suffer. I've asked about this before, in a different forum, but got no usable information that I could see. So, here goes...

The system is as follows: a Dell 2950 1U server with a QLogic Fibre Channel card, connected to two Promise 610-series array chassis, each with 16 drives. Each chassis is running RAID 6, which gives me about 12.73 TB of storage per chassis. From there, the logical drives are sliced up into smaller partitions. The largest is a 3.6 TB partition; the smallest is a 100 GB partition.

Filesystem       Size    Used   Avail  Capacity  Mounted on
/dev/mfid0s1a    197G     10G    170G      6%    /
devfs            1.0K    1.0K      0B    100%    /dev
/dev/da0p1       1.8T    1.5T    130G     92%    /slice1
/dev/da0p5       2.7T    1.8T    661G     74%    /slice2
/dev/da0p9       250G     21G    209G      9%    /slice3
/dev/da1p3       103G     12G     83G     12%    /slice4
/dev/da1p4       205G     54G    135G     29%    /slice5
/dev/da1p5       103G    7.3G     87G      8%    /slice6
/dev/da1p6       103G     22G     72G     23%    /slice7
etc...
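For what it's worth, the after-panic drill looks roughly like the following. This is only a sketch: the device names come from the df output above, the real run covers every slice, and the rc.conf lines at the end are knobs I've been reading about rather than anything I've actually enabled (as I understand it, background fsck only cleans up soft-updates-style inconsistencies, not real ufs_dirbad damage).

  # see whether soft updates are enabled on a given slice
  tunefs -p /dev/da1p3

  # after a crash: fsck and remount the slices one at a time, smallest first,
  # so at least some shares come back while the big ones are still checking
  fsck -y -t ufs /dev/da1p3 && mount /dev/da1p3 /slice4
  fsck -y -t ufs /dev/da0p5 && mount /dev/da0p5 /slice2
  fsck -y -t ufs /dev/da0p1 && mount /dev/da0p1 /slice1

  # /etc/rc.conf knobs I've only read about so far
  background_fsck="YES"
  background_fsck_delay="60"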
I had to use GPT to set up the partitions, and they are using UFS2 for the filesystem. Now... If that's not fun enough... I have TWO of these creatures, which rsync every 4 hours. The secondary system is across campus, and sits idle 99% of the time. Every 4 hours, on a stepped schedule, the primary array syncs to the secondary array. If the primary goes down, I fsck, and any files that are fried, I bring back across from the secondary and replace them. This has worked OK for a while, but now I am getting kernel panics on a regular basis. I've been told to migrate to a different filesystem, but my options are ZFS and UFS with gjournal, from what I can tell. I need something repeatable, simple, and robust. I have NO idea why I keep getting errors like this, but I imagine it's a cascading effect of other hangs that have caused more corruption.

I'd buy a fella, or gal, a cup of coffee and a pop-tart if they could help a brother out. I have checked out this link:

http://phaq.phunsites.net/2007/07/01/ufs_dirbad-panic-with-mangled-entries-in-ufs/

and decided that I need to give this a shot after hours, but being the kinda guy I am, I need to make sure I am covering all of my bases.

Anyone got any ideas?

Thanks!

-T
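P.S. In case the details matter, the sync between the two arrays is plain rsync out of cron, shaped roughly like this. The hostname and options shown here are made up for illustration; the real job has one line per slice, stepped across the 4-hour window:

  # /etc/crontab on the primary array (illustrative only)
  0   */4  *  *  *  root  rsync -aH --delete /slice1/ backup-array:/slice1/
  30  */4  *  *  *  root  rsync -aH --delete /slice2/ backup-array:/slice2/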
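P.P.S. As far as I can tell from gjournal(8) and the handbook, moving one slice onto gjournal would go something like the sketch below; please correct me if I have this wrong. The catch I see is that the newfs wipes the slice, so each one would have to be refilled from the secondary array afterwards.

  # load the journaling class now, and again on every boot
  gjournal load
  echo 'geom_journal_load="YES"' >> /boot/loader.conf

  # put a journal on the provider, build a fresh UFS2 on top of it,
  # and mount it async (gjournal takes care of the write ordering)
  gjournal label da0p1
  newfs -O 2 -J /dev/da0p1.journal
  mount -o async /dev/da0p1.journal /slice1

(The ZFS route would presumably mean building a pool straight on da0 and da1 and carving datasets instead of GPT partitions, but I haven't sketched that far yet.)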