From: Darryl Okahata
To: current@FreeBSD.ORG
Subject: Re: background fsck deadlocks with ufs2 and big disk
Date: Wed, 19 Feb 2003 09:15:15 -0800
Message-Id: <200302191715.JAA20542@mina.soco.agilent.com>
In-Reply-To: Your message of "Tue, 18 Feb 2003 19:03:04 PST." <20030219030304.GA71575@HAL9000.homeunix.com>

David Schultz wrote:

> IIRC, Kirk was trying to reproduce this a little while ago in
> response to similar reports. He would probably be interested
> in any new information.

I don't have any useful information, but I do have a data point:

My 5.0-RELEASE system recently mysteriously panic'd, which resulted in a
partially trashed UFS1 filesystem, which caused bg fsck to hang. Details:

* The panic was weird, in that only the first 4-6 characters of the
  first function (in the panic stacktrace) were displayed on the console
  (sorry, forgot what it was). Nothing else past that point was shown,
  and the console was locked up. Ddb was compiled into the kernel, but
  ctrl-esc did nothing.

* The UFS1 filesystem in question (and I assume that it was UFS1, as I
  did not specify a filesystem type to newfs) is located on a RAID5
  vinum volume, consisting of five 80GB disks.

* Softupdates is enabled.

* When bg fsck hung (w/no disk activity), I could break into ddb.
  Unfortunately, I don't know how to use ddb, aside from "ps".

* Disabling bg fsck allowed the system to boot. However, fg fsck
  failed, and I had to do a manual fsck, which spewed lots of nasty
  "SOFTUPDATE INCONSISTENCY" errors.

* Disturbingly (but fortunately), I then unmounted the filesystem (in
  multi-user mode) and re-ran fsck, and fsck still found errors. There
  should not have been any errors, as fg fsck had just finished running.

  [ Unfortunately, I've forgotten what they were, and an umount/fsck
    done right now shows no problems. I think the errors were one of
    the "incorrect block count" errors. ]

* After the fsck, some files were partially truncated (& corrupted?).
  After investigating, I believe these truncated files (which were NOT
  recently modified) were in a directory in which other files were
  being created/written at the time of the panic.

-- 
Darryl Okahata
darrylo@soco.agilent.com

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.