From: Kirk McKusick
To: Archie Cobbs
Cc: Nate Lawson, freebsd-current@FreeBSD.org
Subject: Re: backgroud fsck is still locking up system (fwd)
Date: Fri, 06 Dec 2002 11:41:06 -0800
Message-Id: <200212061941.gB6Jf659093594@beastie.mckusick.com>
In-Reply-To: Your message of "Fri, 06 Dec 2002 10:57:13 PST." <200212061857.gB6IvDAP065049@arch20m.dellroad.org>

From: Archie Cobbs
Subject: Re: backgroud fsck is still locking up system (fwd)
To: Nate Lawson
Date: Fri, 6 Dec 2002 10:57:13 -0800 (PST)
CC: Kirk McKusick, Archie Cobbs, freebsd-current@FreeBSD.org

Nate Lawson wrote:
> > Does the background fsck process continue to run, or does the whole
> > system come to a halt? If the fsck process continues to run, what
> > happens when it eventually finishes? Is the system still dead, or
> > does it come back to life? If the system does not come back to life
> > can you get me the output of `ps axl'? If not, can you break into
> > the debugger and get a ps output? (You will need to have the DDB
> > option specified in your config file).
>
> Sorry for butting in.
> I think Archie is referring to bg fsck gaining
> an unfair share of cpu due to it running due to IO completions. Last I
> heard, we were waiting until after 5.0 to experiment with scheduler
> changes to make it more fair. I have not seen any hard locks or other
> problems with bg fsck after your commit.

I'm actually seeing something different. The box becomes unresponsive
(except for virtual console changes and CTRL-ALT-ESC) but there's no
disk activity. It never recovers.

Reproduced it again just now. After pulling the plug and rebooting I
didn't touch the box. It booted normally, started background fsck, and
the HDD light was blinking as expected. After about 10 seconds, rather
suddenly the HDD light stopped blinking. At this point it was pretty
dead. Broke into the debugger and it showed a similar 'ps' output to
what I previously posted.

-Archie

Your ps shows fsck_ufs and the syncer process both blocked on "nbufbs".
That means the system has blocked them from running because it feels
that there are too many dirty buffers. What you are probably experiencing
is that you have a machine with relatively little memory, which has a
rather low threshold for blocking on dirty buffers. All the dirty buffers
in your system are held by the indirect blocks of the snapshot, and thus
the bufdaemon cannot push them out. That task can only be done by the
syncer, which is also blocked.

Could you please run the following commands on your system and send me
the results, both before and after the lockup:

    sysctl vfs.lodirtybuffers
    sysctl vfs.hidirtybuffers
    sysctl vfs.numdirtybuffers

If you cannot run these commands after the lockup, you can examine the
corresponding global variables from the debugger instead; their names are:

    lodirtybuffers
    hidirtybuffers
    numdirtybuffers

If my hypothesis is correct, that will let me tweak the thresholds on
dirty buffers to get a solution.

    Kirk McKusick
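The three sysctl queries Kirk asks for above can be bundled into one small script so the before and after snapshots come out in the same format. This is a minimal sketch, not part of the original exchange. It assumes FreeBSD's sysctl(8), where -n prints only the value. The classify helper and its three labels are hypothetical, and its cut-offs rest on an assumption about the kernel (that writers are throttled once numdirtybuffers reaches hidirtybuffers); they are not a copy of the kernel's logic.

```shell
#!/bin/sh
# Minimal sketch: snapshot the dirty-buffer sysctls in one line of output.
# Assumes FreeBSD sysctl(8); "-n" prints only the value of each variable.

# classify LO HI NUM
# Hypothetical helper: prints a one-word label for the dirty-buffer count.
# Assumption: writers are throttled at or above hidirtybuffers.
classify() {
    lo=$1 hi=$2 num=$3
    if [ "$num" -ge "$hi" ]; then
        echo "throttled"   # at or above hidirtybuffers
    elif [ "$num" -gt "$lo" ]; then
        echo "elevated"    # between lodirtybuffers and hidirtybuffers
    else
        echo "normal"      # at or below lodirtybuffers
    fi
}

# Read all three values; report on stderr if any of them is unavailable
# (for example when this is run on a system that is not FreeBSD).
if lo=$(sysctl -n vfs.lodirtybuffers 2>/dev/null) &&
   hi=$(sysctl -n vfs.hidirtybuffers 2>/dev/null) &&
   num=$(sysctl -n vfs.numdirtybuffers 2>/dev/null); then
    echo "$(date): lo=$lo hi=$hi num=$num state=$(classify "$lo" "$hi" "$num")"
else
    echo "vfs.*dirtybuffers sysctls not available here" >&2
fi
```

Running it once before starting the test and again afterwards gives the before/after pair. If no shell is usable after the lockup, the global variables still have to be read from the debugger.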