Date:      Fri, 06 Dec 2002 11:41:06 -0800
From:      Kirk McKusick <mckusick@beastie.mckusick.com>
To:        Archie Cobbs <archie@dellroad.org>
Cc:        Nate Lawson <nate@root.org>, freebsd-current@FreeBSD.org
Subject:   Re: backgroud fsck is still locking up system (fwd) 
Message-ID:  <200212061941.gB6Jf659093594@beastie.mckusick.com>
In-Reply-To: Your message of "Fri, 06 Dec 2002 10:57:13 PST." <200212061857.gB6IvDAP065049@arch20m.dellroad.org> 

	From: Archie Cobbs <archie@dellroad.org>
	Subject: Re: backgroud fsck is still locking up system (fwd)
	In-Reply-To: <Pine.BSF.4.21.0212061023510.15885-100000@root.org>
	To: Nate Lawson <nate@root.org>
	Date: Fri, 6 Dec 2002 10:57:13 -0800 (PST)
	CC: Kirk McKusick <mckusick@beastie.mckusick.com>,
	   Archie Cobbs <archie@dellroad.org>, freebsd-current@FreeBSD.org
	X-ASK-Info: Whitelist match

	Nate Lawson wrote:
	> > Does the background fsck process continue to run, or does the whole
	> > system come to a halt? If the fsck process continues to run, what 
	> > happens when it eventually finishes? Is the system still dead, or 
	> > does it come back to life? If the system does not come back to life
	> > can you get me the output of `ps axl'? If not, can you break into
	> > the debugger and get a ps output? (You will need to have the DDB
	> > option specified in your config file).
	> 
	> Sorry for butting in.  I think Archie is referring to bg fsck gaining
	> an unfair share of cpu due to it running due to IO completions. Last I
	> heard, we were waiting until after 5.0 to experiment with scheduler
	> changes to make it more fair.  I have not seen any hard locks or other
	> problems with bg fsck after your commit.

	I'm actually seeing something different. The box becomes unresponsive
	(except for virtual console changes and CTRL-ALT-ESC) but there's no
	disk activity. It never recovers.

	Reproduced it again just now. After pulling the plug and rebooting
	I didn't touch the box.  It booted normally, started background
	fsck, and the HDD light was blinking as expected. After about 10
	seconds, rather suddenly the HDD light stopped blinking.  At this
	point it was pretty dead.  Broke into the debugger and it showed a
	similar 'ps' output to what I previously posted.

	-Archie

Your ps shows fsck_ufs and the syncer process both blocked on "nbufbs".
That means the system has blocked them from running because it feels
that there are too many dirty buffers. What you are probably experiencing
is that you have a relatively small memory machine which has a rather
low threshold for blocking on dirty buffers. All the dirty buffers
in your system are held by the indirect blocks of the snapshot and
thus the bufdaemon cannot push them out. That task can only be done
by the syncer, which is also blocked. Could you please run the following
commands on your system and send me the results:

	sysctl vfs.lodirtybuffers
	sysctl vfs.hidirtybuffers
	sysctl vfs.numdirtybuffers

both before and after the lockup. If you cannot run these commands after
the lockup, the global variable names are:

	lodirtybuffers
	hidirtybuffers
	numdirtybuffers

If my hypothesis is correct, that will let me tweak the thresholds on
dirty buffers to get a solution.

	Kirk McKusick
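
Editorial sketch, not part of the original message: since the sysctl
commands may be impossible to run once the box is wedged, one option is
to print the three counters on a single line at intervals while the
background fsck runs, so the last values before the lockup remain visible
on the console. The function name dirtybuf_snapshot and the SYSCTL
override variable are hypothetical names introduced here for
illustration; only the three vfs.* sysctl names come from the message
above.

```shell
#!/bin/sh
# Minimal sketch: report the three dirty-buffer counters on one line.
# SYSCTL is a hypothetical override so the command can be substituted;
# by default the system sysctl(8) is used.
: "${SYSCTL:=sysctl}"

dirtybuf_snapshot() {
    # -n prints only the value, without the variable name.
    lo=$("$SYSCTL" -n vfs.lodirtybuffers) || return 1
    hi=$("$SYSCTL" -n vfs.hidirtybuffers) || return 1
    num=$("$SYSCTL" -n vfs.numdirtybuffers) || return 1
    echo "lo=$lo hi=$hi num=$num"
}
```

Running something like

	while dirtybuf_snapshot; do sleep 1; done

on a virtual console right after boot would leave the most recent
readings on screen. Output goes to the terminal rather than a file on
purpose, on the assumption that a write to disk could itself stall once
the buffer thresholds are hit.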
