From: Kris Kennaway
To: "Jeffrey D. Wheelhouse"
Cc: freebsd-bugs@FreeBSD.org
Date: Wed, 6 Jun 2007 16:33:39 -0400
Subject: Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
Message-ID: <20070606203339.GC5908@rot13.obsecurity.org>
In-Reply-To: <200706052050.l55KoABX076749@freefall.freebsd.org>

On Tue, Jun 05, 2007 at 08:50:10PM +0000, Jeffrey D. Wheelhouse wrote:
> The following reply was made to PR kern/104406; it has been noted by GNATS.
>
> From: "Jeffrey D. Wheelhouse"
> To: bug-followup@FreeBSD.org
> Cc:
> Subject: Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent
>  CPU load
> Date: Tue, 05 Jun 2007 16:26:26 -0400
>
>  I believe we have also experienced this bug (or a very similar one) on
>  our 8-core amd64 systems under 6.2-RELEASE-p4.
>
>  In our case, "top" shows that the system is 100% CPU utilized, with the
>  vast majority of it as "system" time. (Ordinarily the system
>
>  In the last case, we ended up with about 200 Apache processes that
>  looked like this in ps:
>
>    UID   PID  PPID  CPU PRI NI    VSZ   RSS MWCHAN STAT TT     TIME COMMAND
>  25000 27121 26860 1977  -4  5 146324 33732 ufs    DN   ??  0:03.75 httpd
>  25000 27147 37257 1994  -4  5 153748 29280 ufs    DN   ??  0:03.72 httpd
>  25000 27157 36912 1805  -4  5 150756 26592 ufs    DN   ??  0:02.91 httpd
>  25000 27224 27030 1845  -4  5 137536 24804 ufs    DN   ??  0:01.25 httpd
>  25000 27274 26794 1829  -4  5 148140 35416 ufs    DN   ??  0:02.90 httpd
>
>  Once a process gets "stuck" in WCHAN ufs, it's blocked indefinitely, as
>  described here, or at least so slow as to be indistinguishable from
>  stuck. (Typical wait channels for our httpds are accept or kqread, as
>  one would expect.)
>
>  Each process in this state counts against the load average, so we often
>  see load averages north of 200 when this is occurring. (Typical load
>  average is below 2.)
>
>  Kill enough processes (or possibly enough to hit the "right" process)
>  and everything picks up again right where it left off.
>
>  I also have no idea how to debug this.

See the Developers handbook

Kris