From: Don Bowman
To: "'freebsd-stable@freebsd.org'"
Date: Fri, 1 Aug 2003 14:59:37 -0400
Subject: RE: kernel deadlock

> On Tue, 29 Jul 2003, Don Bowman wrote:
>
> > > From: Don Bowman [mailto:don@sandvine.com]
> > >
> > > From: Robert Watson [mailto:rwatson@freebsd.org]
> > > > On Tue, 29 Jul 2003, Dave Dolson wrote:
> > > >
> > > > > To follow up, I've discovered that the system has
> > > > > exhausted its "FFS node" malloc type.
> > > ...
> > > >
> > > > Some problems with this have turned up in -CURRENT on large-memory
> > > > machines where some of the scaling factors have been off. In
> > >
> > > We currently have kern.maxvnodes=70354 set (automatically scaled).
> > > This is a 1GB box.
> > >
> > > I will try re-running the test with less.
> > >
> > > when it hits kern.maxvnodes, what will it do?
> >
> > After applying the fixes from RELENG_4 for kern/52425,
> > I can still easily reproduce this hang without low memory.
> > Further debugging shows that vnlru process is waiting on
> > vlrup. This line is shown below. i.e. vnlru_nowhere is being
> > incremented every 3 seconds.

So what is happening here is that vnlru wakes up, runs through, and
there is nothing to free, so it goes back to sleep having freed
nothing. The caller doesn't wake up. There are no vnodes to free, and
everything in the system locks up.

One possible solution is to make vnlru more aggressive, so that before
giving up, it tries to free pages that have many references etc.
(which it currently skips).

Another option is to have it simply bump the kern.maxvnodes number and
wake up the process which called it.

Suggestions?

--don
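
To make the second option concrete, here is a rough user-space sketch of
the behaviour described above. It is a toy model and not the kernel
code: struct vnode_pool, reclaim_pass(), try_alloc() and
alloc_with_daemon() are all hypothetical names made up for
illustration, the "sleep" is only a flag, and bumping the limit by one
per empty pass is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a kernel symbol. */
struct vnode_pool {
    int  max_vnodes;      /* stands in for kern.maxvnodes */
    int  num_vnodes;      /* vnodes currently allocated */
    int  reclaimable;     /* vnodes a reclaim pass is willing to free */
    int  nowhere;         /* passes that freed nothing (cf. vnlru_nowhere) */
    bool caller_waiting;  /* an allocator is asleep waiting on the daemon */
};

/* Allocator side: go to sleep if the pool is at its limit. */
static bool try_alloc(struct vnode_pool *p)
{
    assert(p->num_vnodes <= p->max_vnodes);
    if (p->num_vnodes >= p->max_vnodes) {
        p->caller_waiting = true;
        return false;
    }
    p->num_vnodes++;
    return true;
}

/* One pass of the reclaim daemon; returns how many vnodes it freed. */
static int reclaim_pass(struct vnode_pool *p, bool bump_on_failure)
{
    int freed = p->reclaimable;

    p->num_vnodes -= freed;
    p->reclaimable = 0;
    if (freed == 0) {
        p->nowhere++;
        if (bump_on_failure)
            p->max_vnodes++;   /* assumed policy: raise the limit by one */
    }
    /* The sleeper is only woken once there is room below the limit. */
    if (p->num_vnodes < p->max_vnodes)
        p->caller_waiting = false;
    return freed;
}

/* Try to allocate, letting the daemon run at most `passes` times. */
static bool alloc_with_daemon(struct vnode_pool *p, int passes, bool bump)
{
    if (try_alloc(p))
        return true;
    for (int i = 0; i < passes; i++) {
        reclaim_pass(p, bump);
        if (!p->caller_waiting)
            return try_alloc(p);
    }
    return false;              /* still asleep: the hang described above */
}
```

With a full pool and nothing reclaimable, calling alloc_with_daemon()
with bump set to false never succeeds and only the nowhere counter
grows, which matches the hang. With bump set to true, the first empty
pass raises the limit and the waiting caller gets its vnode. Whether
raising the limit is safe on a real system is a separate question that
this model does not address.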