From: Don Bowman
To: Don Bowman, 'Robert Watson', Dave Dolson
Cc: 'freebsd-stable@freebsd.org'
Date: Tue, 29 Jul 2003 21:04:54 -0400
Subject: RE: kernel deadlock

From: Don Bowman [mailto:don@sandvine.com]
> > From: Robert Watson [mailto:rwatson@freebsd.org]
> > On Tue, 29 Jul 2003, Dave Dolson wrote:
> >
> > > To follow up, I've discovered that the system has exhausted its "FFS
> > > node" malloc type.
> ...
> >
> > Some problems with this have turned up in -CURRENT on large-memory
> > machines where some of the scaling factors have been off. In
>
> We currently have kern.maxvnodes=70354 set (automatically scaled). This
> is a 1GB box.
>
> I will try re-running the test with less.
>
> when it hits kern.maxvnodes, what will it do?

After applying the fixes from RELENG_4 for kern/52425, I can still
easily reproduce this hang without low memory.

Further debugging shows that the vnlru process is waiting on vlrup.
This line is shown below. I.e., vnlru_nowhere is being incremented
every 3 seconds.

static void
vnlru_proc(void)
{
	...
	s = splbio();
	for (;;) {
		...
		if (done == 0) {
			vnlru_nowhere++;
			tsleep(vnlruproc, PPAUSE, "vlrup", hz * 3);
		}
	}
	splx(s);
}

syncer is in vlruwk wait from getnewvnode(). Lots of other processes
are waiting on ffsvgt.

This implies that vlrureclaim() was unable to free anything.

I have kern.maxvnodes = 35k. As soon as I hit this value, my system
locked up [bash on serial shell non-responsive, serial driver echoes
chars, can drop into ddb]. Processes which don't use the filesystem
seem to continue to run OK.

A couple of procs are waiting on inode: env, cron. They never come out
of that wait.

Suggestions?

db> ps
  pid proc     addr      uid  ppid  pgrp    flag stat wmesg  wchan    cmd
  649 dc35a8a0 e0a32000    0   641   641  004104    3 ffsvgt c03698a8 atrun
  648 dc35a3c0 e0e36000    0   647   648  000014    3 vlruwk c0364c90 cron
  647 dc35b740 e03d4000    0   135   135  000004    3 ppwait dc35b740 cron
  646 dc35b0c0 e03ee000    0   635   101  004004    3 inode  c368ee00 env
  645 dc35ad80 e03f1000    0   212   644  004006    3 ffsvgt c03698a8 grep
  644 dc35aa40 e0400000    0   212   644  004006    3 ffsvgt c03698a8 sysctl
  641 dc35a080 e0e4c000    0   640   641  004084    3 wait   dc35a080 sh
  640 dc35a220 e0e39000    0   135   135  000084    3 piperd e037c5c0 cron
  635 dc35a560 e0e32000    0   101   101  004084    3 piperd e037cd40 sh
  456 dc35abe0 e03fc000    0   133   456 4004004    3 ffsvgt c03698a8 tclsh83
  212 dc35bdc0 e0392000    0   199   212  004086    3 wait   dc35bdc0 bash
  199 dc35c440 e036e000    0     1   199  004186    3 wait   dc35c440 login
  187 dc35c2a0 e0376000    0     1     7  000086    3 select c037c460 snmpd
  169 dc35af20 e03e7000    0     1   169  000084    3 nanslp c0364970 siocontrol
  163 dc35b260 e03e2000    0     1   163  000084    3 nanslp c0364970 wddt
  143 dc35b400 e03dd000   25     1   143 2000184    3 pause  e03dd260 sendmail
  140 dc35b5a0 e03d9000    0     1   140  000184    3 select c037c460 sendmail
  137 dc35b8e0 e03d0000    0     1   137  000184    3 select c037c460 sshd
  135 dc35ba80 e03c2000    0     1   135  000004    3 inode  c35f4400 cron
  133 dc35bc20 e0397000    0     1   133  000084    3 select c037c460 inetd
  124 dc35bf60 e0382000    0     1   124  000084    3 select c037c460 syslogd
  101 dc35c100 e037e000    0     1   101  000084    3 wait   dc35c100 dhclient
    6 dc35c5e0 defd1000    0     0     0  000204    3 vlrup  dc35c5e0 vnlru
    5 dc35c780 defce000    0     0     0  000204    3 syncer c037c388 syncer
    4 dc35c920 defcb000    0     0     0  000204    3 psleep c0364b3c bufdaemon
    3 dc35cac0 defc8000    0     0     0  000204    3 psleep c0373280 vmdaemon
    2 dc35cc60 defc5000    0     0     0  000204    3 psleep c0352118 pagedaemon
    1 dc35ce00 dc361000    0     0     1  004284    3 wait   dc35ce00 init
    0 c037b760 c040e000    0     0     0  000204    3 sched  c037b760 swapper
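
To make the suspected failure mode concrete, here is a rough user-space
sketch in C. It is a toy model, not kernel code. It assumes that an
allocator has to wait while the number of in-use vnodes exceeds the
limit, and that the reclaim pass only bumps vnlru_nowhere when it frees
nothing. The struct, the "reclaimable" field, the batch size of 10, and
the function names vnlru_pass(), must_wait() and alloc_vnode() are all
made up for illustration; only the vnlru_nowhere counter and the wait
message names come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the vnode accounting; every field here is hypothetical. */
struct vnode_pool {
	int numvnodes;     /* vnodes currently allocated */
	int freevnodes;    /* allocated vnodes sitting on the free list */
	int desiredvnodes; /* stands in for kern.maxvnodes */
	int reclaimable;   /* in-use vnodes a reclaim pass is able to free */
	int vnlru_nowhere; /* reclaim passes that freed nothing */
};

/* One reclaim pass: free up to `batch` vnodes, return how many were freed. */
int
vnlru_pass(struct vnode_pool *p, int batch)
{
	int done;

	assert(p != NULL);
	done = p->reclaimable < batch ? p->reclaimable : batch;
	if (done < 0)
		done = 0;
	p->reclaimable -= done;
	p->freevnodes += done;
	if (done == 0)
		p->vnlru_nowhere++; /* the daemon in the email sleeps on "vlrup" here */
	return (done);
}

/* Assumed wait condition: more vnodes in use than the limit allows. */
bool
must_wait(const struct vnode_pool *p)
{
	assert(p != NULL);
	return (p->numvnodes - p->freevnodes > p->desiredvnodes);
}

/*
 * Try to obtain a vnode. Gives up after max_passes reclaim passes so the
 * model terminates; a real waiter (the "vlruwk" sleep) would keep waiting.
 */
bool
alloc_vnode(struct vnode_pool *p, int max_passes)
{
	int pass;

	assert(p != NULL);
	for (pass = 0; must_wait(p); pass++) {
		if (pass >= max_passes)
			return (false); /* nothing reclaimable: caller is stuck */
		vnlru_pass(p, 10);      /* batch size of 10 is arbitrary */
	}
	if (p->freevnodes > 0)
		p->freevnodes--;        /* reuse a vnode from the free list */
	else
		p->numvnodes++;         /* allocate a fresh vnode */
	return (true);
}
```

With a pool that is one vnode over the limit and has reclaimable == 0,
alloc_vnode() never succeeds and vnlru_nowhere goes up by one per pass,
which matches the symptom described above. With even a few reclaimable
vnodes the same call returns on the first pass.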