From owner-freebsd-stable@FreeBSD.ORG  Thu Sep  2 00:12:59 2004
Date: Wed, 1 Sep 2004 20:12:58 -0400
From: Allan Fields <afields@afields.ca>
To: "Marc G. Fournier"
Cc: freebsd-stable@freebsd.org
Subject: Re: vnodes - is there a leak? where are they going?
Message-ID: <20040902001258.GE34157@afields.ca>
In-Reply-To: <20040901184050.J47186@ganymede.hub.org>
References: <20040831205907.O31538@ganymede.hub.org> <20040901214006.GD34157@afields.ca> <20040901184826.M47186@ganymede.hub.org>
User-Agent: Mutt/1.4i

On Wed, Sep 01, 2004 at 06:53:47PM -0300, Marc G. Fournier wrote:
> On Wed, 1 Sep 2004, Allan Fields wrote:
>
> >On Tue, Aug 31, 2004 at 09:21:09PM -0300, Marc G. Fournier wrote:
> >>
> >>I have two servers, both running 4.10 builds from within a few days of
> >>each other (Aug 5 for venus, Aug 7 for neptune) ... both running jail
> >>environments ... one with ~60 running, the other with ~80 ... the one
> >>with 60 has been running for ~25 days now, and is at the border of
> >>running out of vnodes:
> >>
> >>Aug 31 20:58:00 venus root: debug.numvnodes: 519920 - debug.freevnodes:
> >>11058 - debug.vnlru_nowhere: 256463 - vlrup
> >>Aug 31 20:59:01 venus root: debug.numvnodes: 519920 - debug.freevnodes:
> >>13155 - debug.vnlru_nowhere: 256482 - vlrup
> >>Aug 31 21:00:03 venus root: debug.numvnodes: 519920 - debug.freevnodes:
> >>13092 - debug.vnlru_nowhere: 256482 - vlruwt
> >>
> >>[..]
> >>
> >>I've tried shutting down all of the VMs on venus, and umount'd all of
> >>the unionfs mounts, as well as the one nfs mount we have ... the above
> >>#s are after the VMs (and mounts) are recreated ...
> >>
> >>Now, my understanding of vnodes is that for every file opened, a vnode
> >>is created ... in my case, since I'm using unionfs, there are two
> >>vnodes per file ... is it possible that there are 'stale' vnodes that
> >>aren't being freed up?  Is there some way of 'viewing' the vnode
> >>structure?
> >>
> >>For instance, fstat shows:
> >>
> >>venus# fstat | wc -l
> >>   19531
> >
> >You can also try pstat -f | more from the user side.
>
> Even less:
>
> venus# fstat | wc -l; pstat -f | wc -l
>    20930
>     6555
>
> >You might want to set up for remote kernel debugging and peek around the
> >system / further examine vnode structures.  (If you have physical access
> >to two machines you can set up a null modem cable.)
>
> Unfortunately, I'm working with a remote server here, so am quite limited
> right now in what I can do ... anything I can, I will though ...
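One thing you can still do entirely from userland is watch those counters
at a finer grain than a once-a-minute cron job.  Here's a quick sketch
(untested; the file name and polling interval are just made up for
illustration) that polls the three debug.* sysctls from your log lines via
sysctlbyname(3) instead of forking sysctl(8) each time:

/*
 * vnodewatch.c -- poll the vnode counters from userland.
 * A sketch only: cc -o vnodewatch vnodewatch.c
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read a numeric sysctl that may be exported as either a long or an
 * int, depending on the kernel version; try both sizes.
 */
static long
read_num(const char *name)
{
	long lv;
	int iv;
	size_t len;

	len = sizeof(lv);
	if (sysctlbyname(name, &lv, &len, NULL, 0) == 0 && len == sizeof(lv))
		return (lv);
	len = sizeof(iv);
	if (sysctlbyname(name, &iv, &len, NULL, 0) == 0 && len == sizeof(iv))
		return ((long)iv);
	err(1, "sysctlbyname(%s)", name);
	/* NOTREACHED */
}

int
main(void)
{
	for (;;) {
		printf("numvnodes: %ld  freevnodes: %ld  vnlru_nowhere: %ld\n",
		    read_num("debug.numvnodes"),
		    read_num("debug.freevnodes"),
		    read_num("debug.vnlru_nowhere"));
		fflush(stdout);
		sleep(10);
	}
	/* NOTREACHED */
	return (0);
}

That only tells you that the totals are stuck, though; to see why, you
really want console access: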
It's been suggested before [ http://www.freebsddiary.org/serial-console.php ]
that for management purposes, if two machines are in the same facility, you
can ask the facility staff to run a null-modem cable (or, even better, two)
between the serial ports.  One can be used for a remote console and the
other for remote debugging (com2).  (There are also ways to switch between
ddb and kgdb on the same line.)  Then you need to update the flags in
/boot/loader.conf or device.hints.

> >>So, where else are the vnodes going?  Is there a 'leak'?  What can I
> >>look at to try and narrow this down / provide more information?
> >
> >If the use count isn't decremented (to zero), vnodes won't be placed on
> >the freelist.  Perhaps something isn't calling vrele() where it should
> >in unionfs?  You should check the reference counts v_usecount and
> >v_holdcnt on some of the suspect vnodes.
>
> How do I do that?  I'm at the limit of my current knowledge right now ...
> willing to do the foot work, just don't know the directions to take from
> here :(

One of the first debugging steps might be to try to reproduce the behaviour
on a non-production machine somehow and figure out which code is causing
the problem.

If you can afford to mess around with this machine, you could set a
break-point on vnlru_proc() and step through it to see what is causing
vnlrureclaim() to fail, so that done == 0 and vnlru_nowhere is incremented
(which is happening a lot for you).  As commented in the source, this could
be buffer cache and/or namei issues.

sys/kern/vfs_subr.c:

 164 static int vnlru_nowhere = 0;
 165 SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW, &vnlru_nowhere, 0,
 166     "Number of times the vnlru process ran without success");
 ..
 530 static void
 531 vnlru_proc(void)
 532 {
 ..
 563         if (done == 0) {
 564                 vnlru_nowhere++;
 565                 tsleep(vnlruproc, PPAUSE, "vlrup", hz * 3);
 566         }

# ps -auwx | grep -v grep | grep vnlru
root   10  0.0  0.0    0    0  ??  DL   17Jun04   1:43.81 (vnlru)

The admin handbook (10.28, "What is vnlru?") says:

    vnlru flushes and frees vnodes when the system hits the kern.maxvnodes
    limit.  This kernel thread sits mostly idle, and only activates if you
    have a huge amount of RAM and are accessing tens of thousands of tiny
    files.

If this were a filedesc leak it would be much easier, as that's all in
userspace, but the vnode machinery is all on the kernel side.  At this
point it's a guess as to whether the problem is locking or something else
in the VFS code, or in a specific file system.  I doubt it's FFS related,
but maybe I'm wrong.

Are you having trouble doing normal things like starting daemons in
additional jails?  Maybe you can glean some more info by stressing the
machine further and watching for errors.

> httpd: 7416
> master: 6618
> syslogd: 1117
> qmgr: 780
> pickup: 779
> smtpd: 609
> sshd: 503
> cron: 495
> perl: 279
> trivial-rewrite: 274
>
> but, again, those are known/open files ... fstat | wc -l only accounts
> for ~20k or so of that list :(

Yup, userspace tools alone probably won't be able to provide much useful
info.

--
Allan Fields, AFRSL - http://afields.ca
  2D4F 6806 D307 0889 6125  C31D F745 0D72 39B4 5541
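P.S.  In case it helps once you do have console access: here's a rough
sketch of what the remote-debugging side looks like on 4.x.  The tty name
and the vnode address below are placeholders, not real values; see the
Developers' Handbook chapter on kernel debugging for the full procedure.

On the target, build the kernel with symbols and flag the second serial
port for gdb (0x80 is the remote-gdb flag for sio(4)):

    makeoptions     DEBUG=-g
    device          sio1    at isa? port IO_COM2 flags 0x80 irq 3

Then, on the debugging box, over the null-modem line:

    # gdb -k kernel.debug
    (kgdb) target remote /dev/cuaa1    <- whichever tty the cable is on
    (kgdb) break vnlru_proc
    (kgdb) continue
    ... wait for the vlrup wakeup, then 'next' through the loop and see
        why vnlrureclaim() keeps leaving done == 0 ...
    (kgdb) print vnlru_nowhere
    (kgdb) print *(struct vnode *)0xc1234567
           ^ placeholder address; this is where you'd inspect v_usecount
             and v_holdcnt of a suspect vnode, e.g. one whose address you
             got from ddb's 'show lockedvnods'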