From: Don Lewis
To: rwatson@FreeBSD.org
cc: ticso@cicely.de
cc: current@FreeBSD.org
Date: Fri, 16 May 2003 23:57:58 -0700 (PDT)
Subject: Re: vlruwk deadlock (was: NFS troubles on recent -current.)
Message-Id: <200305170658.h4H6vwM7059811@gw.catspoiler.org>

On 16 May, Robert Watson wrote:
>
> On Fri, 16 May 2003, Bernd Walter wrote:
>
>> I had an io deadlock on the same server today while doing a make release
>> on an alpha nfs client.  I don't know if it is related to the given
>> patch or not.  It happened shortly after the client checked out the
>> ports.  Sorry - I forgot to take a dump.
>
> I've been running into similar sorts of deadlocks on my diskless crash
> boxes, and have dropped some information to Don on them.
> Try the following also from ddb:
>
>   print numvnodes
>   print desiredvnodes
>   print vnlruproc_sig
>   print vnlru_nowhere
>
> This will print some information on the number of active vnodes; one of
> the characteristics of my nfs client/server box in its deadlocked state is
> that it has exceeded the maximum number of vnodes, presumably by
> necessity.

I've been unable to reproduce this problem so far in my environment.
Since I've only got one box running -current, I've been doing testing by
NFS mounting the local file system back to itself.  I've run
	make -j10 buildworld
	make DESTDIR=/mnt installworld
where both /usr/obj and /mnt were NFS mount points.

I also just tried NFS mounting / on /mnt and running simultaneous
	find -x . -type f -print0 | xargs -0 cat >/dev/null
on both / and /mnt.  While this was running, which took well over an
hour, I monitored the vnode-related sysctl variables.  The value of
vfs.freevnodes had an interesting oscillatory behaviour.  There was a
short-period oscillation of about 3000-4000, and a long-term oscillation
of 30000-40000.  It got as low as a few hundred.  I didn't see any sign
of vnode reference leaks on the client side and was able to umount the
NFS file system without error.  Here are the final sysctl values:

kern.maxvnodes: 70112
kern.minvnodes: 17528
vfs.numvnodes: 63719
vfs.wantfreevnodes: 25
vfs.freevnodes: 55379
debug.vnlru_nowhere: 0

and gdb -k shows that vnlruproc_sig is also 0.

BTW, the first time I tried this, I left off the -x option to find and
got this vnode lock assertion failure.

VOP_UNLOCK: 0xc748c000 is not locked but should be
Debugger("Lock violation.
")
Stopped at      Debugger+0x54:  xchgl   %ebx,in_Debugger.0
db> tr
Debugger(c0516f98,c051716f,c748c000,c0516fd9,e6e02988) at Debugger+0x54
vfs_badlock(c0516fd9,c051716f,c748c000,c0582880,c748c000) at vfs_badlock+0x45
assert_vop_locked(c748c000,c051716f,c748c000,e6e029fc,c02dabed) at assert_vop_locked+0x62
vop_unlock_pre(e6e029dc,e6e02bec,c6195b00,186a0,e6e029c0) at vop_unlock_pre+0x38
pfs_lookup(e6e02a38,c0523d3b,c68e7d10,e6e02a38,c68e7d10) at pfs_lookup+0x2ed
lookup(e6e02bd8,0,c0516a45,a4,c68e7d10) at lookup+0x366
namei(e6e02bd8,e6e02ae4,c0316d5d,c05ae8c0,1) at namei+0x24e
vn_open_cred(e6e02bd8,e6e02cd8,0,c671b000,e6e02cc4) at vn_open_cred+0x237
vn_open(e6e02bd8,e6e02cd8,0,2ab,c034476b) at vn_open+0x29
kern_open(c68e7d10,4812785b,0,1,0) at kern_open+0x13a
open(c68e7d10,e6e02d10,c052a91d,3fb,3) at open+0x30
syscall(2f,2f,2f,ffffffff,8055a00) at syscall+0x26e
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (5, FreeBSD ELF32, open), eip = 0x480bc6f3, esp = 0xbfbff9fc, ebp = 0xbfbffa98 ---

The problem seems to be the call to VOP_UNLOCK() at line 415 in
pseudofs_vnops.c.  The vnode should be locked at this point unless the
ISDOTDOT "if" block is getting triggered, which unlocks the vnode and
doesn't relock it before jumping to the code which may try to unlock the
vnode a second time.
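
To make the suspected control flow concrete, here is a small userland model
of it.  This is only a sketch of the pattern described above, not the code
in pseudofs_vnops.c: struct model_vnode, model_lock(), model_unlock(),
lookup_suspected() and lookup_tracked() are all hypothetical names invented
for illustration, and the model counts a violation where the kernel would
drop into the debugger.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland stand-in for a vnode; not the kernel's struct vnode. */
struct model_vnode {
	bool locked;     /* true while the single caller holds the lock */
	int violations;  /* unlock calls made while the lock was not held */
};

static void
model_lock(struct model_vnode *vp)
{
	assert(!vp->locked);  /* this model has no recursive locking */
	vp->locked = true;
}

/* Mimics the lock assertion: record a violation instead of entering ddb. */
static void
model_unlock(struct model_vnode *vp)
{
	if (!vp->locked) {
		vp->violations++;
		return;
	}
	vp->locked = false;
}

/*
 * Suspected pattern: the ".." branch unlocks the directory vnode and then
 * jumps to shared exit code that may unlock it again.
 */
static int
lookup_suspected(struct model_vnode *dvp, bool isdotdot, bool unlock_parent)
{
	model_lock(dvp);  /* stands in for the caller passing a locked vnode */
	if (isdotdot) {
		model_unlock(dvp);  /* dropped here and never re-acquired */
		goto got_vnode;
	}
	/* ... ordinary name lookup would go here ... */
got_vnode:
	if (unlock_parent)
		model_unlock(dvp);  /* second unlock when isdotdot is true */
	return (dvp->violations);
}

/*
 * One possible repair (an assumption, not a committed fix): remember
 * whether the lock is still held and only unlock in that case.
 */
static int
lookup_tracked(struct model_vnode *dvp, bool isdotdot, bool unlock_parent)
{
	bool held;

	model_lock(dvp);
	held = true;
	if (isdotdot) {
		model_unlock(dvp);
		held = false;
		goto got_vnode;
	}
	/* ... ordinary name lookup would go here ... */
got_vnode:
	if (unlock_parent && held)
		model_unlock(dvp);
	return (dvp->violations);
}
```

Calling lookup_suspected() on a fresh model_vnode with both flags set
records one violation, while lookup_tracked() records none.  Relocking the
vnode before the jump would be another way to restore the invariant; which
is appropriate depends on what the real exit path expects.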