Date: Thu, 24 Apr 2003 21:20:56 -0700 (PDT)
From: Don Lewis
To: gordont@gnf.org
cc: current@FreeBSD.org
Subject: Re: LOR in NFS server
Message-Id: <200304250421.h3P4KuXB033816@gw.catspoiler.org>
In-Reply-To: <20030424212641.GU9682@roark.gnf.org>

On 24 Apr, Gordon Tetlow wrote:
> I generated it while running nessus against my local machine.
>
> lock order reversal
>  1st 0xc9384c44 inp (inp) @ /local/usr.src/sys/netinet/tcp_input.c:649
>  2nd 0xc05aa84c tcp (tcp) @ /local/usr.src/sys/netinet/tcp_usrreq.c:621
> Stack backtrace:
> backtrace(c04e9f03,c05aa84c,c04f0770,c04f0770,c04f1ae4) at backtrace+0x17
> witness_lock(c05aa84c,8,c04f1ae4,26d,0) at witness_lock+0x692
> _mtx_lock_flags(c05aa84c,0,c04f1ae4,26d,0) at _mtx_lock_flags+0xb2
> tcp_usr_rcvd(c8a63800,80,c04ea514,df0e9a9c,3b9aca00) at tcp_usr_rcvd+0x30
> soreceive(c8a63800,df0e9ad8,df0e9ae4,df0e9adc,0) at soreceive+0x86a
> nfsrv_rcv(c8a63800,c6d4fb00,4,34,10430) at nfsrv_rcv+0x8a
> sowakeup(c8a63800,c8a6384c,c04f11d5,434,108) at sowakeup+0x97
> tcp_input(c21f5400,14,c0304f91,df0e9c5c,c02f60ba) at tcp_input+0x1341
> ip_input(c21f5400,0,c04efede,e9,c21bd280) at ip_input+0x7b0
> swi_net(0,0,c04e4eed,217,c21c73c0) at swi_net+0x111
> ithread_loop(c21c6100,df0e9d48,c04e4d5d,314,c21c8d10) at ithread_loop+0x16c
> fork_exit(c02ec2d0,c21c6100,df0e9d48) at fork_exit+0xc0
> fork_trampoline() at fork_trampoline+0x1a
> --- trap 0x1, eip = 0, esp = 0xdf0e9d7c, ebp = 0 ---

Hmn ... does NFS over TCP even work with a -current box as the server?

It looks like tcp_input() has grabbed the locks in tcbinfo and inp, and
then tcp_usr_rcvd() attempts to grab the same locks.

I can think of three possible ways of fixing this problem.

 1) Drop the locks in tcp_input() before calling sorwakeup() and grab
    them again if necessary.  One has to be careful not to break
    anything by doing this.  This also adds overhead for non-NFS
    traffic.

 2) Never call soreceive() from nfsrv_rcv(), always wake nfsd instead.
    This has the advantage of minimizing the amount of time that the
    locks are held, but increases overhead under lightly loaded
    conditions.

 3) Somehow tell tcp_usr_rcvd() not to attempt to grab the locks in
    this specific case.
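
As a rough illustration of options 1 and 3 above, here is a single-threaded
user-space toy in C.  None of it is FreeBSD kernel code: struct toy_lock,
the LOCK_* codes and all of the model_* names are hypothetical stand-ins
(model_rcvd() for tcp_usr_rcvd(), model_upcall() for the nfsrv_rcv() and
soreceive() path), and the toy lock just reports an attempt to take it
twice instead of hanging or tripping witness.  It only sketches the shape
of the two approaches; it says nothing about which one the real code
should use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for the toy lock. */
enum { LOCK_OK = 0, LOCK_RECURSED = 1 };

/* Toy non-recursive lock: it only remembers whether it is held. */
struct toy_lock {
	bool held;
};

static int
toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return (LOCK_RECURSED);	/* a real mutex would hang or panic */
	l->held = true;
	return (LOCK_OK);
}

static void
toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Stand-in for tcp_usr_rcvd(): takes the pcb lock unless told not to. */
static int
model_rcvd(struct toy_lock *pcb, bool caller_holds_lock)
{
	if (!caller_holds_lock) {
		int error = toy_lock_acquire(pcb);
		if (error != LOCK_OK)
			return (error);
	}
	/* ... the work done under the lock would go here ... */
	if (!caller_holds_lock)
		toy_lock_release(pcb);
	return (LOCK_OK);
}

/* Stand-in for the upcall that ends up in the receive path. */
static int
model_upcall(struct toy_lock *pcb, bool caller_holds_lock)
{
	return (model_rcvd(pcb, caller_holds_lock));
}

/* The situation in the trace: the upcall is made with the lock held. */
static int
model_input_locked(struct toy_lock *pcb)
{
	int error;

	toy_lock_acquire(pcb);
	error = model_upcall(pcb, false);
	toy_lock_release(pcb);
	return (error);
}

/* Option 1: drop the lock around the upcall, then take it again. */
static int
model_input_drop(struct toy_lock *pcb)
{
	int error;

	toy_lock_acquire(pcb);
	toy_lock_release(pcb);		/* drop before the upcall */
	error = model_upcall(pcb, false);
	toy_lock_acquire(pcb);		/* real code must revalidate state here */
	toy_lock_release(pcb);
	return (error);
}

/* Option 3: keep the lock and tell the callee it is already held. */
static int
model_input_flag(struct toy_lock *pcb)
{
	int error;

	toy_lock_acquire(pcb);
	error = model_upcall(pcb, true);
	toy_lock_release(pcb);
	return (error);
}
```

Starting from an unheld lock, model_input_locked() returns LOCK_RECURSED,
while model_input_drop() and model_input_flag() both return LOCK_OK.
Option 2 is not modeled here, since handing the receive off to nfsd needs
a second thread.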