From owner-freebsd-stable@FreeBSD.ORG Sat Aug 24 23:12:11 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id CE832E13; Sat, 24 Aug 2013 23:12:11 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 8329229F3; Sat, 24 Aug 2013 23:12:11 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAN48GVKDaFve/2dsb2JhbABagzxRgye5dYJ1gTF0giQBAQQBIwRSBRYOCgICDRkCWQaIDgakRZF5gSmPGwEzB4JogTEDqU+DOiCBbg X-IronPort-AV: E=Sophos;i="4.89,949,1367985600"; d="scan'208";a="47256237" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 24 Aug 2013 19:12:03 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 856A5B3F44; Sat, 24 Aug 2013 19:12:03 -0400 (EDT) Date: Sat, 24 Aug 2013 19:12:03 -0400 (EDT) From: Rick Macklem To: Konstantin Belousov Message-ID: <2077357040.13269373.1377385923534.JavaMail.root@uoguelph.ca> In-Reply-To: <20130824174103.GQ4972@kib.kiev.ua> Subject: Re: NFS deadlock on 9.2-Beta1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: J David , freebsd-stable , John Baldwin X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 24 Aug 2013 23:12:11 -0000 Kostik wrote: > On Sat, Aug 24, 2013 at 01:08:05PM -0400, J David wrote: > > The requested information about the deadlock was finally obtained > > and > > provided off-list to the requested parties due to size. > > Thank you, the problem is clear now. > > The problematic process backtrace is > > Tracing command httpd pid 86383 tid 100138 td 0xfffffe000b7b2900 > sched_switch() at sched_switch+0x234/frame 0xffffff834c442360 > mi_switch() at mi_switch+0x15c/frame 0xffffff834c4423a0 > sleepq_switch() at sleepq_switch+0x17d/frame 0xffffff834c4423e0 > sleepq_wait() at sleepq_wait+0x43/frame 0xffffff834c442410 > sleeplk() at sleeplk+0x11a/frame 0xffffff834c442460 > __lockmgr_args() at __lockmgr_args+0x9a9/frame 0xffffff834c442580 > nfs_lock1() at nfs_lock1+0x87/frame 0xffffff834c4425b0 > VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbe/frame 0xffffff834c4425e0 > _vn_lock() at _vn_lock+0x63/frame 0xffffff834c442640 > ncl_upgrade_vnlock() at ncl_upgrade_vnlock+0x5e/frame > 0xffffff834c442670 > ncl_bioread() at ncl_bioread+0x195/frame 0xffffff834c4427e0 > VOP_READ_APV() at VOP_READ_APV+0xd1/frame 0xffffff834c442810 > vn_rdwr() at vn_rdwr+0x2bc/frame 0xffffff834c4428d0 > kern_sendfile() at kern_sendfile+0xa90/frame 0xffffff834c442ac0 > do_sendfile() at do_sendfile+0x92/frame 0xffffff834c442b20 > amd64_syscall() at amd64_syscall+0x259/frame 0xffffff834c442c30 > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff834c442c30 > --- syscall (393, FreeBSD ELF64, sys_sendfile), rip = 0x801b24f4c, > rsp = 0x7fffffffce98, rbp = 0x7fffffffd1d0 --- > > It tries to do the upgrade of the nfs vnode lock, and for this, the > lock > is dropped and re-acquired. Since this happens with the vnode vm > object' > page busied, we get a reversal between vnode lock and page busy > state. > So effectively, my suspicion that NFS read path drops vnode lock was > true, and in fact I knew about the upgrade. > Ouch. I had forgotten that LK_UPGRADE could result in the shared lock being dropped. I'll admit I've never liked the lock upgrade in nfs_read(), but I'm not sure how to avoid it. I just looked at the commit log message for r138469, which is where this appeared in the old NFS client. (The new NFS client just cloned this code.) It basically notes that with a shared lock, new pages can be faulted in for the vnode while vinvalbuf() is in progress, causing it to fail (I suspect "fail" means never completed?). At the very least, I don't think the lock upgrade is needed unless a call to vinvalbuf() is going to be done. (I'm wondering is a dedicated lock used to serialize this case might be better than using a node LK_UPGRADE?) I think I'll take a closer look at the vinvalbuf() code in head. Do others have any comments on this? (I added jhb@ to the cc list, since he may be familiar with this?) But none of this can happen quickly, so it wouldn't be feasible for stable/9 or even 10.0 at this point in time. rick > I think the easiest route is to a partial merge of the r253927 from > HEAD. > > Index: fs > =================================================================== > --- fs (revision 254800) > +++ fs (working copy) > > Property changes on: fs > ___________________________________________________________________ > Modified: svn:mergeinfo > Merged /head/sys/fs:r253927 > Index: kern/uipc_syscalls.c > =================================================================== > --- kern/uipc_syscalls.c (revision 254800) > +++ kern/uipc_syscalls.c (working copy) > @@ -2124,11 +2124,6 @@ > else { > ssize_t resid; > > - /* > - * Ensure that our page is still around > - * when the I/O completes. > - */ > - vm_page_io_start(pg); > VM_OBJECT_UNLOCK(obj); > > /* > @@ -2144,10 +2139,8 @@ > IO_VMIO | ((MAXBSIZE / bsize) << IO_SEQSHIFT), > td->td_ucred, NOCRED, &resid, td); > VFS_UNLOCK_GIANT(vfslocked); > - VM_OBJECT_LOCK(obj); > - vm_page_io_finish(pg); > - if (!error) > - VM_OBJECT_UNLOCK(obj); > + if (error) > + VM_OBJECT_LOCK(obj); > mbstat.sf_iocnt++; > } > if (error) { > Index: . > =================================================================== > --- . (revision 254800) > +++ . (working copy) > > Property changes on: . > ___________________________________________________________________ > Modified: svn:mergeinfo > Merged /head/sys:r253927 >