Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 24 Aug 2013 19:12:03 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        J David <j.david.lists@gmail.com>, freebsd-stable <freebsd-stable@freebsd.org>, John Baldwin <jhb@freebsd.org>
Subject:   Re: NFS deadlock on 9.2-Beta1
Message-ID:  <2077357040.13269373.1377385923534.JavaMail.root@uoguelph.ca>
In-Reply-To: <20130824174103.GQ4972@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
Kostik wrote:
> On Sat, Aug 24, 2013 at 01:08:05PM -0400, J David wrote:
> > The requested information about the deadlock was finally obtained
> > and
> > provided off-list to the requested parties due to size.
> 
> Thank you, the problem is clear now.
> 
> The problematic process backtrace is
> 
> Tracing command httpd pid 86383 tid 100138 td 0xfffffe000b7b2900
> sched_switch() at sched_switch+0x234/frame 0xffffff834c442360
> mi_switch() at mi_switch+0x15c/frame 0xffffff834c4423a0
> sleepq_switch() at sleepq_switch+0x17d/frame 0xffffff834c4423e0
> sleepq_wait() at sleepq_wait+0x43/frame 0xffffff834c442410
> sleeplk() at sleeplk+0x11a/frame 0xffffff834c442460
> __lockmgr_args() at __lockmgr_args+0x9a9/frame 0xffffff834c442580
> nfs_lock1() at nfs_lock1+0x87/frame 0xffffff834c4425b0
> VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbe/frame 0xffffff834c4425e0
> _vn_lock() at _vn_lock+0x63/frame 0xffffff834c442640
> ncl_upgrade_vnlock() at ncl_upgrade_vnlock+0x5e/frame
> 0xffffff834c442670
> ncl_bioread() at ncl_bioread+0x195/frame 0xffffff834c4427e0
> VOP_READ_APV() at VOP_READ_APV+0xd1/frame 0xffffff834c442810
> vn_rdwr() at vn_rdwr+0x2bc/frame 0xffffff834c4428d0
> kern_sendfile() at kern_sendfile+0xa90/frame 0xffffff834c442ac0
> do_sendfile() at do_sendfile+0x92/frame 0xffffff834c442b20
> amd64_syscall() at amd64_syscall+0x259/frame 0xffffff834c442c30
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff834c442c30
> --- syscall (393, FreeBSD ELF64, sys_sendfile), rip = 0x801b24f4c,
> rsp = 0x7fffffffce98, rbp = 0x7fffffffd1d0 ---
> 
> It tries to do the upgrade of the nfs vnode lock, and for this, the
> lock
> is dropped and re-acquired. Since this happens with the vnode vm
> object'
> page busied, we get a reversal between vnode lock and page busy
> state.
> So effectively, my suspicion that NFS read path drops vnode lock was
> true, and in fact I knew about the upgrade.
> 
Ouch. I had forgotten that LK_UPGRADE could result in the shared lock
being dropped.

I'll admit I've never liked the lock upgrade in nfs_read(), but I'm
not sure how to avoid it. I just looked at the commit log message
for r138469, which is where this appeared in the old NFS client.
(The new NFS client just cloned this code.)

It basically notes that with a shared lock, new pages can be faulted
in for the vnode while vinvalbuf() is in progress, causing it to fail
(I suspect "fail" means never completed?).

At the very least, I don't think the lock upgrade is needed unless
a call to vinvalbuf() is going to be done. (I'm wondering is a dedicated
lock used to serialize this case might be better than using a node LK_UPGRADE?)
I think I'll take a closer look at the vinvalbuf() code in head.

Do others have any comments on this? (I added jhb@ to the cc list, since he
may be familiar with this?)

But none of this can happen quickly, so it wouldn't be feasible for stable/9
or even 10.0 at this point in time.

rick

> I think the easiest route is to a partial merge of the r253927 from
> HEAD.
> 
> Index: fs
> ===================================================================
> --- fs	(revision 254800)
> +++ fs	(working copy)
> 
> Property changes on: fs
> ___________________________________________________________________
> Modified: svn:mergeinfo
>    Merged /head/sys/fs:r253927
> Index: kern/uipc_syscalls.c
> ===================================================================
> --- kern/uipc_syscalls.c	(revision 254800)
> +++ kern/uipc_syscalls.c	(working copy)
> @@ -2124,11 +2124,6 @@
>  			else {
>  				ssize_t resid;
>  
> -				/*
> -				 * Ensure that our page is still around
> -				 * when the I/O completes.
> -				 */
> -				vm_page_io_start(pg);
>  				VM_OBJECT_UNLOCK(obj);
>  
>  				/*
> @@ -2144,10 +2139,8 @@
>  				    IO_VMIO | ((MAXBSIZE / bsize) << IO_SEQSHIFT),
>  				    td->td_ucred, NOCRED, &resid, td);
>  				VFS_UNLOCK_GIANT(vfslocked);
> -				VM_OBJECT_LOCK(obj);
> -				vm_page_io_finish(pg);
> -				if (!error)
> -					VM_OBJECT_UNLOCK(obj);
> +				if (error)
> +					VM_OBJECT_LOCK(obj);
>  				mbstat.sf_iocnt++;
>  			}
>  			if (error) {
> Index: .
> ===================================================================
> --- .	(revision 254800)
> +++ .	(working copy)
> 
> Property changes on: .
> ___________________________________________________________________
> Modified: svn:mergeinfo
>    Merged /head/sys:r253927
> 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2077357040.13269373.1377385923534.JavaMail.root>