Date:      Wed, 13 Mar 2013 19:33:35 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Rick Macklem <rmacklem@freebsd.org>, fs@freebsd.org
Subject:   Re: Deadlock in the NFS client
Message-ID:  <492562517.3880600.1363217615412.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201303131356.37919.jhb@freebsd.org>

John Baldwin wrote:
> I ran into a machine that had a deadlock among certain files on a given
> NFS mount today. I'm not sure how best to resolve it, though it seems
> like perhaps there is a bug with how the pool of nfsiod threads is
> managed. Anyway, more details on the actual hang below. This was on 8.x
> with the old NFS client, but I don't see anything in HEAD that would
> fix this.
> 
> First note that the system was idle so it had dropped down to only one
> nfsiod thread.
> 
Hmm, I see the problem and I'm a bit surprised it doesn't bite more often.
It seems to me that this snippet of code from nfs_asyncio() makes too
weak an assumption:
	/*
	 * If none are free, we may already have an iod working on this mount
	 * point.  If so, it will process our request.
	 */
	if (!gotiod) {
		if (nmp->nm_bufqiods > 0) {
			NFS_DPF(ASYNCIO,
		("nfs_asyncio: %d iods are already processing mount %p\n",
				 nmp->nm_bufqiods, nmp));
			gotiod = TRUE;
		}
	}
It assumes that, since an nfsiod thread is already processing some buffer
for the mount, that thread will eventually become available to handle this
one. That isn't true for your deadlock: the lone nfsiod is itself blocked,
waiting for an exclusive lock on a vnode that the waiting caller holds
shared.

I think the simple fix would be to recode nfs_asyncio() so that it returns
0 only when it has found an AVAILABLE nfsiod thread and assigned the I/O
to it, getting rid of the fallback above.
The problem with doing this is that it may result in a lot more
synchronous I/O (when nfs_asyncio() returns EIO, the caller does the I/O
itself). Maybe more synchronous I/O could be avoided by allowing
nfs_asyncio() to create a new thread even when the total is above
nfs_iodmax. (I think this would require replacing the fixed array with a
linked list and might result in a large number of nfsiod threads.)
Maybe just having a large nfs_iodmax would be an adequate compromise?
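
To make this concrete, here's roughly the shape I have in mind. This is
only an untested sketch, not a patch against nfs_bio.c;
nfs_find_idle_iod(), nfs_start_new_iod() and nfs_wakeup_iod() are made-up
names standing in for the existing "scan for an idle iod", "spawn another
nfsiod" and wakeup logic:

	/*
	 * Sketch: hand the buffer to an nfsiod only when an idle thread
	 * has actually been found (or created).  Otherwise return EIO so
	 * the caller does the I/O itself via nfs_doio(), instead of
	 * assuming that a busy iod will get around to it.
	 */
	int
	nfs_asyncio_sketch(struct nfsmount *nmp, struct buf *bp)
	{
		int iod;

		iod = nfs_find_idle_iod();		/* made-up helper */
		if (iod < 0)
			iod = nfs_start_new_iod();	/* made-up helper */
		if (iod < 0) {
			/*
			 * No idle nfsiod and we couldn't create one.  Don't
			 * assume a busy iod on this mount will come back for
			 * this buffer; that's exactly the assumption that
			 * deadlocks when the lone iod is blocked on a vnode
			 * lock held by our caller.
			 */
			return (EIO);
		}
		/* Queue the buffer and wake the chosen iod. */
		TAILQ_INSERT_TAIL(&nmp->nm_bufq, bp, b_freelist);
		nmp->nm_bufqlen++;
		nfs_wakeup_iod(iod);			/* made-up helper */
		return (0);
	}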

Does having a large # of nfsiod threads cause any serious problem for most
systems these days?

I'd be tempted to recode nfs_asyncio() as above and then, instead of
nfs_iodmin and nfs_iodmax, simply have:
- a fixed number of nfsiod threads (this could be a tunable, with the
  understanding that it should be large for good performance), roughly
  along the lines of the sketch below
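
Just to show the shape of it (the sysctl name "vfs.nfs.iodcnt" and the
default are invented; the loop over nfs_nfsiodnew() in the comment is only
there to show where the threads would come from):

	/*
	 * Sketch: one fixed, tunable nfsiod count instead of the
	 * nfs_iodmin/nfs_iodmax pair.  All the threads would be created
	 * up front, so nfs_asyncio() never has to guess whether a busy
	 * iod will come back for its request.
	 */
	static int nfs_iodcnt = 32;	/* "large" default; made-up name */

	SYSCTL_DECL(_vfs_nfs);
	SYSCTL_INT(_vfs_nfs, OID_AUTO, iodcnt, CTLFLAG_RD, &nfs_iodcnt, 0,
	    "Fixed number of nfsiod threads");
	TUNABLE_INT("vfs.nfs.iodcnt", &nfs_iodcnt);

	/*
	 * At module load (or on the first NFS mount) all nfs_iodcnt
	 * threads would be created by looping over the existing
	 * thread-creation code, e.g. something like:
	 *
	 *	for (i = 0; i < nfs_iodcnt; i++)
	 *		(void)nfs_nfsiodnew();
	 */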

rick

> The nfsiod thread is hung on a vnode lock:
> 
> (kgdb) proc 36927
> [Switching to thread 150 (Thread 100679)]#0 sched_switch (
> td=0xffffff0320de88c0, newtd=0xffffff0003521460, flags=Variable
> "flags" is
> not available.
> )
> at /usr/src/sys/kern/sched_ule.c:1898
> 1898 cpuid = PCPU_GET(cpuid);
> (kgdb) where
> #0 sched_switch (td=0xffffff0320de88c0, newtd=0xffffff0003521460,
> flags=Variable "flags" is not available.
> )
> at /usr/src/sys/kern/sched_ule.c:1898
> #1 0xffffffff80407953 in mi_switch (flags=260, newtd=0x0)
> at /usr/src/sys/kern/kern_synch.c:449
> #2 0xffffffff8043e342 in sleepq_wait (wchan=0xffffff0358bbb7f8,
> pri=96)
> at /usr/src/sys/kern/subr_sleepqueue.c:629
> #3 0xffffffff803e5755 in __lockmgr_args (lk=0xffffff0358bbb7f8,
> flags=524544,
> ilk=0xffffff0358bbb820, wmesg=Variable "wmesg" is not available.
> ) at /usr/src/sys/kern/kern_lock.c:220
> #4 0xffffffff80489219 in vop_stdlock (ap=Variable "ap" is not
> available.
> ) at lockmgr.h:94
> #5 0xffffffff80697322 in VOP_LOCK1_APV (vop=0xffffffff80892b00,
> a=0xffffff847ac10600) at vnode_if.c:1988
> #6 0xffffffff804a8bb7 in _vn_lock (vp=0xffffff0358bbb760,
> flags=524288,
> file=0xffffffff806fa421 "/usr/src/sys/kern/vfs_subr.c", line=2138)
> at vnode_if.h:859
> #7 0xffffffff8049b680 in vget (vp=0xffffff0358bbb760, flags=524544,
> td=0xffffff0320de88c0) at /usr/src/sys/kern/vfs_subr.c:2138
> #8 0xffffffff8048d4aa in vfs_hash_get (mp=0xffffff004a3a0000,
> hash=227722108,
> flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/vfs_hash.c:81
> #9 0xffffffff805631f6 in nfs_nget (mntp=0xffffff004a3a0000,
> fhp=0xffffff03771eed56, fhsize=32, npp=0xffffff847ac10a40,
> flags=524288)
> at /usr/src/sys/nfsclient/nfs_node.c:120
> #10 0xffffffff80570229 in nfs_readdirplusrpc (vp=0xffffff0179902760,
> uiop=0xffffff847ac10ad0, cred=0xffffff005587c300)
> at /usr/src/sys/nfsclient/nfs_vnops.c:2636
> #11 0xffffffff8055f144 in nfs_doio (vp=0xffffff0179902760,
> bp=0xffffff83e05c5860, cr=0xffffff005587c300, td=Variable "td" is not
> available.
> )
> at /usr/src/sys/nfsclient/nfs_bio.c:1600
> #12 0xffffffff8056770a in nfssvc_iod (instance=Variable "instance" is
> not
> available.
> )
> at /usr/src/sys/nfsclient/nfs_nfsiod.c:303
> #13 0xffffffff803d0c2f in fork_exit (callout=0xffffffff805674b0
> <nfssvc_iod>,
> arg=0xffffffff809266e0, frame=0xffffff847ac10c40)
> at /usr/src/sys/kern/kern_fork.c:861
> 
> Thread stuck in getblk for that vnode (holds shared lock on this
> vnode):
> 
> (kgdb) proc 36902
> [Switching to thread 149 (Thread 101543)]#0 sched_switch (
> td=0xffffff0378d8a8c0, newtd=0xffffff0003521460, flags=Variable
> "flags" is
> not available.
> )
> at /usr/src/sys/kern/sched_ule.c:1898
> 1898 cpuid = PCPU_GET(cpuid);
> (kgdb) where
> #0 sched_switch (td=0xffffff0378d8a8c0, newtd=0xffffff0003521460,
> flags=Variable "flags" is not available.
> )
> at /usr/src/sys/kern/sched_ule.c:1898
> #1 0xffffffff80407953 in mi_switch (flags=260, newtd=0x0)
> at /usr/src/sys/kern/kern_synch.c:449
> #2 0xffffffff8043e342 in sleepq_wait (wchan=0xffffff83e11bc1c0,
> pri=96)
> at /usr/src/sys/kern/subr_sleepqueue.c:629
> #3 0xffffffff803e5755 in __lockmgr_args (lk=0xffffff83e11bc1c0,
> flags=530688,
> ilk=0xffffff0358bbb878, wmesg=Variable "wmesg" is not available.
> ) at /usr/src/sys/kern/kern_lock.c:220
> #4 0xffffffff80483aeb in getblk (vp=0xffffff0358bbb760, blkno=3,
> size=32768,
> slpflag=0, slptimeo=0, flags=0) at lockmgr.h:94
> #5 0xffffffff8055e963 in nfs_getcacheblk (vp=0xffffff0358bbb760, bn=3,
> size=32768, td=0xffffff0378d8a8c0) at
> /usr/src/sys/nfsclient/nfs_bio.c:1259
> #6 0xffffffff805627d9 in nfs_bioread (vp=0xffffff0358bbb760,
> uio=0xffffff847bcf0ad0, ioflag=Variable "ioflag" is not available.
> ) at /usr/src/sys/nfsclient/nfs_bio.c:530
> #7 0xffffffff806956a4 in VOP_READ_APV (vop=0xffffffff808a29a0,
> a=0xffffff847bcf09c0) at vnode_if.c:887
> #8 0xffffffff804a9e27 in vn_read (fp=0xffffff03b6a506e0,
> uio=0xffffff847bcf0ad0, active_cred=Variable "active_cred" is not
> available.
> ) at vnode_if.h:384
> #9 0xffffffff80444fb1 in dofileread (td=0xffffff0378d8a8c0, fd=3,
> fp=0xffffff03b6a506e0, auio=0xffffff847bcf0ad0, offset=Variable
> "offset"
> is not available.
> ) at file.h:242
> 
> The buffer is locked by LK_KERNPROC:
> 
> (kgdb) bprint bp
> 0xffffff83e11bc128: BIO_READ flags (ASYNC|VMIO)
> error = 0, bufsize = 32768, bcount = 32768, b_resid = 0
> bufobj = 0xffffff0358bbb878, data = 0xffffff8435a05000, blkno = d, dep
> = 0x0
> lock type bufwait: EXCL by LK_KERNPROC with exclusive waiters pending
> 
> And this buffer is queued as the first pending buffer on the mount
> waiting for service by nfsiod:
> 
> (kgdb) set $nmp = (struct nfsmount *)vp->v_mount->mnt_data
> (kgdb) p $nmp->nm_bufq.tqh_first
> $24 = (struct buf *) 0xffffff83e11bc128
> (kgdb) p bp
> $25 = (struct buf *) 0xffffff83e11bc128
> 
> So, the first process is waiting for a block from an NFS directory.
> That block is in queue to be completed as an async I/O by the nfsiod
> thread pool. However, the lone nfsiod thread in the pool is waiting to
> exclusively lock the original NFS directory to update its attributes,
> so it cannot service the async I/O request.
> 
> --
> John Baldwin


