Date:      Tue, 9 Jun 2009 13:25:07 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        Yuri Pankov <yuri.pankov@gmail.com>, freebsd-current@freebsd.org, Paul Saab <ps@freebsd.org>, Matthew Fleming <matthew.fleming@isilon.com>
Subject:   Re: panic: knlist not locked, but should be
Message-ID:  <200906091325.08293.jhb@freebsd.org>
In-Reply-To: <20090609170025.GE75569@deviant.kiev.zoral.com.ua>
References:  <20090609163005.GD75569@deviant.kiev.zoral.com.ua> <06D5F9F6F655AD4C92E28B662F7F853E02CC8A29@seaxch09.desktop.isilon.com> <20090609170025.GE75569@deviant.kiev.zoral.com.ua>

On Tuesday 09 June 2009 1:00:25 pm Kostik Belousov wrote:
> On Tue, Jun 09, 2009 at 09:45:49AM -0700, Matthew Fleming wrote:
> > 
> > > This appears to be an interaction with the recent changes to use
> > > shared vnode locks for writes on ZFS.  Hmm, I think it may be ok to
> > > use a shared vnode lock for kevents on vnodes though.  The vnode
> > > interlock should be sufficient locking for what little work the kevent
> > > filters do.  As a quick hack for now the MNT_SHARED_WRITES() stuff
> > > could avoid using shared locks 'if (!VN_KNLIST_EMPTY(vp))', but I
> > > think the longer term fix is to not use the vnode locks for vnode
> > > kevents, but use the interlock instead.
> > 
> > I tried (briefly) using the interlock since Isilon's vnode lock is
> > cluster wide (in our 6.1 based code we got away with using Giant).  This
> > got me a LOR report on the interlock:
> > 
> > 	/*
> > 	 * kqueue/VFS interaction
> > 	 */
> > 	{ "kqueue", &lock_class_mtx_sleep },
> > 	{ "struct mount mtx", &lock_class_mtx_sleep },
> > 	{ "vnode interlock", &lock_class_mtx_sleep },
> > 	{ NULL, NULL },
> > 
> > since knote() will take first the list->kl_lock and then the kqueue
> > lock.  I didn't spend any time on it, and switched to using the vnode
> > v_lock for my purposes.  But someone added that lock ordering (r166421)
> > for a reason.
> 
> That was me, I actually looked for the reversed order that was reported
> several times on the list in 6.1-6.2 timeframe. Unfortunately, nothing
> was found.
> 
> I noted in the separate letter that read filter for vnodes needs
> shared vnode lock anyway.

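So my current idea is to allow shared vnode locks and use the vnode interlock
in the filt_vfs* routines to protect access to the kn_* member variables.
Attaching and detaching a knote (vfs_kqfilter() and filt_vfsdetach()) would
still assert that the vnode lock is held exclusively.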

--- //depot/projects/smpng/sys/kern/vfs_subr.c	2009/06/09 15:15:22
+++ //depot/user/jhb/lock/kern/vfs_subr.c	2009/06/09 17:20:33
@@ -4103,8 +4103,10 @@
 vfs_knllocked(void *arg)
 {
 	struct vnode *vp = arg;
+	int islocked;
 
-	return (VOP_ISLOCKED(vp) == LK_EXCLUSIVE);
+	islocked = VOP_ISLOCKED(vp);
+	return (islocked == LK_SHARED || islocked == LK_EXCLUSIVE);
 }
 
 int
@@ -4114,6 +4116,7 @@
 	struct knote *kn = ap->a_kn;
 	struct knlist *knl;
 
+	ASSERT_VOP_ELOCKED(vp, "vfs_kqfilter");
 	switch (kn->kn_filter) {
 	case EVFILT_READ:
 		kn->kn_fop = &vfsread_filtops;
@@ -4147,6 +4150,7 @@
 {
 	struct vnode *vp = (struct vnode *)kn->kn_hook;
 
+	ASSERT_VOP_ELOCKED(vp, "filt_vfsdetach");
 	KASSERT(vp->v_pollinfo != NULL, ("Missing v_pollinfo"));
 	knlist_remove(&vp->v_pollinfo->vpi_selinfo.si_note, kn, 0);
 }
@@ -4157,48 +4161,65 @@
 {
 	struct vnode *vp = (struct vnode *)kn->kn_hook;
 	struct vattr va;
+	int retval;
 
 	/*
 	 * filesystem is gone, so set the EOF flag and schedule
 	 * the knote for deletion.
 	 */
 	if (hint == NOTE_REVOKE) {
+		VI_LOCK(vp);
 		kn->kn_flags |= (EV_EOF | EV_ONESHOT);
+		VI_UNLOCK(vp);
 		return (1);
 	}
 
 	if (VOP_GETATTR(vp, &va, curthread->td_ucred))
 		return (0);
 
+	VI_LOCK(vp);
 	kn->kn_data = va.va_size - kn->kn_fp->f_offset;
-	return (kn->kn_data != 0);
+	retval = (kn->kn_data != 0);
+	VI_UNLOCK(vp);
+	return (retval);
 }
 
 /*ARGSUSED*/
 static int
 filt_vfswrite(struct knote *kn, long hint)
 {
+	struct vnode *vp = (struct vnode *)kn->kn_hook;
+
 	/*
 	 * filesystem is gone, so set the EOF flag and schedule
 	 * the knote for deletion.
 	 */
+	VI_LOCK(vp);
 	if (hint == NOTE_REVOKE)
 		kn->kn_flags |= (EV_EOF | EV_ONESHOT);
 
 	kn->kn_data = 0;
+	VI_UNLOCK(vp);
 	return (1);
 }
 
 static int
 filt_vfsvnode(struct knote *kn, long hint)
 {
+	struct vnode *vp = (struct vnode *)kn->kn_hook;
+	int retval;
+
+	VI_LOCK(vp);
 	if (kn->kn_sfflags & hint)
 		kn->kn_fflags |= hint;
 	if (hint == NOTE_REVOKE) {
 		kn->kn_flags |= EV_EOF;
+		VI_UNLOCK(vp);
 		return (1);
 	}
-	return (kn->kn_fflags != 0);
+	retval = (kn->kn_fflags != 0);
+	VI_UNLOCK(vp);
+	return (retval);
 }
 
 int
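
For anyone who wants to play with the locking pattern outside the kernel,
here is a rough user-space sketch of what the filt_vfsread() hunk above
does.  It is only an illustration under stated assumptions: a pthread mutex
stands in for the vnode interlock (VI_LOCK/VI_UNLOCK), and struct fake_knote,
the FAKE_* constants, fake_knote_init() and fake_filt_read() are all
hypothetical names that do not exist in the tree.  The size and offset are
passed in as arguments instead of coming from VOP_GETATTR() and kn_fp.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
#define FAKE_EV_EOF		0x1u
#define FAKE_EV_ONESHOT		0x2u
#define FAKE_NOTE_REVOKE	0x40L

struct fake_knote {
	pthread_mutex_t	interlock;	/* plays the role of the vnode interlock */
	long		data;		/* plays the role of kn_data */
	unsigned	flags;		/* plays the role of kn_flags */
};

static void
fake_knote_init(struct fake_knote *kn)
{
	int error;

	error = pthread_mutex_init(&kn->interlock, NULL);
	assert(error == 0);
	(void)error;
	kn->data = 0;
	kn->flags = 0;
}

/*
 * Same shape as the patched filt_vfsread(): every store to the knote
 * fields happens under the interlock, and the result is computed into
 * a local before unlocking so the field is not re-read unlocked.
 */
static int
fake_filt_read(struct fake_knote *kn, long hint, long size, long offset)
{
	int retval;

	if (hint == FAKE_NOTE_REVOKE) {
		pthread_mutex_lock(&kn->interlock);
		kn->flags |= (FAKE_EV_EOF | FAKE_EV_ONESHOT);
		pthread_mutex_unlock(&kn->interlock);
		return (1);
	}

	pthread_mutex_lock(&kn->interlock);
	kn->data = size - offset;
	retval = (kn->data != 0);
	pthread_mutex_unlock(&kn->interlock);
	return (retval);
}
```

The point of the retval local is the same as in the patch: the old code
returned (kn->kn_data != 0) directly, which would read the field after the
lock had been dropped once the locking was added.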


-- 
John Baldwin