Date: Sat, 20 Aug 2005 21:17:17 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: jeff@FreeBSD.org, current@FreeBSD.org
Subject: vlrureclaim() race condition patch review request

vfs_subr.c 1.636 removed LK_NOWAIT from a VOP_LOCK() call in
vlrureclaim(), which worsened an existing race condition in this code.
This allowed the VI_DOOMED flag to be set by another thread after this
thread had checked VI_DOOMED and was waiting in VOP_LOCK(), allowing
the mutex double unlock panic to be triggered as a side effect.  The
mutex double unlock was fixed in vfs_subr.c 1.639, but the race
condition itself was not fixed.

According to the commit log, LK_NOWAIT was removed so that this thread
might occasionally pause and allow other threads to run.  Even with the
removal of LK_NOWAIT, there is no guarantee that other threads won't be
starved: an arbitrarily large number of vnodes may be processed before
VOP_LOCK() is called, and there is no guarantee that any of the
VOP_LOCK() calls will block and allow other threads to proceed.
Conversely, this thread might block for an arbitrarily long period of
time in VOP_LOCK(), which would interfere with vnode recycling.  A more
deterministic solution is to explicitly yield to other threads on a
periodic basis.

There was still a small race condition in the original code.  Even if
VOP_LOCK() does not block because it is called with LK_NOWAIT,
VOP_LOCK() still drops the vnode interlock, possibly allowing another
thread to invalidate the vnode status after it was previously checked.
At present, the only fix for this is to relock the vnode interlock and
recheck the vnode status after the VOP_LOCK() call.  I suspect that it
would be safe to hold the vnode interlock across the VOP_LOCK() call if
VOP_LOCK() were called with the LK_NOWAIT flag but not the LK_INTERLOCK
flag, since there should be no danger of deadlock.  This would avoid
the need to relock the vnode interlock and repeat the check of its
state, but the vnode lock assertions complain about this.

I'd appreciate a review of this patch.  I circulated an earlier version
of it, but a potential vnode lock leak was missed, and I had also not
done anything to compensate for the re-addition of LK_NOWAIT.

I'm not especially happy with all the goto nonsense at the end of the
loop.  It is an attempt to optimize out extra mutex operations without
duplicating a lot of code.
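For reviewers who would rather not trace the labels, this is the
behavior the gotos at the bottom of the loop are meant to implement,
written out as straight-line code.  This is a sketch only, not meant
to compile; vnode_was_reclaimed and vop_lock_failed are stand-in flags
for how the loop body ended:

	/*
	 * Sketch of the intended loop-tail behavior.  Every iteration
	 * must end with the mount interlock held, and uio_yield()
	 * should be called once every 256 vnodes so that this thread
	 * cannot starve other threads indefinitely.
	 */
	if (vnode_was_reclaimed || vop_lock_failed) {
		/* These paths arrive with the mount interlock dropped. */
		if ((count % 256) == 0)
			uio_yield();
		MNT_ILOCK(mp);
	} else {
		/* The early skips arrive with the interlock still held. */
		if ((count % 256) == 0) {
			MNT_IUNLOCK(mp);
			uio_yield();
			MNT_ILOCK(mp);
		}
	}

The labels exist so that the common case (count % 256 != 0) doesn't
pay for an extra MNT_IUNLOCK()/MNT_ILOCK() pair.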
If anyone has suggestions on a better way to write this ...

Index: sys/kern/vfs_subr.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.640
diff -u -r1.640 vfs_subr.c
--- sys/kern/vfs_subr.c	13 Aug 2005 20:07:50 -0000	1.640
+++ sys/kern/vfs_subr.c	21 Aug 2005 01:31:32 -0000
@@ -570,29 +570,59 @@
 		TAILQ_INSERT_TAIL(&mp->mnt_nvnodelist, vp, v_nmntvnodes);
 		--count;
 		if (!VI_TRYLOCK(vp))
-			continue;
+			goto next_iter;
 		/*
 		 * If it's been deconstructed already, it's still
 		 * referenced, or it exceeds the trigger, skip it.
 		 */
-		if ((vp->v_iflag & VI_DOOMED) != 0 || vp->v_usecount ||
-		    !LIST_EMPTY(&(vp)->v_cache_src) || (vp->v_object != NULL &&
+		if (vp->v_usecount || !LIST_EMPTY(&(vp)->v_cache_src) ||
+		    (vp->v_iflag & VI_DOOMED) != 0 || (vp->v_object != NULL &&
 		    vp->v_object->resident_page_count > trigger)) {
 			VI_UNLOCK(vp);
-			continue;
+			goto next_iter;
 		}
 		MNT_IUNLOCK(mp);
 		vholdl(vp);
-		if (VOP_LOCK(vp, LK_INTERLOCK|LK_EXCLUSIVE, td)) {
+		if (VOP_LOCK(vp, LK_INTERLOCK|LK_EXCLUSIVE|LK_NOWAIT, td)) {
 			vdrop(vp);
-			MNT_ILOCK(mp);
-			continue;
+			goto next_iter_mntunlocked;
 		}
 		VI_LOCK(vp);
+		/*
+		 * v_usecount may have been bumped after VOP_LOCK() dropped
+		 * the vnode interlock and before it was locked again.
+		 *
+		 * It is not necessary to recheck VI_DOOMED because it can
+		 * only be set by another thread that holds both the vnode
+		 * lock and vnode interlock.  If another thread has the
+		 * vnode lock before we get to VOP_LOCK() and obtains the
+		 * vnode interlock after VOP_LOCK() drops the vnode
+		 * interlock, the other thread will be unable to drop the
+		 * vnode lock before our VOP_LOCK() call fails.
+		 */
+		if (vp->v_usecount || !LIST_EMPTY(&(vp)->v_cache_src) ||
+		    (vp->v_object != NULL &&
+		    vp->v_object->resident_page_count > trigger)) {
+			VOP_UNLOCK(vp, LK_INTERLOCK, td);
+			goto next_iter_mntunlocked;
+		}
+		KASSERT((vp->v_iflag & VI_DOOMED) == 0,
+		    ("VI_DOOMED unexpectedly detected in vlrureclaim()"));
 		vgonel(vp);
 		VOP_UNLOCK(vp, 0, td);
 		vdropl(vp);
 		done++;
+next_iter_mntunlocked:
+		if ((count % 256) != 0)
+			goto relock_mnt;
+		goto yield;
+next_iter:
+		if ((count % 256) != 0)
+			continue;
+		MNT_IUNLOCK(mp);
+yield:
+		uio_yield();
+relock_mnt:
 		MNT_ILOCK(mp);
 	}
 	MNT_IUNLOCK(mp);
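For what it's worth, if holding the vnode interlock across VOP_LOCK()
does turn out to be safe with LK_NOWAIT, the middle of the loop could
shrink to something like the following.  This is an untested sketch of
the idea described above, not part of the patch; as noted, the vnode
lock assertions currently complain about it:

	MNT_IUNLOCK(mp);
	vholdl(vp);
	/*
	 * Keep the vnode interlock held across VOP_LOCK() by omitting
	 * LK_INTERLOCK; LK_NOWAIT should prevent any deadlock.  The
	 * status checked above can then no longer be invalidated
	 * behind our back, so no recheck is needed.
	 */
	if (VOP_LOCK(vp, LK_EXCLUSIVE | LK_NOWAIT, td)) {
		vdropl(vp);	/* drops the hold and the interlock */
		goto next_iter_mntunlocked;
	}
	vgonel(vp);
	VOP_UNLOCK(vp, 0, td);
	vdropl(vp);
	done++;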