From owner-freebsd-hackers Tue Jul 10 23:43:35 2001 Delivered-To: freebsd-hackers@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id ECE1C37B403; Tue, 10 Jul 2001 23:43:31 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f6B6hTB24707; Tue, 10 Jul 2001 23:43:29 -0700 (PDT) (envelope-from dillon) Date: Tue, 10 Jul 2001 23:43:29 -0700 (PDT) From: Matt Dillon Message-Id: <200107110643.f6B6hTB24707@earth.backplane.com> To: Thomas Moestl Cc: hackers@FreeBSD.ORG, Boris Popov Subject: more on Re: Please review: bugfix for vinvalbuf() References: <20010711003926.B8799@crow.dom2ip.de> Sender: owner-freebsd-hackers@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :Hi, : :I've just tripped over an obviously long-standing (since about :Jan. 1998) bug in vinvalbuf while looking into PR kern/26224. The :problematic code looks like (on -CURRENT): : : /* : * Destroy the copy in the VM cache, too. : */ : mtx_lock(&vp->v_interlock); : if (VOP_GETVOBJECT(vp, &object) == 0) { : vm_object_page_remove(object, 0, 0, : (flags & V_SAVE) ? TRUE : FALSE); : } : mtx_unlock(&vp->v_interlock); : :The locks seem to be needed for file systems that don't perform real :locking (on -STABLE, they are simplelocks). :This, however, is incorrect because vm_object_page_remove may sleep. :I've attached a patch that passes the interlock to :vm_object_page_remove, which in turn passes it to a modified version :of vm_page_sleep, which unlocks it around the sleep. :I think that this is correct, because the object should be in a valid :state when we sleep (and all checks are reexecuted in that case). : :Since I'm not very experienced with vfs and vm stuff, I'd be glad if :this patch could get some review. In particular, is the lock/unlock :pair really still needed, and are there notable differeces in -STABLE :(because the fix would need the be MFC'ed)? : :Thanks, : - thomas Ok, I've looked at this more. What is supposed to happen is that the 'while (vp->v_numoutput) ...' code just above the section you cited is supposed to prevent the deadlock. The while code looks like this: while (vp->v_numoutput > 0) { vp->v_flag |= VBWAIT; tsleep(&vp->v_numoutput, PVM, "vnvlbv", 0); } However, as Ian points out in his followup, it doesn't work: :... :I've seen a related deadlock here in 4.x with NFS; vm_fault locks :a page and then calls vput which aquires the v_interlock. This code :in vinvalbuf locks the v_interlock first, and then vm_object_page_remove() :locks the page. That's a simple lock-order reversal deadlock which I :guess would still exist even with this patch. It doesn't work for the simple reason that vm_fault isn't busying the page for writing, it's busying it for reading, so v_numoutput will be 0. I think the best solution is to change vinvalbuf's wait loop to wait on vm_object->paging_in_progress instead of vp->v_numoutput, or perhaps to wait for both conditions to be satisfied. vm_object->paging_in_progress applies to reads and writes, while vp->v_numoutput only applies to writes. I know this isn't the most correct solution... there is still the lock reversal if vm_object_remove_pages() were ever to sleep. The idea is that it won't sleep if there is no paging in progress because nobody will have any of the object's pages locked. I think it is the best we can do for the moment. There are several places in vm/vm_object.c where the same assumption is made (testing against vm_object->paging_in_progress, though, which works, not vp->v_numoutput). - Thomas, if you are interested this could be a little project for you. See if you can code-up another while() loop to also wait for the object's paging_in_progress count to become 0 in the vinvalbuf() code. Look in vm/vm_object.c for examples of how to construct such a loop. I will be happy to review and test whatever you come up with and I'm sure Ian will too! -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message