From owner-freebsd-hackers  Fri Apr  4 07:43:26 1997
Return-Path: <owner-hackers>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id HAA14524
          for hackers-outgoing; Fri, 4 Apr 1997 07:43:26 -0800 (PST)
Received: from nlsystems.com (nlsys.demon.co.uk [158.152.125.33])
          by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id HAA14516
          for <freebsd-hackers@FreeBSD.ORG>; Fri, 4 Apr 1997 07:43:20 -0800 (PST)
Received: from herring.nlsystems.com (herring.nlsystems.com [10.0.0.2])
	by nlsystems.com (8.8.5/8.8.5) with SMTP id QAA11191;
	Fri, 4 Apr 1997 16:43:08 +0100 (BST)
Date: Fri, 4 Apr 1997 16:43:08 +0100 (BST)
From: Doug Rabson <dfr@nlsystems.com>
To: Tor Egge <Tor.Egge@idi.ntnu.no>
cc: dg@root.com, ponds!rivers@dg-rtp.dg.com, freebsd-hackers@FreeBSD.ORG
Subject: Re: kern/3184: vnodes are used after they are freed. (dup alloc?)
In-Reply-To: <199704041503.RAA05693@pat.idt.unit.no>
Message-ID: <Pine.BSF.3.95q.970404163654.8538B-100000@herring.nlsystems.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

On Fri, 4 Apr 1997, Tor Egge wrote:

> >    Uh, this is wrong since VOP_INACTIVE really wants a '0' usecount vnode,
> > and there are assumptions throughout the code that a '0' usecount also
> > implies that the vnode is on the free list. A quick code review of Tor's
> > suggested fix shows that it will fail in several places in the kernel and
> > basically needs to be re-thought...which is why it hasn't been committed
> > yet.
> 
> I'm running with the modified suggested fix now, and have not seen any
> failures due to that suggested fix. The original suggested fix failed
> due to the assumptions that a `0' usecount meant that it was on the
> free list, and a NULL pointer was dereferenced when trying to move the
> vnode to the head of the free list. Adding a kludge (magic number
> 0xdeadb, used elsewhere in the code to mark that the vnode was not on
> the freelist) made the code work for my tests.

I tried testing your fix this morning and the 0xdeadb stuff just caused
vget to fault a couple of minutes into my test (simultaneous rm -rf
largetree and cvs co src, both remote).

This problem really has little to do with nfs_inactive.  What is
happening is a race between vgone and vget which would normally be solved
by the vnode locks.  Since NFS doesn't have vnode locks, the race happens.

I am most of the way to implementing the right solution for NFS, which
is to use shared locks for NFS;  vgone can then use the lock manager to
wait for all the shared locks to drain before recycling the vnode.

--
Doug Rabson				Mail:  dfr@nlsystems.com
Nonlinear Systems Ltd.			Phone: +44 181 951 1891