Date: Thu, 15 May 2003 09:37:06 -0400 (EDT)
From: Robert Watson <robert@fledge.watson.org>
To: "Andrew P. Lentvorski, Jr."
Cc: Don Lewis, alfred@FreeBSD.org, current@FreeBSD.org
Subject: Re: rpc.lockd spinning; much breakage

On Thu, 15 May 2003, Andrew P. Lentvorski, Jr. wrote:

> > It looks like rpc.statd on the client needs to remember that it
> > requested the lock,
>
> That's not the purpose of rpc.statd.  rpc.statd is only recording
> locks for server/client crash recovery.  It should not get involved in
> cancel message problems.  NFS is supposed to be stateless.  Any state
> required for locking is solely the responsibility of rpc.lockd.
> Putting the state in rpc.statd just pushes the problem around without
> getting rid of it.

Er, yes.  That's what I meant to write: rpc.lockd. :-)

> In fact, as currently written, I'm pretty sure that rpc.statd does not
> work correctly anyway.

I'm still making my way through the XNFS spec, and will take a look at
rpc.statd when I'm done.  rpc.lockd is becoming a lot clearer to me with
deeper reading of the spec :-).

> > ... It's not clear to me how that should be accomplished: perhaps
> > when it tries to wake up the process and discovers it is missing, it
> > should do the cleanup then, or if the lock attempt is aborted early
> > due to a signal, a further message should be sent from the kernel to
> > the userland rpc.lockd to notify it that the lock instance is no
> > longer of interest.  Note that if we're only using the pid to
> > identify a process, not a pid and some sort of generation number,
> > there's the potential for pid reuse and a resulting race.
>
> One solution would be for the client kernel to maintain all locks
> (UFS, NFS, SMB, whatever) in one area/data structure and then delegate
> the appropriate specific actions as the signals come in.
>
> Another alternative is that rpc.lockd must register a kevent for every
> process which requests a lock in NFS so that it gets notified if the
> process gets terminated.
>
> I have no idea which would be the better/easier solution.  freebsd-fs
> has been notably silent on this issue.

I suspect the "easier" solution is to continue to work with rpc.lockd in
its current structure and adapt it to be aware of the additional events
of interest.
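For reference, the kevent(2) registration Andrew describes would look
roughly like the following.  This is only a sketch: watch_locking_pid()
and the lock_req cookie are names I've made up for illustration, and
the daemon's main loop would still have to collect the NOTE_EXIT events
and cancel the matching lock requests.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/*
 * Ask the kernel to tell us when a process that has an NFS lock
 * request outstanding exits.  "kq" is the daemon's existing kqueue;
 * "lock_req" identifies the pending request so the event handler can
 * find it again when the NOTE_EXIT event is delivered.
 */
int
watch_locking_pid(int kq, pid_t pid, void *lock_req)
{
	struct kevent kev;

	EV_SET(&kev, pid, EVFILT_PROC, EV_ADD | EV_ONESHOT,
	    NOTE_EXIT, 0, lock_req);

	/*
	 * If the process has already exited, kevent() fails with
	 * ESRCH, so the caller can at least detect the "died before we
	 * registered" race and abort the request immediately.
	 */
	return (kevent(kq, &kev, 1, NULL, 0, NULL));
}

Note that this does nothing for the pid reuse race mentioned above: by
the time kevent() runs, the pid may already name a different process.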
I believe, incidentally, that an open() grabbing an exclusive lock can
be interrupted by a signal while the process keeps running, in which
case the locking attempt is aborted but the process hasn't died.  So
even a kevent indicating the death of the process isn't sufficient.  I
need to test this assumption, but it strikes me as pretty likely.

One issue that concerns me is races -- registering for a kevent after
getting a request assumes that you can get it registered before the
process dies.  On SMP systems, this introduces a natural race.

I think it sounds like there are two parts to a solution:

(1) The kernel notifies rpc.lockd when a process aborts a lock attempt,
    which permits rpc.lockd to handle one of two cases:

    (a) The abort arrived before the lock was granted by the server, in
        which case we need to abort or release the distributed lock
        attempt -- I don't know, but am guessing, that these are the
        same operation due to the potential for "crossing in flight"
        with a response.

    (b) The abort arrived after the lock was granted by the server but
        before the kernel was notified of the grant, in which case we
        release the lock.

I think it would also be useful to handle:

(2) When no process is available to accept a lock response, that lock
    should be immediately released.

I'm still getting a grasp on the details of rpc.lockd, so I'm not too
clear on how much state is carried around internally.

In terms of the kernel side, it would not be hard to add an additional
case to the error handling for tsleep() to pick up EINTR, in which case
we stuff another message into the fifo reporting an abort by the
process.  In fact, we might be able to reuse the "unlock request" by
simply writing the same locking message back into the fifo with
F_UNLCK, and just make sure rpc.lockd knows that if it gets an unlock
while the lock event is in progress, it should do the right thing.  A
rough sketch of what I have in mind is attached below my signature.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
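The sketch, for concreteness.  tsleep(9) and the struct flock fields
are real interfaces; LOCKD_MSG here is a cut-down stand-in for the real
fifo message (which also carries the file handle, credentials, and so
on), and lockd_fifo_write() is a made-up name for the existing "write a
message down the fifo" step in the NFS client lock path:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/fcntl.h>

/* Cut-down stand-in for the message rpc.lockd reads from the fifo. */
typedef struct lockd_msg {
	pid_t		lm_pid;		/* requesting process */
	struct flock	lm_fl;		/* the lock being requested */
} LOCKD_MSG;

/* Stand-in for the existing routine that queues a message to the fifo. */
extern int	lockd_fifo_write(LOCKD_MSG *msg);

/*
 * Wait for rpc.lockd's answer after the lock request has been written
 * to the fifo.  If a signal interrupts the wait, rewrite the pending
 * request into the fifo as an unlock so rpc.lockd can cancel or
 * release it rather than leaving the server-side lock orphaned.
 */
static int
nfs_lock_wait(LOCKD_MSG *msg, void *wchan)
{
	int error;

	error = tsleep(wchan, PCATCH | PUSER, "lockd", 20 * hz);
	if (error == EINTR || error == ERESTART) {
		msg->lm_fl.l_type = F_UNLCK;
		(void)lockd_fifo_write(msg);
	}
	return (error);
}

This keeps the message format unchanged, so rpc.lockd only needs the
new rule that an F_UNLCK arriving for a request still in progress means
"abort or release, whichever the protocol state calls for".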