Date: Thu, 15 May 2003 00:49:24 -0700 (PDT)
From: "Andrew P. Lentvorski, Jr."
To: Robert Watson
cc: Don Lewis
cc: alfred@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: rpc.lockd spinning; much breakage

On Wed, 14 May 2003, Robert Watson wrote:

> Speaking of re-polling, here's another bug: Open two pty's on the NFS
> client. On pty1, grab and hold an exclusive lock on a file; sleep. On
> pty2, do a blocking lock attempt on open, but Ctrl-C the process before
> the pty1 process wakes up, meaning that the lock attempt is effectively
> aborted. Now kill the first process, releasing the lock, and attempt to
> grab the lock on the file: you'll hang forever. The client rpc.lockd has
> left a blocking lock request registered with the server, but never
> released that lock for the now missing process.
>
> Example pty1:
>
> crash1:/tmp> ./locktest nocreate openexlock nonblock noflock test 10
> 1107 open(test, 36, 0666) Wed May 14 17:28:41 2003
> 1107 open() returns Wed May 14 17:28:41 2003
> 1107 sleep(10) Wed May 14 17:28:41 2003
> 1107 sleep() returns Wed May 14 17:28:51 2003
>
> Example pty2:
> crash1:/tmp> ./locktest nocreate openexlock block noflock test 0
> 1108 open(test, 32, 0666) Wed May 14 17:28:43 2003
> ^C
>
> crash1:/tmp> ./locktest nocreate openexlock block noflock test 0
> 1113 open(test, 32, 0666) Wed May 14 17:30:52 2003
>
>
> It looks like rpc.statd on the client needs to remember that it requested
> the lock,

That's not the purpose of rpc.statd. rpc.statd is only recording locks
for server/client crash recovery. It should not get involved in cancel
message problems. NFS is supposed to be stateless. Any state required
for locking is solely the responsibility of rpc.lockd. Putting the state
in rpc.statd just pushes the problem around without getting rid of it.

In fact, as currently written, I'm pretty sure that rpc.statd does not
work correctly anyway.

> ... It's not clear to me how that should be accomplished: perhaps when
> it tries to wake up the process and discovers it is missing, it should
> do it, or if the lock attempt is aborted early due to a signal, a
> further message should be sent from the kernel to the userland rpc.lockd
> to notify it that the lock instance is no longer of interest. Note that
> if we're only using the pid to identify a process, not a pid and some
> sort of generation number, there's the potential for pid reuse and a
> resulting race.

One solution would be for the client kernel to maintain all locks (UFS,
NFS, SMB, whatever) in one area/data structure and then delegate the
appropriate specific actions as the signals come in.

Another alternative is that rpc.lockd must register a kevent for every
process which requests a lock in NFS so that it gets notified if the
process gets terminated.

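Whichever way the notification arrives, the bookkeeping on the rpc.lockd
side could look roughly like the sketch below. It is only an
illustration under stated assumptions, not the actual rpc.lockd code:
the names lock_owner, pending_add, pending_cancel_owner and
pending_grant are hypothetical, and the generation number is assumed to
be handed to the daemon by the kernel together with the pid. The exit
notification itself (for example a kevent with EVFILT_PROC and
NOTE_EXIT) is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical owner identity: pid plus a generation number, so that a
 * recycled pid is not mistaken for the process that asked for the lock. */
struct lock_owner {
    pid_t    pid;
    uint64_t gen;   /* assumption: supplied by the kernel with each request */
};

/* One blocking lock request that has been sent to the server and not
 * yet granted or cancelled. */
struct pending_lock {
    struct lock_owner owner;
    bool              in_use;
};

#define MAX_PENDING 64
static struct pending_lock pending[MAX_PENDING];

static bool owner_equal(struct lock_owner a, struct lock_owner b)
{
    return a.pid == b.pid && a.gen == b.gen;
}

/* Record a blocking request. Returns the slot index, or -1 if the
 * table is full. */
int pending_add(struct lock_owner owner)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i].owner = owner;
            pending[i].in_use = true;
            return i;
        }
    }
    return -1;
}

/* Called when the daemon learns that the owner exited or gave up.
 * Returns the number of requests dropped; a real daemon would also
 * send a cancel to the server for each of them. */
int pending_cancel_owner(struct lock_owner owner)
{
    int dropped = 0;

    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && owner_equal(pending[i].owner, owner)) {
            pending[i].in_use = false;
            dropped++;
        }
    }
    return dropped;
}

/* Called when the server grants a lock. The grant is delivered only if
 * exactly this owner (pid and generation) still has a request
 * outstanding; otherwise the caller should release the lock again. */
bool pending_grant(struct lock_owner owner)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && owner_equal(pending[i].owner, owner)) {
            pending[i].in_use = false;
            return true;
        }
    }
    return false;
}
```

With this, the Ctrl-C'd process in the pty2 example would have its
request dropped by pending_cancel_owner(), and a later grant for the
same pid with an older generation would be refused rather than handed
to an unrelated process.
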
I have no idea which would be the better/easier solution. freebsd-fs has
been notably silent on this issue.

-a