Date: Wed, 14 May 2003 19:21:42 -0700 (PDT)
From: Don Lewis
To: robert@fledge.watson.org
cc: bsder@allcaps.org
cc: alfred@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: rpc.lockd spinning; much breakage
Message-Id: <200305150221.h4F2LgM7054256@gw.catspoiler.org>

On 14 May, Robert Watson wrote:
>
> On Tue, 13 May 2003, Don Lewis wrote:
>
> > I don't know if the client will retry in the blocking case or if the
> > server side will have to grow the code to poll any local locks that it
> > might encounter.
>
> Based on earlier experience with the wakeups getting "lost", it sounds
> like the re-polling takes place once every ten seconds on the client for
> blocking locks.

That makes sense.  It looks like the client side more or less just
tosses the "blocked" response and waits for the grant message to arrive.
I guess it periodically polls while it waits.

> Speaking of re-polling, here's another bug: Open two pty's on the NFS
> client.  On pty1, grab and hold an exclusive lock on a file; sleep.  On
> pty2, do a blocking lock attempt on open, but Ctrl-C the process before
> the pty1 process wakes up, meaning that the lock attempt is effectively
> aborted.  Now kill the first process, releasing the lock, and attempt to
> grab the lock on the file: you'll hang forever.  The client rpc.lockd has
> left a blocking lock request registered with the server, but never
> released that lock for the now missing process.
>
> It looks like rpc.statd on the client needs to remember that it requested
> the lock, and when it discovers that the process requesting the lock has
> evaporated, it should immediately release the lock on its behalf.  It's
> not clear to me how that should be accomplished: perhaps when it tries to
> wake up the process and discovers it is missing, it should do it, or if
> the lock attempt is aborted early due to a signal, a further message
> should be sent from the kernel to the userland rpc.lockd to notify it that
> the lock instance is no longer of interest.  Note that if we're only using
> the pid to identify a process, not a pid and some sort of generation
> number, there's the potential for pid reuse and a resulting race.

I saw something in the code about a cancel message (nlm4_cancel,
nlm4_cancel_msg).  I think what is supposed to happen is that when
process #2 is killed, the descriptor waiting for the lock will be closed,
which should get rid of its lock request.  rpc.lockd on the client
should notice this and send a cancel message to the server.  When
process #1 releases the lock, the second lock request will no longer be
queued on the server, and process #3 should be able to grab the lock.

This bug could be in the client rpc.lockd, the client kernel, or the
server rpc.lockd.
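
To make the sequence I have in mind concrete, here is a toy model of the server-side queue for a single file. It is only a sketch under my own assumptions: ToyLockServer, scenario, and the owner names are hypothetical and are not taken from the rpc.lockd source, and the cancel() method merely stands in for whatever the cancel message is supposed to do.

```python
from collections import deque


class ToyLockServer:
    """Toy model of one file's exclusive lock (hypothetical, not rpc.lockd code)."""

    def __init__(self):
        self.holder = None        # owner currently holding the lock, if any
        self.waiters = deque()    # queued blocking requests, in arrival order

    def lock(self, owner, block=True):
        """Return 'granted', 'blocked' (request queued), or 'denied'."""
        if self.holder is None:
            self.holder = owner
            return "granted"
        if block:
            self.waiters.append(owner)
            return "blocked"
        return "denied"

    def cancel(self, owner):
        """Drop a queued blocking request; stands in for the cancel message."""
        if owner in self.waiters:
            self.waiters.remove(owner)

    def unlock(self, owner):
        """Release the lock and hand it to the next queued waiter, if any."""
        if self.holder != owner:
            return None
        self.holder = self.waiters.popleft() if self.waiters else None
        return self.holder


def scenario(send_cancel):
    """Replay the three-process case; return what process #3 sees."""
    server = ToyLockServer()
    server.lock("proc1")              # process #1 takes the lock
    server.lock("proc2")              # process #2 blocks, then gets Ctrl-C
    if send_cancel:
        server.cancel("proc2")        # client notices and cancels the request
    server.unlock("proc1")            # process #1 exits, releasing the lock
    # Non-blocking attempt, so the model reports failure instead of hanging.
    return server.lock("proc3", block=False)
```

With the cancel, scenario(True) returns "granted". Without it, scenario(False) returns "denied": the lock has been handed to the vanished process #2, and a blocking request from process #3 would wait forever, which matches the hang Robert describes.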