From: "Andrew P. Lentvorski, Jr."
To: Robert Watson
cc: Don Lewis, alfred@FreeBSD.org, current@FreeBSD.org
Date: Tue, 13 May 2003 01:28:58 -0700 (PDT)
Subject: Re: rpc.lockd spinning; much breakage

On Mon, 12 May 2003, Robert Watson wrote:

> (3) Sometimes rpc.lockd on 5.x acting as a server gets really confused
> when you mix local and remote locks. I haven't quite figured out the
> circumstances, but occasionally I run into a situation where a client
> contends against an existing lock on the server, and the client never
> receives a notification from the server that the lock has been
> released. It looks like the server stores state that the lock is
> contended, but perhaps never properly re-polls the kernel to see if
> the lock has been locally re-released:

I just looked at the code again.
rpc.lockd does not spawn off extra processes to continuously poll the
kernel. It assumes that it has control of the underlying file and only
rechecks the blockedlocklist when it receives and grants an NFS file
unlock. Consequently, contention on the hardware lock (the kernel-level
lock on the underlying file) needs to actually cause a *fail* rather than
queue up a lock for later. Currently, when the hardware lock attempt
fails, the code still calls add_blockingfilelock() and queues the request.

The offending code in lockd_lock.c is:

    if (retval == PFL_NFSDENIED || retval == PFL_HWDENIED) {
        /* Once last chance to check the lock */
        if (fl->blocking == 1) {
            /* Queue the lock */
            debuglog("BLOCKING LOCK RECEIVED\n");
            retval = (retval == PFL_NFSDENIED ?
                PFL_NFSBLOCKED : PFL_HWBLOCKED);
            add_blockingfilelock(fl);
            dump_filelock(fl);
        } else {

A possible fix would be:

        if (fl->blocking == 1) {
            if (retval == PFL_NFSDENIED) {
                /* Queue the lock */
                debuglog("BLOCKING LOCK RECEIVED\n");
                retval = PFL_NFSBLOCKED;
                add_blockingfilelock(fl);
                dump_filelock(fl);
            } else {
                /* retval is okay as PFL_HWDENIED */
                debuglog("BLOCKING LOCK DENIED IN HARDWARE\n");
                dump_filelock(fl);
            }
        } else {

With this change, the server should return nlm4_denied, and the client
should eventually retry the lock rather than wait on the server
indefinitely.

CAUTION! I haven't checked or compiled this code. If folks need me to, I
can, but it will take a couple of days, as I don't have two machines
handy on which I can install -CURRENT and set up NFS.

-a