Message-Id: <200305132111.h4DLBaM7051295@gw.catspoiler.org>
Date: Tue, 13 May 2003 14:11:36 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: rwatson@FreeBSD.org
cc: bsder@allcaps.org, alfred@FreeBSD.org, current@FreeBSD.org
Subject: Re: rpc.lockd spinning; much breakage

On 13 May, Robert Watson wrote:
>
> On Tue, 13 May 2003, Robert Watson wrote:
>
>> So the client isn't retrying, or mapping errors right after this patch,
>> but the failure modes are more consistent and I seem not to be getting
>> any interminable hangs anymore on the client.
>
> I should clarify this statement: I no longer get the odd hangs when it
> comes to client and server interactions when contending a lock
> established on the server and now tested by the client.  I still bump
> into the "client isn't woken up in a timely manner after a lock is
> released by the same or another client".  Here's the demonstration case
> with a bit more detail from what I presented earlier.  The server runs
> on host cboss, the client runs twice on host crash1 on different pty's.
> In this scenario, each client attempts to grab an exclusive lock,
> potentially blocking, and then sleep for 10 seconds (this is with one
> of the earlier posted patches):

Try adding the lock_answer() calls I suggested in an earlier message ...

> crash1:/tmp> ./locktest nocreate openlock block noflock test 10
> 933 open(test, 32, 0666) Tue May 13 14:31:31 2003
> 933 open() returns Tue May 13 14:31:31 2003
> 933 sleep(10) Tue May 13 14:31:31 2003
> 933 sleep() returns Tue May 13 14:31:41 2003
>
> crash1:/tmp> ./locktest nocreate openlock block noflock test 0
> 934 open(test, 32, 0666) Tue May 13 14:31:33 2003
> 934 open() returns Tue May 13 14:31:53 2003
>
> rpc.lockd results on crash1:
>
> May 13 14:31:31 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
> May 13 14:31:33 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
> May 13 14:31:42 crash1 rpc.lockd: nlm_granted_msg from 192.168.50.1
> May 13 14:31:42 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
> May 13 14:31:42 crash1 rpc.lockd: process 933: No such process
> May 13 14:31:53 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
>
> In this example, pid 934 requests the lock on the object at 14:31:33 --
> pid 933 released that lock at 14:31:41, but pid 934 isn't notified
> until 14:31:53.  It looks like it should have been notified at 14:31:42
> when the granted message is received, but instead it is notified when
> the client rpc.lockd polls again, 10 seconds from lock inception.
> I almost wonder if that ESRCH shouldn't have been the notification for
> 934 and it was using the wrong pid.

Just looking at the order of the messages, I don't think so.  The nlm_*
messages appear to be printed at the beginning of the RPC handler.  If
the lock is being released because the process exited and closed the
file descriptor, then by the time the server is notified and the client
rpc.lockd gets the response from the server, the process that originally
grabbed the lock is gone.  I don't know why rpc.lockd wants to tell the
process that it successfully dropped the lock, though ...
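
For reference, here is a minimal sketch of a test program that should
behave like the ./locktest invocations quoted above: grab an exclusive
lock at open() time, hold it for a given number of seconds, then drop it
on close.  The real locktest source isn't in this thread, so the exact
flag handling here is an assumption; the flags value 32 in the trace
does match O_EXLOCK on FreeBSD, and on an NFS mount that open is what
ends up going through rpc.lockd.

/*
 * Minimal locktest-like sketch (not the real locktest source): grab an
 * exclusive lock at open() time and hold it for "holdtime" seconds.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static void
stamp(const char *msg)
{
	time_t now;

	now = time(NULL);
	/* ctime() output already ends in a newline. */
	printf("%d %s %s", (int)getpid(), msg, ctime(&now));
	fflush(stdout);
}

int
main(int argc, char **argv)
{
	char msg[128];
	int fd, flags, holdtime;

	if (argc != 3) {
		fprintf(stderr, "usage: %s file holdtime\n", argv[0]);
		exit(1);
	}
	holdtime = atoi(argv[2]);

	/*
	 * O_EXLOCK (0x20, the "32" in the trace above) requests an
	 * exclusive flock-style lock as part of the open; without
	 * O_NONBLOCK the open blocks until the lock is granted.
	 */
	flags = O_RDONLY | O_EXLOCK;
	snprintf(msg, sizeof(msg), "open(%s, %d, 0666)", argv[1], flags);
	stamp(msg);
	fd = open(argv[1], flags, 0666);
	if (fd == -1) {
		perror("open");
		exit(1);
	}
	stamp("open() returns");

	if (holdtime > 0) {
		snprintf(msg, sizeof(msg), "sleep(%d)", holdtime);
		stamp(msg);
		sleep(holdtime);
		stamp("sleep() returns");
	}

	/* Closing the descriptor drops the lock. */
	close(fd);
	return (0);
}

Running it twice against the same file on the NFS mount, first with a
hold time of 10 and then with 0, should reproduce what pid 934 sees
above: if the granted message is still being ignored, the second open()
won't return until the client rpc.lockd's next poll.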