Date:      Tue, 13 May 2003 14:51:17 -0400 (EDT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        "Andrew P. Lentvorski, Jr." <bsder@allcaps.org>
Cc:        current@FreeBSD.org
Subject:   Re: rpc.lockd spinning; much breakage
Message-ID:  <Pine.NEB.3.96L.1030513135121.72145Q-100000@fledge.watson.org>
In-Reply-To: <Pine.NEB.3.96L.1030513133547.72145O-100000@fledge.watson.org>

On Tue, 13 May 2003, Robert Watson wrote:

> So the client isn't retrying, or mapping errors right after this patch,
> but the failure modes are more consistent and I seem not to be getting
> any interminable hangs anymore on the client. 

I should clarify this statement: I no longer get the odd hangs in
client/server interactions when the client contends for a lock that was
established on the server.  I still run into the "client isn't woken up
in a timely manner after a lock is released by the same or another
client" problem.  Here's the demonstration case, with a bit more detail
than I presented earlier.  The server runs on host cboss; the client runs
twice on host crash1, on different ptys.  In this scenario, each client
attempts to grab an exclusive lock, potentially blocking, and the first
client then sleeps for 10 seconds while holding it (this is with one of
the earlier posted patches):

crash1:/tmp> ./locktest nocreate openlock block noflock test 10
933  open(test, 32, 0666)               Tue May 13 14:31:31 2003
933  open() returns                     Tue May 13 14:31:31 2003
933  sleep(10)                          Tue May 13 14:31:31 2003
933  sleep() returns                    Tue May 13 14:31:41 2003

crash1:/tmp> ./locktest nocreate openlock block noflock test 0
934  open(test, 32, 0666)               Tue May 13 14:31:33 2003
934  open() returns                     Tue May 13 14:31:53 2003
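
For readers without the locktest tool, the two-client scenario above can
be approximated with a short script.  This is only a sketch under stated
assumptions: the locktest source is not shown in this message, so the
function name lock_and_sleep and its return value are hypothetical, and
it takes the lock with a blocking flock(LOCK_EX) after open() rather than
at open time as the "openlock" runs above do.

```python
import fcntl
import os
import time


def lock_and_sleep(path, seconds):
    """Take a blocking exclusive lock on an existing file, hold it, release it.

    Hypothetical stand-in for the locktest runs shown above; returns the
    (requested, acquired, released) wall-clock timestamps.
    """
    requested = time.time()
    # No O_CREAT: like "nocreate", the file must already exist.
    fd = os.open(path, os.O_RDWR)
    try:
        # Blocks until no other process holds a conflicting lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        acquired = time.time()
        time.sleep(seconds)
    finally:
        # Closing the only descriptor for this open drops the lock.
        os.close(fd)
    released = time.time()
    return requested, acquired, released
```

Calling lock_and_sleep("test", 10) in one terminal and
lock_and_sleep("test", 0) in a second one a couple of seconds later
mirrors the two runs; the gap between the first call's released time and
the second call's acquired time is the wake-up delay of interest.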

rpc.lockd results on crash1:

May 13 14:31:31 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 14:31:33 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 14:31:42 crash1 rpc.lockd: nlm_granted_msg from 192.168.50.1
May 13 14:31:42 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 14:31:42 crash1 rpc.lockd: process 933: No such process
May 13 14:31:53 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1

In this example, pid 934 requests the lock on the object at 14:31:33.
Pid 933 released that lock at 14:31:41, but pid 934 isn't notified until
14:31:53.  It looks like it should have been notified at 14:31:42, when
the granted message was received; instead it is notified only when the
client rpc.lockd polls again, 10 seconds from lock inception.  I almost
wonder whether that ESRCH ("process 933: No such process") was supposed to
be the notification for 934, delivered using the wrong pid.
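
The size of the delay can be read straight off the timestamps quoted
above.  The following sketch just does that arithmetic; the event labels
and the helper name seconds_between are illustrative, and the times are
copied from the client and rpc.lockd output in this message.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps taken from the output quoted in this message.
events = {
    "requested": "14:31:33",  # pid 934 calls open()
    "released": "14:31:41",   # pid 933 sleep() returns
    "granted": "14:31:42",    # nlm_granted_msg logged by rpc.lockd
    "woken": "14:31:53",      # pid 934 open() returns
}


def seconds_between(start, end):
    """Whole seconds from one HH:MM:SS timestamp to a later one."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


print(seconds_between(events["released"], events["woken"]))   # 12
print(seconds_between(events["granted"], events["woken"]))    # 11
print(seconds_between(events["requested"], events["woken"]))  # 20
```

So pid 934 waits 11 seconds past the granted message and 12 seconds past
the point where pid 933 finished sleeping, out of 20 seconds spent in
open() overall.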

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
