From owner-freebsd-hackers  Thu Dec 19 15:34:27 1996
Return-Path: <owner-hackers>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.4/8.8.4) id PAA20214
          for hackers-outgoing; Thu, 19 Dec 1996 15:34:27 -0800 (PST)
Received: from eldorado.net-tel.co.uk ([193.122.171.253])
          by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id PAA20205
          for <freebsd-hackers@freebsd.org>; Thu, 19 Dec 1996 15:34:22 -0800 (PST)
From: Andrew.Gordon@net-tel.co.uk
Received: (from root@localhost) by eldorado.net-tel.co.uk (8.6.12/8.6.10) id XAA17679; Thu, 19 Dec 1996 23:28:07 GMT
Received: from "/PRMD=NET-TEL/ADMD=GOLD 400/C=GB/" by net-tel.co.uk
        (Route400-RFCGate); Thu, 19 Dec 96 23:25:44 +0000
X400-Received: by mta "eldorado" in "/PRMD=net-tel/ADMD=gold 400/C=gb/"; 
        Relayed; Thu, 19 Dec 96 23:25:44 +0000
X400-Received: by mta "net-tel cambridge" in "/PRMD=net-tel/ADMD=gold 400/C=gb/";
         Relayed; Thu, 19 Dec 96 23:25:42 +0000
X400-Received: by "/PRMD=NET-TEL/ADMD=Gold 400/C=GB/"; Relayed; 
        Thu, 19 Dec 96 23:25:41 +0000
X400-MTS-Identifier: 
        ["/PRMD=NET-TEL/ADMD=Gold 400/C=GB/";hst:21271-961219232541-0D12]
X400-Content-Type: P2-1984 (2)
X400-Originator: Andrew.Gordon@net-tel.co.uk
Original-Encoded-Information-Types: IA5-Text
X400-Recipients: non-disclosure:;
Date: Thu, 19 Dec 96 23:25:41 +0000
X400-Content-Identifier: Re(2): rpc.lockd
Message-Id: <"67ad-961219232530-7084*/G=Andrew/S=Gordon/O=NET-TEL Computer
 Systems Ltd/PRMD=NET-TEL/ADMD=Gold 400/C=GB/"@MHS>
To: list:;
Cc: "Ron G. Minnich" <rminnich@Sarnoff.COM>, terry@lambert.org,
        freebsd-hackers@freebsd.org
In-Reply-To: <199612192153.OAA12245@phaeton.artisoft.com>
Subject: Re(2): rpc.lockd in nfs in freebsd vs. sun nfs locking
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> > As late as Solaris 2.4, we were still seeing an occasional 'lock storm',
> > where RPC lock traffic would eat the wire on one particular error
> > condition that would occur when a client rebooted. This was elicited by
> > sendmail locking in /var/spool/mail. We just had a lockup the other day on
> > a 2.5 machine, we're not sure why but the process that was hung was ... 
> > sendmail. The guy who rebooted the machine didn't get me a core dump
> > though. 
> 
> Certainly, when the rpc.statd notes the server death and the client
> comes back up, all clients will relock everything they had open
> (that's why NFS locking is stateful).
> 
> When a client dies, the client doesn't have lock state and therefore
> can not reestablish locks, so that can't be the cause of your
> "lock storm" problems.

However, when the client comes back up, the client's rpc.statd is supposed
to notify the server's rpc.statd, which in turn notifies the server's
rpc.lockd so that it can release any locks the server was previously
holding for that client.  Of course, this shouldn't cause any further
traffic unless the server thought it had some locks on the client too
(i.e. both machines were exporting filesystems to each other).
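
To make the reboot-recovery sequence above concrete, here is a rough sketch
in Python of what the server side does with a reboot notification. It is a
toy model, not the real rpc.statd/rpc.lockd code: the LockServer class, its
method names, and the plain integer "state" counter are all made up for
illustration. The point it shows is that handling the notification is purely
local: the server drops that client's locks and sends nothing back.

```python
from collections import defaultdict


class LockServer:
    """Toy model of a server's lock daemon plus its status monitor.

    All names here are hypothetical; this is not the real daemon interface.
    """

    def __init__(self):
        # client host name -> set of (path, owner) locks held for that client
        self.locks = defaultdict(set)
        # client host name -> last state number seen from that client
        self.client_state = {}
        # messages the server would put on the wire (stays empty in this model)
        self.outbound = []

    def lock(self, client, path, owner):
        """Record a lock granted to a process (owner) on the given client."""
        self.locks[client].add((path, owner))

    def notify(self, client, state):
        """Handle a reboot notification from a client's status monitor.

        Returns the number of locks released. Nothing is appended to
        self.outbound: the release generates no further traffic.
        """
        previous = self.client_state.get(client)
        if previous is not None and state <= previous:
            return 0  # stale or duplicate notification, nothing to do
        self.client_state[client] = state
        # Drop every lock held on behalf of the rebooted client.
        return len(self.locks.pop(client, ()))


server = LockServer()
server.lock("clientA", "/var/spool/mail/alice", 101)
server.lock("clientA", "/var/spool/mail/bob", 102)
server.lock("clientB", "/var/spool/mail/carol", 7)

released_first = server.notify("clientA", state=3)   # clientA rebooted
released_again = server.notify("clientA", state=3)   # duplicate notification
```

In this model the first notification releases both of clientA's locks, the
duplicate releases nothing, and clientB's lock is untouched. A storm would
need something outside this path, which is why a wire trace would help.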

Can you get a trace of the traffic on the wire when this happens?
I can't see anything in the protocol that could lead to a lock storm,
but it would be useful to understand it (if nothing else, to avoid
duplicating Solaris bugs in the FreeBSD implementation!).