From owner-freebsd-hackers Thu Dec 19 15:34:27 1996
From: Andrew.Gordon@net-tel.co.uk
Date: Thu, 19 Dec 96 23:25:41 +0000
Message-Id: <"67ad-961219232530-7084*/G=Andrew/S=Gordon/O=NET-TEL Computer Systems Ltd/PRMD=NET-TEL/ADMD=Gold 400/C=GB/"@MHS>
To: list:;
Cc: "Ron G. Minnich", terry@lambert.org, freebsd-hackers@freebsd.org
In-Reply-To: <199612192153.OAA12245@phaeton.artisoft.com>
Subject: Re(2): rpc.lockd in nfs in freebsd vs. sun nfs locking
Sender: owner-hackers@freebsd.org

> > As late as Solaris 2.4, we were still seeing an occasional 'lock storm',
> > where RPC lock traffic would eat the wire on one particular error
> > condition that would occur when a client rebooted. This was elicited by
> > sendmail locking in /var/spool/mail.
> > We just had a lockup the other day on
> > a 2.5 machine, we're not sure why but the process that was hung was ...
> > sendmail. The guy who rebooted the machine didn't get me a core dump
> > though.
>
> Certainly, when the rpc.statd notes the server death and the client
> comes back up, all clients will relock everything they had open
> (that's why NFS locking is stateful).
>
> When a client dies, the client doesn't have lock state and therefore
> can not reestablish locks, so that can't be the cause of your
> "lock storm" problems.

However, when the client comes up, the client rpc.statd is supposed to
notify the server rpc.statd, which in turn notifies the server rpc.lockd
so that it can release any locks it previously held on behalf of that
client. Of course, this shouldn't cause any further traffic unless the
server thought it had some locks on the client too (i.e. both machines
were exporting filesystems to each other).

Can you get a trace of the traffic on the wire when this happens? I can't
see anything in the protocol that could lead to a lock storm, but it would
be useful to understand it (if nothing else, to avoid duplicating Solaris
bugs in the FreeBSD implementation!).
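
For anyone following along, the reboot-notification path described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not the real RPC code: the class names (LockDaemon, StatusDaemon), the method names, and the mailbox path are all hypothetical, and a real rpc.statd keeps its peer list on disk so that it survives the reboot.

```python
"""Toy model of the client-reboot notification path (no real RPC involved)."""


class LockDaemon:
    """Hypothetical stand-in for rpc.lockd: tracks locks per client host."""

    def __init__(self):
        self.locks = {}  # client host name -> set of locked paths

    def grant(self, client, path):
        self.locks.setdefault(client, set()).add(path)

    def client_rebooted(self, client):
        # Drop every lock held on behalf of the rebooted client.
        return self.locks.pop(client, set())


class StatusDaemon:
    """Hypothetical stand-in for rpc.statd: relays reboot notices to lockd."""

    def __init__(self, host, lockd):
        self.host = host
        self.lockd = lockd
        self.peers = []         # statds to notify after this host reboots
        self.messages_sent = 0  # crude counter for "traffic on the wire"

    def monitor(self, peer):
        # Assumption: in this model the list simply lives in memory.
        if peer not in self.peers:
            self.peers.append(peer)

    def reboot(self):
        # After a reboot, tell each recorded peer that our state is gone.
        for peer in self.peers:
            self.messages_sent += 1
            peer.notify(self.host)

    def notify(self, rebooted_host):
        # Hand the notice to the local lockd so it can release the locks.
        return self.lockd.client_rebooted(rebooted_host)


server_lockd = LockDaemon()
server_statd = StatusDaemon("server", server_lockd)
client_statd = StatusDaemon("client", LockDaemon())

# The client takes a lock on the server, then reboots.
server_lockd.grant("client", "/var/spool/mail/alice")
client_statd.monitor(server_statd)
client_statd.reboot()

print(server_lockd.locks)          # {}
print(client_statd.messages_sent)  # 1
print(server_statd.messages_sent)  # 0
```

In this model a single notification goes out and the server sends nothing in response, which matches the expectation that a client reboot alone should not generate further traffic. If the server had also recorded the client as a peer (the cross-export case), it would have its own entries to clean up, and that is the situation where a wire trace would be worth looking at.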