Date: Tue, 27 Oct 2009 17:41:59 +0100
From: Olaf Seibert
To: freebsd-stable@freebsd.org
Cc: Olaf Seibert
Subject: 8.0-RC1 NFS client timeout issue
Message-ID: <20091027164159.GU841@twoquid.cs.ru.nl>

I see an annoying behaviour with NFS over TCP. It happens with both nfs
and newnfs. This is with FreeBSD/amd64 8.0-RC1 as the client. The server
is some Linux or perhaps Solaris; I'm not entirely sure.

After looking through packet traces, I think I have found something.
The scenario seems to be as follows.
Sorry for the width of the lines.

No.  Time        Source         Destination    Protocol Info
2296 2992.216855 xxx.xxx.31.43  xxx.xxx.16.142 NFS      V3 LOOKUP Call (Reply In 2297), DH:0x3819da36/w
2297 2992.217107 xxx.xxx.16.142 xxx.xxx.31.43  NFS      V3 LOOKUP Reply (Call In 2296) Error:NFS3ERR_NOENT
2298 2992.217141 xxx.xxx.31.43  xxx.xxx.16.142 NFS      V3 LOOKUP Call (Reply In 2299), DH:0x170cb16a/bin
2299 2992.217334 xxx.xxx.16.142 xxx.xxx.31.43  NFS      V3 LOOKUP Reply (Call In 2298), FH:0x61b8eb12
2300 2992.217361 xxx.xxx.31.43  xxx.xxx.16.142 NFS      V3 ACCESS Call (Reply In 2301), FH:0x61b8eb12
2301 2992.217582 xxx.xxx.16.142 xxx.xxx.31.43  NFS      V3 ACCESS Reply (Call In 2300)
2302 2992.217605 xxx.xxx.31.43  xxx.xxx.16.142 NFS      V3 LOOKUP Call (Reply In 2303), DH:0x61b8eb12/w
2303 2992.217860 xxx.xxx.16.142 xxx.xxx.31.43  NFS      V3 LOOKUP Reply (Call In 2302) Error:NFS3ERR_NOENT
2304 2992.318770 xxx.xxx.31.43  xxx.xxx.16.142 TCP      934 > nfs [ACK] Seq=238293 Ack=230289 Win=8192 Len=0 TSV=86492342 TSER=12393434
2306 3011.537520 xxx.xxx.16.142 xxx.xxx.31.43  NFS      V3 GETATTR Reply (Call In 2305) Directory mode:2755 uid:4100 gid:4100
2307 3011.637744 xxx.xxx.31.43  xxx.xxx.16.142 TCP      934 > nfs [ACK] Seq=238429 Ack=230405 Win=8192 Len=0 TSV=86511662 TSER=12395366
2308 3371.534980 xxx.xxx.16.142 xxx.xxx.31.43  TCP      nfs > 934 [FIN, ACK] Seq=230405 Ack=238429 Win=49232 Len=0 TSV=12431366 TSER=86511662

The server decides, for whatever reason, to terminate the connection
and sends a FIN.

2309 3371.535018 xxx.xxx.31.43  xxx.xxx.16.142 TCP      934 > nfs [ACK] Seq=238429 Ack=230406 Win=8192 Len=0 TSV=86871578 TSER=12431366

Client acknowledges this,

2310 3375.379693 xxx.xxx.31.43  xxx.xxx.16.142 NFS      V3 ACCESS Call, FH:0x008002a2

but tries to sneak in another call anyway. [A]

2311 3375.474788 xxx.xxx.16.142 xxx.xxx.31.43  TCP      nfs > 934 [ACK] Seq=230406 Ack=238569 Win=49232 Len=0 TSV=12431760 TSER=86875423

Server ACKs but doesn't send anything else... [B]

Time passes...
2312 3675.366081 xxx.xxx.31.43  xxx.xxx.16.142 TCP      934 > nfs [FIN, ACK] Seq=238569 Ack=230406 Win=8192 Len=0 TSV=87175425 TSER=12431760

Client finally decides after 300 secs to close the connection too

2313 3675.366149 xxx.xxx.31.43  xxx.xxx.16.142 TCP      904 > nfs [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=5 TSV=87175425 TSER=0

and to re-open a new one.

2314 3675.366318 xxx.xxx.16.142 xxx.xxx.31.43  TCP      nfs > 934 [ACK] Seq=230406 Ack=238570 Win=49232 Len=0 TSV=12461749 TSER=87175425
2315 3675.366446 xxx.xxx.16.142 xxx.xxx.31.43  TCP      nfs > 904 [SYN, ACK] Seq=0 Ack=1 Win=49232 Len=0 TSV=12461749 TSER=87175425 MSS=1460 WS=0
2316 3675.366483 xxx.xxx.31.43  xxx.xxx.16.142 TCP      904 > nfs [ACK] Seq=1 Ack=1 Win=66592 Len=0 TSV=87175425 TSER=12461749
2317 3675.366506 xxx.xxx.31.43  xxx.xxx.16.142 NFS      V3 ACCESS Call (Reply In 2319), FH:0x008002a2
2318 3675.366660 xxx.xxx.16.142 xxx.xxx.31.43  TCP      nfs > 904 [ACK] Seq=1 Ack=141 Win=49092 Len=0 TSV=12461749 TSER=87175425
2319 3675.367356 xxx.xxx.16.142 xxx.xxx.31.43  NFS      V3 ACCESS Reply (Call In 2317)
2320 3675.367425 xxx.xxx.31.43  xxx.xxx.16.142 NFS      V3 GETATTR Call (Reply In 2322), FH:0x170cb16a
2321 3675.367644 xxx.xxx.16.142 xxx.xxx.31.43  TCP      nfs > 904 [ACK] Seq=125 Ack=277 Win=49232 Len=0 TSV=12461749 TSER=87175426
2322 3675.367730 xxx.xxx.16.142 xxx.xxx.31.43  NFS      V3 GETATTR Reply (Call In 2320) Directory mode:2755 uid:4100 gid:4100
2323 3675.367759 xxx.xxx.31.43  xxx.xxx.16.142 NFS      V3 ACCESS Call (Reply In 2325), FH:0x170cb16a

Point [A] seems somewhat worrisome to me: though technically the
connection is closed in one direction only, the intention of the server
seems clear, and it would be better to be careful and make a new
connection right away.

[B] would be a bug in the server, in my opinion. If it ACKs a call, it
should send a reply. And if it can't reply, it shouldn't ACK.

Please Cc me on replies; I am not subscribed to this list.

-Olaf.
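
To illustrate point [A], here is a minimal userland sketch in Python of
how a client could notice that the peer has already sent its FIN before
it writes another request to the same connection. This is not the
FreeBSD NFS client code, which lives in the kernel; the function names
peer_has_closed and demo are made up for this example, and the loopback
server only stands in for the NFS server's half-close in packet 2308.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer's FIN has already arrived on this socket."""
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending: neither data nor a FIN
    # MSG_PEEK leaves real data in the receive buffer; b"" means end-of-stream.
    return sock.recv(1, socket.MSG_PEEK) == b""


def demo():
    """Reproduce the half-close on loopback; return the check before and after."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        before = peer_has_closed(client)
        server_side.shutdown(socket.SHUT_WR)  # server sends its FIN
        select.select([client], [], [], 5.0)  # wait for the FIN to arrive
        after = peer_has_closed(client)
        return before, after
    finally:
        for s in (client, server_side, listener):
            s.close()


if __name__ == "__main__":
    print(demo())
```

A client written along these lines would call peer_has_closed() before
each request and open a fresh connection when it returns True, instead
of sending on the half-closed one and waiting for a timeout.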