From owner-freebsd-hackers Tue Aug 8 22:55:33 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from cs.rpi.edu (mumble.cs.rpi.edu [128.213.8.16])
	by hub.freebsd.org (Postfix) with ESMTP id 3A3C337B6B9
	for ; Tue, 8 Aug 2000 22:55:29 -0700 (PDT)
	(envelope-from crossd@prolog.cs.rpi.edu)
Received: from prolog.cs.rpi.edu (prolog.cs.rpi.edu [128.213.12.16])
	by cs.rpi.edu (8.9.3/8.9.3) with ESMTP id BAA66799;
	Wed, 9 Aug 2000 01:55:27 -0400 (EDT)
Message-Id: <200008090555.BAA66799@cs.rpi.edu>
To: freebsd-hackers@freebsd.org
Cc: crossd@cs.rpi.edu
Subject: NFS/TCP problems. 4.0-RELEASE server, sol 8 client
Date: Wed, 09 Aug 2000 01:55:27 -0400
From: "David E. Cross"
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I have recently had time to devote to FreeBSD, especially the NFS code. I
have stumbled upon a problem that seems to be out of my league.

The problem manifests as NFS/TCP connections that simply hang, sometimes for
only a few seconds, other times for minutes. Below is a network capture of
all traffic between a server and a client (captured from the server).

The intermittent UDP traffic is AMD issuing a null NFS request to verify that
the server is still alive. Of note are the very long delays. Simply put, the
FreeBSD box is not responding, not even with an ACK. Also of note, at about
line 23 there is an ACK to a connection that does not even exist, with no
noticeable activity on it for at least 2 seconds; what is it ACK-ing?

It appears the Solaris client keeps issuing RSTs until the client and server
get back in sync WRT TCP sequence numbers. But what is driving them out of
sequence, and why is the FreeBSD server not saying anything to the client?
(There is no firewall on any machine in this configuration.)
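The long delays can be read off the capture directly. The following is a minimal sketch in Python, not part of the original analysis, that measures the gaps between retransmitted SYN segments. The `CAPTURE` excerpt is copied from the trace in this message (the second connection attempt from client port 1020); the function name `syn_gaps` is hypothetical, and the parser assumes tcpdump's old one-line text format and a capture that does not cross midnight.

```python
from datetime import datetime

# Excerpt from the capture in this message: one connection attempt from
# client port 1020, ending with the client's RST.
CAPTURE = """\
22:35:54.723870 10.1.1.1.1020 > 10.1.1.7.2049: S 1747908437:1747908437(0) win 24820 (DF)
22:35:58.094003 10.1.1.1.1020 > 10.1.1.7.2049: S 1747908437:1747908437(0) win 24820 (DF)
22:36:02.054379 10.1.1.1.7906 > 10.1.1.7.2049: 40 null (DF)
22:36:04.844523 10.1.1.1.1020 > 10.1.1.7.2049: S 1747908437:1747908437(0) win 24820 (DF)
22:36:18.346070 10.1.1.1.1020 > 10.1.1.7.2049: S 1747908437:1747908437(0) win 24820 (DF)
22:36:45.347892 10.1.1.1.1020 > 10.1.1.7.2049: S 1747908437:1747908437(0) win 24820 (DF)
22:36:54.728709 10.1.1.1.1020 > 10.1.1.7.2049: R 14792031:14792031(0) ack 1 win 24820 (DF)
"""


def syn_gaps(capture):
    """Map each (src, dst, seq) SYN to the gaps in seconds between its copies."""
    seen = {}  # (src, dst, seq) -> timestamps of every SYN carrying that seq
    for line in capture.splitlines():
        fields = line.split()
        # TCP lines look like: time src > dst: flags seq ...
        # UDP, ARP, and non-SYN segments fail this test and are skipped.
        if len(fields) < 6 or fields[2] != ">" or fields[4] != "S":
            continue
        when = datetime.strptime(fields[0], "%H:%M:%S.%f")
        key = (fields[1], fields[3].rstrip(":"), fields[5])
        seen.setdefault(key, []).append(when)
    return {
        key: [round((b - a).total_seconds(), 2) for a, b in zip(times, times[1:])]
        for key, times in seen.items()
    }


for key, gaps in syn_gaps(CAPTURE).items():
    print(key, gaps)
```

For this excerpt the gaps come out at roughly 3.4, 6.8, 13.5 and 27.0 seconds, each about double the last, which looks like ordinary SYN retransmission backoff on the client. The RST follows about 60 seconds after the first SYN of the attempt, after which the client starts over with a new initial sequence number. None of the SYNs is answered by the server.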
22:35:45.342966 10.1.1.1.1020 > 10.1.1.7.2049: S 1733116406:1733116406(0) win 24820 (DF)
22:35:54.723772 10.1.1.1.1020 > 10.1.1.7.2049: R 1733116407:1733116407(0) ack 0 win 24820 (DF)
22:35:54.723870 10.1.1.1.1020 > 10.1.1.7.2049: S 1747908437:1747908437(0) win 24820 (DF)
22:35:58.094003 10.1.1.1.1020 > 10.1.1.7.2049: S 1747908437:1747908437(0) win 24820 (DF)
22:36:02.054379 10.1.1.1.7906 > 10.1.1.7.2049: 40 null (DF)
22:36:02.054668 10.1.1.7.2049 > 10.1.1.1.7906: reply ok 24 null
22:36:04.844523 10.1.1.1.1020 > 10.1.1.7.2049: S 1747908437:1747908437(0) win 24820 (DF)
22:36:18.345779 arp who-has 10.1.1.7 (ff:ff:ff:ff:ff:ff) tell 10.1.1.1
22:36:18.345848 arp reply 10.1.1.7 is-at 0:a0:c9:55:94:18
22:36:18.346070 10.1.1.1.1020 > 10.1.1.7.2049: S 1747908437:1747908437(0) win 24820 (DF)
22:36:32.056867 10.1.1.1.7938 > 10.1.1.7.2049: 40 null (DF)
22:36:32.057258 10.1.1.7.2049 > 10.1.1.1.7938: reply ok 24 null
22:36:45.347892 10.1.1.1.1020 > 10.1.1.7.2049: S 1747908437:1747908437(0) win 24820 (DF)
22:36:54.728709 10.1.1.1.1020 > 10.1.1.7.2049: R 14792031:14792031(0) ack 1 win 24820 (DF)
22:36:54.728810 10.1.1.1.1020 > 10.1.1.7.2049: S 1762707767:1762707767(0) win 24820 (DF)
22:36:58.098954 10.1.1.1.1020 > 10.1.1.7.2049: S 1762707767:1762707767(0) win 24820 (DF)
22:37:02.059319 10.1.1.1.7970 > 10.1.1.7.2049: 40 null (DF)
22:37:02.059632 10.1.1.7.2049 > 10.1.1.1.7970: reply ok 24 null
22:37:04.849475 10.1.1.1.1020 > 10.1.1.7.2049: S 1762707767:1762707767(0) win 24820 (DF)
22:37:18.350703 arp who-has 10.1.1.7 (ff:ff:ff:ff:ff:ff) tell 10.1.1.1
22:37:18.350788 arp reply 10.1.1.7 is-at 0:a0:c9:55:94:18
22:37:18.350972 10.1.1.1.1020 > 10.1.1.7.2049: S 1762707767:1762707767(0) win 24820 (DF)
22:37:25.648257 10.1.1.7.2049 > 10.1.1.1.1022: . ack 73099259 win 33176
22:37:25.648451 10.1.1.1.1022 > 10.1.1.7.2049: R 73099259:73099259(0) win 0 (DF)
22:37:32.061812 10.1.1.1.8002 > 10.1.1.7.2049: 40 null (DF)
22:37:32.062179 10.1.1.7.2049 > 10.1.1.1.8002: reply ok 24 null
22:37:38.483949 arp who-has 10.1.1.254 tell 10.1.1.7
22:37:38.484115 arp reply 10.1.1.254 is-at 0:50:da:23:e7:2
22:37:45.352837 10.1.1.1.1020 > 10.1.1.7.2049: S 1762707767:1762707767(0) win 24820 (DF)
22:37:54.733653 10.1.1.1.1020 > 10.1.1.7.2049: R 29591361:29591361(0) ack 1 win 24820 (DF)
22:37:54.733759 10.1.1.1.1020 > 10.1.1.7.2049: S 1777476913:1777476913(0) win 24820 (DF)
22:37:58.103890 10.1.1.1.1020 > 10.1.1.7.2049: S 1777476913:1777476913(0) win 24820 (DF)
22:38:02.064269 10.1.1.1.8034 > 10.1.1.7.2049: 40 null (DF)
22:38:02.064651 10.1.1.7.2049 > 10.1.1.1.8034: reply ok 24 null
22:38:04.854408 10.1.1.1.1020 > 10.1.1.7.2049: S 1777476913:1777476913(0) win 24820 (DF)
22:38:18.355715 10.1.1.1.1020 > 10.1.1.7.2049: S 1777476913:1777476913(0) win 24820 (DF)
22:38:32.066736 10.1.1.1.8066 > 10.1.1.7.2049: 40 null (DF)
22:38:32.067015 10.1.1.7.2049 > 10.1.1.1.8066: reply ok 24 null
22:38:45.357792 10.1.1.1.1020 > 10.1.1.7.2049: S 1777476913:1777476913(0) win 24820 (DF)
22:38:54.738595 10.1.1.1.1020 > 10.1.1.7.2049: R 44360507:44360507(0) ack 1 win 24820 (DF)
22:38:54.738693 10.1.1.1.1020 > 10.1.1.7.2049: S 1792277515:1792277515(0) win 24820 (DF)
22:38:58.108813 10.1.1.1.1020 > 10.1.1.7.2049: S 1792277515:1792277515(0) win 24820 (DF)
22:39:02.069178 10.1.1.1.8098 > 10.1.1.7.2049: 40 null (DF)
22:39:02.069497 10.1.1.7.2049 > 10.1.1.1.8098: reply ok 24 null
22:39:04.859336 10.1.1.1.1020 > 10.1.1.7.2049: S 1792277515:1792277515(0) win 24820 (DF)
22:39:18.360595 arp who-has 10.1.1.7 (ff:ff:ff:ff:ff:ff) tell 10.1.1.1
22:39:18.360667 arp reply 10.1.1.7 is-at 0:a0:c9:55:94:18
22:39:18.360895 10.1.1.1.1020 > 10.1.1.7.2049: S 1792277515:1792277515(0) win 24820 (DF)
22:39:32.084018 10.1.1.1.8130 > 10.1.1.7.2049: 40 null (DF)
22:39:32.084397 10.1.1.7.2049 > 10.1.1.1.8130: reply ok 24 null
22:39:45.362736 10.1.1.1.1020 > 10.1.1.7.2049: S 1792277515:1792277515(0) win 24820 (DF)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message