Date: Thu, 5 Nov 2009 11:36:23 -0500 (EST)
From: Rick Macklem
To: freebsd-current@freebsd.org
Subject: Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)
In-Reply-To: <4AF0B7DF.9030405@freebsd.org>

Rick Macklem wrote:
> I can now reproduce what I think others were seeing as slow reconnects
> when using NFSv3 over TCP against a server that disconnects inactive
> TCP connections. I have had some luck figuring out what is going on
> and can reproduce it fairly easily, but I really need help from someone
> who understands the FreeBSD TCP stack.
>
Ok, I haven't made much progress on this, but here's what little I
currently know about it.

The problem occurs after a server (in my case, a Solaris 10 server) has
dropped an inactive TCP connection for an NFS-over-TCP mount. When the
client makes a new connection, it, for some reason, sends an RST at almost
exactly the same time as the first RPC request on the new TCP connection,
causing the server to shut that connection down.

Things I now know don't affect this:
- Doing the soshutdown() and soclose() on the old connection. I commented
  them out and it still happened.
- Skipping the sobind() that is done on the new connection before the
  soconnect().
- Using a non-reserved port number.
(The above tests shot down pretty well all the "theories" I could come up
with.)

The only thing I've found that avoids the problem:
- Putting a 2 sec delay right before the soconnect() call. (With a 1 sec
  delay it was hard to reproduce, and I haven't been able to reproduce it
  at all with a 2 sec delay.) Not much of a fix, though.

Now, here's where I'm hoping someone can help.
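One way to confirm the symptom described above from a packet capture, rather than by eye, is to check whether a client RST lands within a small window of the first data-carrying segment (the first RPC request) on the new connection. The sketch below assumes the client-to-server segments of the new connection have already been extracted from a capture (for example, by post-processing tcpdump output) into an array in capture order. The `struct seg` record, the `rst_near_first_data()` function and its `window_us` parameter are hypothetical names for illustration; the RST bit value 0x04 is the one from the RFC 793 TCP header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RST bit in the TCP flags byte (RFC 793 layout), defined locally so the
 * sketch does not depend on a particular system's <netinet/tcp.h>. */
#define SEG_FLAG_RST 0x04

/* Hypothetical record for one client->server segment from a capture. */
struct seg {
    int64_t  ts_us;     /* capture timestamp, in microseconds */
    uint8_t  flags;     /* TCP flags byte */
    uint32_t data_len;  /* TCP payload length, in bytes */
};

/*
 * Return 1 if any RST segment lies within window_us (either side) of the
 * first segment that carries payload, i.e. the first RPC request on the
 * new connection; return 0 otherwise, including when no data was sent.
 * Segments are assumed to be in capture order.
 */
static int rst_near_first_data(const struct seg *segs, size_t n,
                               int64_t window_us)
{
    size_t i;
    int found = 0;
    int64_t first_data_ts = 0;

    assert(segs != NULL || n == 0);

    /* Locate the first request carrying data. */
    for (i = 0; i < n; i++) {
        if (segs[i].data_len > 0) {
            first_data_ts = segs[i].ts_us;
            found = 1;
            break;
        }
    }
    if (!found)
        return 0;

    /* Look for an RST close to it in time, before or after. */
    for (i = 0; i < n; i++) {
        if (segs[i].flags & SEG_FLAG_RST) {
            int64_t d = segs[i].ts_us - first_data_ts;
            if (d < 0)
                d = -d;
            if (d <= window_us)
                return 1;
        }
    }
    return 0;
}
```

Checking both sides of the first request, rather than only after it, matches the report that the RST goes out at "almost exactly the same time" as the request, so its ordering in the capture may vary.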
Grepping around, I found 4 places where the TCP stack calls ip_output()
(one in each of tcp_output.c, tcp_subr.c, tcp_timewait.c and
tcp_syncache.c), and I put a printf like this just before each of them:

    if (flags & TH_RST)
        printf("sent a reset\n");

(The exact form varies for each, because the TCP header flags live in a
different place at each call site, and each one prints a different
message.)

Now, the weird part is that when the extraneous RST is sent to the server,
I don't get any printf. (I do get a few of these, but at other times, for
what appear to be legitimate RSTs.) I can't see anywhere else that the TCP
stack would send an RST, so I'm stuck w.r.t. figuring out what is sending
them.

Does anyone know of another place where the TCP stack could send an RST?
(Or is it queued earlier, at one of the times I do see the printf message,
and then the send is "triggered" inside the IP layer when the first data is
sent on the new connection?)

rick, who is getting sick of looking at this:-)
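The per-call-site printf instrumentation described above uses a different message at each of the four ip_output() sites. A sketch of a single tracing macro that tags every RST with its source file, line and function, and decodes the whole flags byte, so all sites log in one format. It is written as ordinary userland C so it can be compiled and tested on its own; the `TRC_*` constants, `tcp_flags_str()` and `TRACE_RST()` are hypothetical names. In the kernel you would use the real `TH_*` constants from `<netinet/tcp.h>` (same bit values) and pass in whatever variable holds the flags at each site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Flag bits per the RFC 793 TCP header layout, with a TRC_ prefix so they
 * don't clash with TH_* from <netinet/tcp.h>. */
#define TRC_FIN 0x01
#define TRC_SYN 0x02
#define TRC_RST 0x04
#define TRC_PSH 0x08
#define TRC_ACK 0x10
#define TRC_URG 0x20

/* Write the set flags into buf as e.g. "RST|ACK", or "-" if none are set.
 * If buf is too small, output stops after the last flag name that fit.
 * Returns buf so it can be passed straight to printf. */
static const char *tcp_flags_str(unsigned flags, char *buf, size_t len)
{
    static const struct { unsigned bit; const char *name; } names[] = {
        { TRC_FIN, "FIN" }, { TRC_SYN, "SYN" }, { TRC_RST, "RST" },
        { TRC_PSH, "PSH" }, { TRC_ACK, "ACK" }, { TRC_URG, "URG" },
    };
    size_t i, used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (flags & names[i].bit) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? "|" : "", names[i].name);
            if (n < 0 || (size_t)n >= len - used) {
                buf[used] = '\0';   /* drop the partially written name */
                break;
            }
            used += (size_t)n;
        }
    }
    if (used == 0)
        snprintf(buf, len, "-");
    return buf;
}

/* Drop-in replacement for the ad hoc printf before each ip_output() call:
 * one message format everywhere, identifying the exact call site. */
#define TRACE_RST(flags)                                                \
    do {                                                                \
        char trc_buf_[32];                                              \
        if ((flags) & TRC_RST)                                          \
            printf("RST out: %s:%d %s() flags=%s\n", __FILE__,          \
                   __LINE__, __func__,                                  \
                   tcp_flags_str((flags), trc_buf_, sizeof(trc_buf_))); \
    } while (0)
```

For example, `TRACE_RST(TRC_RST | TRC_ACK);` placed before a send prints a line naming that file, line and function with `flags=RST|ACK`, and prints nothing for segments without the RST bit. If the extraneous RST still produces no line, that at least rules out all four instrumented sites uniformly.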