From owner-freebsd-stable@FreeBSD.ORG  Wed Oct 28 20:31:25 2009
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B21B51065670;
	Wed, 28 Oct 2009 20:31:25 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 390798FC1A;
	Wed, 28 Oct 2009 20:31:24 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApoEAKdG6EqDaFvG/2dsb2JhbADbE4Q/BIFh
X-IronPort-AV: E=Sophos;i="4.44,641,1249272000"; d="scan'208";a="51586191"
Received: from amazon.cs.uoguelph.ca ([131.104.91.198])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 28 Oct 2009 16:31:24 -0400
Received: from localhost (localhost.localdomain [127.0.0.1])
	by amazon.cs.uoguelph.ca (Postfix) with ESMTP id 287472100BD;
	Wed, 28 Oct 2009 16:31:24 -0400 (EDT)
X-Virus-Scanned: amavisd-new at amazon.cs.uoguelph.ca
Received: from amazon.cs.uoguelph.ca ([127.0.0.1])
	by localhost (amazon.cs.uoguelph.ca [127.0.0.1]) (amavisd-new,
	port 10024)
	with ESMTP id 0+LaL0L1LktN; Wed, 28 Oct 2009 16:31:20 -0400 (EDT)
Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102])
	by amazon.cs.uoguelph.ca (Postfix) with ESMTP id 5029F210192;
	Wed, 28 Oct 2009 16:31:20 -0400 (EDT)
Received: from localhost (rmacklem@localhost)
	by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id
	n9SKcRo21396; Wed, 28 Oct 2009 16:38:28 -0400 (EDT)
X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing
	-bs
Date: Wed, 28 Oct 2009 16:38:27 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
X-X-Sender: rmacklem@muncher.cs.uoguelph.ca
To: Olaf Seibert <O.Seibert@cs.ru.nl>
In-Reply-To: <20091027164159.GU841@twoquid.cs.ru.nl>
Message-ID: <Pine.GSO.4.63.0910281624440.18390@muncher.cs.uoguelph.ca>
References: <20091027164159.GU841@twoquid.cs.ru.nl>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: dfr@freebsd.org, freebsd-current@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: 8.0-RC1 NFS client timeout issue
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 28 Oct 2009 20:31:25 -0000


First off, I know that cross posting is evil, but I wanted to try
and make sure developers saw it.

On Tue, 27 Oct 2009, Olaf Seibert wrote:

> I see an annoying behaviour with NFS over TCP. It happens both with nfs
> and newnfs. This is with FreeBSD/amd64 8.0-RC1 as client. The server is
> some Linux or perhaps Solaris, I'm not entirely sure.
>
> After trying to find something in packet traces, I think I have found
> something.
>
> The scenario seems to be as follows. Sorry for the width of the lines.
>
>
> No.     Time        Source                Destination           Protocol Info
>   2296 2992.216855 xxx.xxx.31.43         xxx.xxx.16.142        NFS      V3 LOOKUP Call (Reply In 2297), DH:0x3819da36/w
>   2297 2992.217107 xxx.xxx.16.142        xxx.xxx.31.43         NFS      V3 LOOKUP Reply (Call In 2296) Error:NFS3ERR_NOENT
>   2298 2992.217141 xxx.xxx.31.43         xxx.xxx.16.142        NFS      V3 LOOKUP Call (Reply In 2299), DH:0x170cb16a/bin
>   2299 2992.217334 xxx.xxx.16.142        xxx.xxx.31.43         NFS      V3 LOOKUP Reply (Call In 2298), FH:0x61b8eb12
>   2300 2992.217361 xxx.xxx.31.43         xxx.xxx.16.142        NFS      V3 ACCESS Call (Reply In 2301), FH:0x61b8eb12
>   2301 2992.217582 xxx.xxx.16.142        xxx.xxx.31.43         NFS      V3 ACCESS Reply (Call In 2300)
>   2302 2992.217605 xxx.xxx.31.43         xxx.xxx.16.142        NFS      V3 LOOKUP Call (Reply In 2303), DH:0x61b8eb12/w
>   2303 2992.217860 xxx.xxx.16.142        xxx.xxx.31.43         NFS      V3 LOOKUP Reply (Call In 2302) Error:NFS3ERR_NOENT
>   2304 2992.318770 xxx.xxx.31.43         xxx.xxx.16.142        TCP      934 > nfs [ACK] Seq=238293 Ack=230289 Win=8192 Len=0 TSV=86492342 TSER=12393434
>   2306 3011.537520 xxx.xxx.16.142        xxx.xxx.31.43         NFS      V3 GETATTR Reply (Call In 2305)  Directory mode:2755 uid:4100 gid:4100
>   2307 3011.637744 xxx.xxx.31.43         xxx.xxx.16.142        TCP      934 > nfs [ACK] Seq=238429 Ack=230405 Win=8192 Len=0 TSV=86511662 TSER=12395366
>   2308 3371.534980 xxx.xxx.16.142        xxx.xxx.31.43         TCP      nfs > 934 [FIN, ACK] Seq=230405 Ack=238429 Win=49232 Len=0 TSV=12431366 TSER=86511662
>
> 	The server decides, for whatever reason, to terminate the
> 	connection and sends a FIN.
>
>   2309 3371.535018 xxx.xxx.31.43         xxx.xxx.16.142        TCP      934 > nfs [ACK] Seq=238429 Ack=230406 Win=8192 Len=0 TSV=86871578 TSER=12431366
>
> 	Client acknowledges this,
>
>   2310 3375.379693 xxx.xxx.31.43         xxx.xxx.16.142        NFS      V3 ACCESS Call, FH:0x008002a2
>
> 	but tries to sneak in another call anyway.	[A]
>
Probably not the best behaviour, but I think it is technically allowed by 
TCP. (My TCP is very rusty, but I think the socket should be in
TCPS_CLOSE_WAIT at this point and the BSD code will have called
socantrcvmore(), but not socantsndmore().)

>   2311 3375.474788 xxx.xxx.16.142        xxx.xxx.31.43         TCP      nfs > 934 [ACK] Seq=230406 Ack=238569 Win=49232 Len=0 TSV=12431760 TSER=86875423
>
> 	Server ACKs but doesn't send anything else... [B]
>
> 	Time passes...
>
This is where it seems interesting. It looks to me like the socket upcall
for receiving the FIN would have happened before this point, setting the
ct_error.re_status to RPC_CANTRECV, but the code in clnt_vc_call() doesn't
check for this. (It does check for it happening during and after the
sosend(), but not before it, from what I can see.)

>
> [B] would be a bug of the server in my opinion. If it ACKs a call, it
> should send a reply. And if it can't, it shouldn't.
>
I'll leave this one for the TCP wizzards. I'm not sure what the
correct behaviour is when data is received on a connection. (I think
it is waiting for a FIN from the client side at this point.)

If you could try the following patch and see if it helps, that would be
appreciated, rick
ps: I'll try to reproduce the situation here, but I'm not sure if I can.
--- rpc/clnt_vc.c.sav	2009-10-28 15:44:20.000000000 -0400
+++ rpc/clnt_vc.c	2009-10-28 15:49:57.000000000 -0400
@@ -413,6 +413,19 @@

  	cr->cr_xid = xid;
  	mtx_lock(&ct->ct_lock);
+	/*
+	 * Check to see if the other end has already started to close down
+	 * the connection. If it happens after this point, it will be
+	 * detected below, when cr->cr_error is checked.
+	 */
+	if (ct->ct_error.re_status == RPC_CANTRECV) {
+		if (errp != &ct->ct_error) {
+			errp->re_errno = ct->ct_error.re_errno;
+			errp->re_status = RPC_CANTRECV;
+		}
+		stat = RPC_CANTRECV;
+		goto out;
+	}
  	TAILQ_INSERT_TAIL(&ct->ct_pending, cr, cr_link);
  	mtx_unlock(&ct->ct_lock);