From owner-svn-src-head@FreeBSD.ORG  Thu Jan 26 13:33:28 2012
Return-Path: <owner-svn-src-head@FreeBSD.ORG>
Delivered-To: svn-src-head@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 81768106564A;
	Thu, 26 Jan 2012 13:33:28 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
	[211.29.132.184])
	by mx1.freebsd.org (Postfix) with ESMTP id 15E908FC1B;
	Thu, 26 Jan 2012 13:33:27 +0000 (UTC)
Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au
	(c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136])
	by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q0QDXN5R022009
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 27 Jan 2012 00:33:26 +1100
Date: Fri, 27 Jan 2012 00:33:23 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <919199278.155166.1327535363109.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20120127000316.U1055@besplex.bde.org>
References: <919199278.155166.1327535363109.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: svn-src-head@FreeBSD.org, Rick Macklem <rmacklem@FreeBSD.org>,
	svn-src-all@FreeBSD.org, src-committers@FreeBSD.org,
	Bruce Evans <brde@optusnet.com.au>
Subject: Re: svn commit: r230516 - in head/sys: fs/nfsclient nfsclient
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
	<svn-src-head.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
	<mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
	<mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 26 Jan 2012 13:33:28 -0000

On Wed, 25 Jan 2012, Rick Macklem wrote:

> Bruce Evans wrote:
>> On Tue, 24 Jan 2012, Rick Macklem wrote:
>>
>>> Bruce Evans wrote:
>>>> On Wed, 25 Jan 2012, Rick Macklem wrote:
>>>>
>>>>> Log:
>>>>>  If a mount -u is done to either NFS client that switches it
>>>>>  from TCP to UDP and the rsize/wsize/readdirsize is greater
>>>>>  than NFS_MAXDGRAMDATA, it is possible for a thread doing an
>>>>>  I/O RPC to get stuck repeatedly doing retries. This happens
>>>>>  ...
>>
>>>> Could it wait for the old i/o to complete (and not start any new
>>>> i/o?). This is little different from having to wait when changing
>>>> from rw to ro. The latter is not easy, and at least the old nfs
>>>> client seems to not even dream of it. ffs has always called a
>>>> ...
>>
>>> As you said above "not easy ... uses complicated suspension of i/o".
>>> I have not tried to code this, but I think it would be non-trivial.
>>> The code would need to block new I/O before RPCs are issued and wait
>>> for all in-progress I/Os to complete. At this time, the kernel RPC
>>> handles the in-progress RPCs and NFS doesn't "know" what is
>>> outstanding. Of course, code could be added to keep track of
>>> in-progress I/O RPCs, but that would have to be written as well.
>>
>> Hmm, this means that even when the i/o sizes are small, the mode
>> switch from tcp to udp may be unsafe, since there may still be i/o's
>> with larger sizes outstanding. So to switch from tcp to udp, the user
>> should first reduce the sizes, then wait a while before switching to
>> udp. And what happens with retries after changing sizes up or down?
>> Does it retry with the old sizes?
>>
>> Bruce
> Good point. I think (assuming a TCP mount with large rsize):
> # mount -u -o rsize=16384 /mnt
> # mount -u -o udp /mnt
> - could still result in a wedged thread trying to do a read that
>  is too large for UDP.
>
> I'll revert r230516, since it doesn't really fix the problem; it just
> reduces its likelihood.

That seems a regression.
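
Here is a minimal C sketch of the failure discussed above: a read issued with a large rsize under TCP is still outstanding when "mount -u" lowers rsize and switches to UDP. It assumes each RPC records its transfer size when it is issued and that retries reuse that size, which is what the thread suspects but does not confirm. All names are hypothetical. DGRAM_MAX_DATA stands in for the kernel's NFS_MAXDGRAMDATA, and its value is an assumption. The retry_rpc() policy, which fails an RPC that can never fit instead of resending it, is one illustrative option and is not something proposed in the thread.

```c
#include <stdbool.h>

/*
 * Hypothetical model, not FreeBSD code.  DGRAM_MAX_DATA stands in for the
 * kernel's NFS_MAXDGRAMDATA; the value 16384 is an assumption.
 */
#define DGRAM_MAX_DATA	16384

enum xport { XPORT_TCP, XPORT_UDP };

/* Mount options that "mount -u" may change at any time. */
struct mnt_opts {
	enum xport	xport;
	int		rsize;
};

/* An I/O RPC records its transfer size when it is first issued. */
struct io_rpc {
	int		size;
};

/* Build a read RPC using the rsize in effect at issue time. */
struct io_rpc
issue_read(const struct mnt_opts *mo)
{
	struct io_rpc r = { .size = mo->rsize };

	return (r);
}

/* Can this RPC be (re)sent over the mount's *current* transport? */
bool
rpc_fits(const struct mnt_opts *mo, const struct io_rpc *r)
{
	if (mo->xport == XPORT_UDP && r->size > DGRAM_MAX_DATA)
		return (false);
	return (true);
}

/*
 * Decide what to do on a retry.  Returns 0 to resend, or -1 to fail the
 * RPC: if it no longer fits, resending the same size can never succeed,
 * so retrying would just loop forever.
 */
int
retry_rpc(const struct mnt_opts *mo, const struct io_rpc *r)
{
	return (rpc_fits(mo, r) ? 0 : -1);
}
```

In the model, a read issued under TCP with rsize=65536 and then retried after "mount -u -o rsize=16384" and "mount -u -o udp" still has size 65536, so retry_rpc() rejects it. A new read issued after the change (size 16384) is accepted.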

> I'll ask on freebsd-fs@ whether anyone finds switching from TCP->UDP via
> "mount -u" useful. If no one thinks it's necessary, the patch could
> just disallow the switch, no matter what the old rsize/wsize/readdirsize
> is.

I use it a lot for performance testing.  Of course it is unnecessary,
since at least for performance testing it is possible to do a full
unmount and re-mount, but mount -u is more convenient.

> Otherwise, the fix is somewhat involved and difficult for a scenario
> like this, where the NFS server is network partitioned or crashed:
> - sysadmin notices NFS mount is "hung" and does
>  # mount -u -o udp /path
>  to try and fix it, but it doesn't help
> - sysadmin tries "umount -f /path" to get rid of the "hung" mount.

Now I wonder what makes a full unmount (without -f) and re-mount work.

> If "mount -u -o udp /path" is waiting for I/O ops to complete,
> (which is what the somewhat involved patch would need to do) the
> "umount -f /path" will get stuck waiting for the "mount -u"
> which will be waiting for I/O RPCs to complete. This could

I often misremember -f for umount as meaning don't wait.  It actually
means to forcibly close files before proceeding.

> be partially fixed by making sure that the "mount -u -o udp /path" is
> interruptible (via <ctrl>C), but I still don't like the idea that
> "umount -f /path" won't work if "mount -u -o udp /path" is sitting in
> the kernel waiting for RPCs to complete, which would need to be done
> to make a TCP->UDP switch work.

Doesn't umount -f have to wait for i/o anyway?  When it closes files,
it must wait for all in-progress i/o for the files, and for all new
i/o's that result from closing.
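
The mechanism Rick says a safe TCP->UDP switch would need has four parts: block new I/O before RPCs are issued, track what is in flight, wait for it to finish, and keep that wait interruptible so a later "umount -f" is not stuck behind it. Below is a minimal userland sketch of those parts, assuming a single counter guarded by a mutex. pthreads stand in for the kernel's locking and sleep/wakeup. Every name here (struct rpc_gate, rpc_begin(), gate_drain(), and so on) is hypothetical. This is not the kernel RPC code or the ffs suspension code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of "suspend new RPCs, then drain in-flight ones". */
struct rpc_gate {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	int		inflight;	/* RPCs issued but not yet completed */
	bool		suspended;	/* new RPCs must wait while set */
	bool		cancelled;	/* drain interrupted (e.g. by ^C) */
};

void
gate_init(struct rpc_gate *g)
{
	pthread_mutex_init(&g->mtx, NULL);
	pthread_cond_init(&g->cv, NULL);
	g->inflight = 0;
	g->suspended = false;
	g->cancelled = false;
}

/* Called before issuing an RPC; blocks while a drain is in progress. */
void
rpc_begin(struct rpc_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	while (g->suspended)
		pthread_cond_wait(&g->cv, &g->mtx);
	g->inflight++;
	pthread_mutex_unlock(&g->mtx);
}

/* Called when an RPC completes or is abandoned. */
void
rpc_end(struct rpc_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	g->inflight--;
	pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->mtx);
}

/* Interrupt a drain (models ^C on the "mount -u"). */
void
gate_cancel(struct rpc_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	g->cancelled = true;
	pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->mtx);
}

/* Let blocked and new RPCs proceed again after a successful drain. */
void
gate_resume(struct rpc_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	g->suspended = false;
	pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->mtx);
}

/*
 * Block new RPCs and wait until none are in flight.  Returns 0 once
 * drained (new RPCs stay blocked until gate_resume()), or -1 if the drain
 * was cancelled, in which case blocked RPCs are released again.
 */
int
gate_drain(struct rpc_gate *g)
{
	int error = 0;

	pthread_mutex_lock(&g->mtx);
	g->suspended = true;
	while (g->inflight > 0 && !g->cancelled)
		pthread_cond_wait(&g->cv, &g->mtx);
	if (g->cancelled) {
		g->cancelled = false;
		g->suspended = false;
		pthread_cond_broadcast(&g->cv);
		error = -1;
	}
	pthread_mutex_unlock(&g->mtx);
	return (error);
}
```

With a server that never answers, the in-flight count never reaches zero. Only gate_cancel() ends the wait in that case. That is the situation where an unkillable "mount -u" would otherwise wedge a subsequent forced unmount.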

Bruce