From owner-svn-src-head@FreeBSD.ORG Thu Jan 26 13:33:28 2012
Date: Fri, 27 Jan 2012 00:33:23 +1100 (EST)
From: Bruce Evans
To: Rick Macklem
Cc: svn-src-head@FreeBSD.org, Rick Macklem, svn-src-all@FreeBSD.org,
    src-committers@FreeBSD.org, Bruce Evans
Subject: Re: svn commit: r230516 - in head/sys: fs/nfsclient nfsclient
Message-ID: <20120127000316.U1055@besplex.bde.org>
In-Reply-To: <919199278.155166.1327535363109.JavaMail.root@erie.cs.uoguelph.ca>
References: <919199278.155166.1327535363109.JavaMail.root@erie.cs.uoguelph.ca>

On Wed, 25 Jan 2012, Rick Macklem wrote:

> Bruce Evans wrote:
>> On Tue, 24 Jan 2012, Rick Macklem wrote:
>>
>>> Bruce Evans wrote:
>>>> On Wed, 25 Jan 2012, Rick Macklem wrote:
>>>>
>>>>> Log:
>>>>> If a mount -u is done to either NFS client that switches it
>>>>> from TCP to UDP and the rsize/wsize/readdirsize is greater than
>>>>> NFS_MAXDGRAMDATA, it is possible for a thread doing an
>>>>> I/O RPC to get stuck repeatedly doing retries. This happens
>>>>> ...
>>
>>>> Could it wait for the old i/o to complete (and not start any new
>>>> i/o?). This is little different from having to wait when changing
>>>> from rw to ro. The latter is not easy, and at least the old nfs
>>>> client seems to not even dream of it. ffs has always called a
>>>> ...
>>
>>> As you said above "not easy ... uses complicated suspension of i/o".
>>> I have not tried to code this, but I think it would be non-trivial.
>>> The code would need to block new I/O before RPCs are issued and wait
>>> for all in-progress I/Os to complete. At this time, the kernel RPC
>>> handles the in-progress RPCs and NFS doesn't "know" what is
>>> outstanding. Of course, code could be added to keep track of
>>> in-progress I/O RPCs, but that would have to be written, as well.
>>
>> Hmm, this means that even when the i/o sizes are small, the mode
>> switch from tcp to udp may be unsafe since there may still be i/o's
>> with higher sizes outstanding. So to switch from tcp to udp, the user
>> should first reduce the sizes, then wait a while before switching to
>> udp. And what happens with retries after changing sizes up or down?
>> Does it retry with the old sizes?
>>
>> Bruce
> Good point. I think (assuming a TCP mount with large rsize):
> # mount -u -o rsize=16384 /mnt
> # mount -u -o udp /mnt
> - could still result in a wedged thread trying to do a read that
>   is too large for UDP.
>
> I'll revert r230516, since it doesn't really fix the problem, it just
> reduced its likelihood.

That seems a regression.

> I'll ask on freebsd-fs@ if anyone finds switching from TCP->UDP via a
> "mount -u" is useful to them. If no one thinks it's necessary, the patch
> could just disallow the switch, no matter what the old
> rsize/wsize/readdirsize is.

I use it a lot for performance testing.
Of course it is unnecessary, since at least for performance testing it
is possible to do a full unmount and re-mount, but mount -u is more
convenient.

> Otherwise, the fix is somewhat involved and difficult for a scenario
> like this, where the NFS server is network partitioned or crashed:
> - sysadmin notices NFS mount is "hung" and does
>   # mount -u -o udp /path
>   to try and fix it, but it doesn't help
> - sysadmin tries "umount -f /path" to get rid of the "hung" mount.

Now I wonder what makes a full unmount (without -f) and re-mount
work.

> If "mount -u -o udp /path" is waiting for I/O ops to complete,
> (which is what the somewhat involved patch would need to do) the
> "umount -f /path" will get stuck waiting for the "mount -u"
> which will be waiting for I/O RPCs to complete. This could

I often misremember -f for umount as meaning don't wait. It actually
means to forcibly close files before proceeding.

> be partially fixed by making sure that the "mount -u -o udp /path" is
> interruptible (via Ctrl-C), but I still don't like the idea that
> "umount -f /path" won't work if "mount -u -o udp /path" is sitting in
> the kernel waiting for RPCs to complete, which would need to be done
> to make a TCP->UDP switch work.

Doesn't umount -f have to wait for i/o anyway? When it closes files,
it must wait for all in-progress i/o for the files, and for all new
i/o's that result from closing.

Bruce
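
The bookkeeping discussed in this thread (block new I/O before RPCs are
issued, then wait for all in-progress I/Os to complete) can be modelled
in user space. What follows is only a sketch under stated assumptions,
not FreeBSD kernel code: the class RpcGate and its methods enter, leave,
suspend_and_drain and resume are hypothetical names, and Python threads
stand in for kernel threads. The timeout path models the partitioned or
crashed server case, where outstanding RPCs never complete and an
unbounded wait would leave the "mount -u" stuck.

```python
import threading


class RpcGate:
    """Hypothetical user-space model: count in-flight RPCs and drain them."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0       # RPCs issued but not yet completed
        self._suspended = False   # True while a mode switch is pending

    def enter(self):
        """Call before issuing an I/O RPC; blocks while the gate is suspended."""
        with self._cond:
            while self._suspended:
                self._cond.wait()
            self._in_flight += 1

    def leave(self):
        """Call when an I/O RPC finishes, whether it succeeded or failed."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def suspend_and_drain(self, timeout=None):
        """Stop admitting new RPCs and wait for the outstanding ones.

        Returns True once nothing is in flight. Returns False if the
        timeout expires first; the gate is then reopened so that callers
        blocked in enter() are not left wedged.
        """
        with self._cond:
            self._suspended = True
            drained = self._cond.wait_for(
                lambda: self._in_flight == 0, timeout)
            if not drained:
                self._suspended = False
                self._cond.notify_all()
            return drained

    def resume(self):
        """Reopen the gate after the transport or size change is applied."""
        with self._cond:
            self._suspended = False
            self._cond.notify_all()

    def in_flight(self):
        """Return the current number of outstanding RPCs."""
        with self._cond:
            return self._in_flight
```

In this model a TCP->UDP switch would call suspend_and_drain(), change
the transport and sizes only if it returned True, and then call
resume(). A bounded timeout is one way to avoid the deadlock against
"umount -f" described above; making the wait interruptible is another.
The sketch assumes a single caller of suspend_and_drain() at a time.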