From owner-freebsd-stable@FreeBSD.ORG  Tue Mar 21 22:56:25 2006
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
X-Original-To: stable@freebsd.org
Delivered-To: freebsd-stable@FreeBSD.ORG
Date: Tue, 21 Mar 2006 14:56:17 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200603212256.k2LMuHT0006842@apollo.backplane.com>
To: Mikhail Teterin <mi+mx@aldan.algebra.com>
References: <200603211607.30372.mi+mx@aldan.algebra.com>
	<200603211747.36251.mi+mx@aldan.algebra.com>
Cc: alc@freebsd.org, stable@freebsd.org
Subject: Re: more weird bugs with mmap-ing via NFS


:When the client is in this state it remains quite usable except for the 
:following:
:
:	1) Trying to start `systat 1 -vm' stalls ALL access to local disks,
:	   apparently -- no new programs can start, and the running ones
:	   can not access any data either; attempts to Ctrl-C the starting
:	   systat succeed only after several minutes.
:
:	2) The writing process is stuck unkillable in the following state:
:
:		CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME
:		27  -4  0 1351368 137764 nfs    DL    p4    1:05,52
:
:	   Sending it any signal has no effect. (Large sizes are explained
:	   by it mmap-ing its large input and output.)
:
:	3) Forceful umount of the share, that the program is writing to,
:	   paralyzes the system for several minutes -- unlike in 1), not
:	   even the mouse is moving. It would seem, the process is dumping
:	   core, but it is not -- when the system unfreezes, the only
:	   message from the kernel is:
:
:		vm_fault: pager read error, pid XXXX (mzip)
:	  
:Again, this is on 6.1/i386 from today, which we are about to release into the 
:cruel world.
:
:Yours,
:
:	-mi

    There are a number of problems with using a block size of 65536.  First
    of all, I think you can only safely do it if you use a TCP mount, and
    then only if the TCP buffer size is large enough to hold an entire
    packet.  For UDP mounts, 65536 is too large: the UDP length field is
    16 bits, so a datagram (header included) tops out at 65535 bytes.  For
    that matter, the *IP* packet itself cannot exceed 65535 bytes.  So
    65536 will not work with a UDP mount.
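
    To put numbers on the UDP limit, here is a rough sketch of the
    arithmetic.  It assumes IPv4 with a 20-byte header (no options); the
    rpc_overhead parameter is a hypothetical stand-in for the RPC/NFS
    headers, whose real size varies per request.

```python
# Sketch: does one NFS block fit in a single UDP datagram over IPv4?
IP_MAX_PACKET = 65535   # limit of the 16-bit IPv4 total-length field
IP_HEADER = 20          # assumed: IPv4 header without options
UDP_HEADER = 8          # fixed UDP header size

# Largest UDP payload that still fits in one IPv4 packet.
MAX_UDP_PAYLOAD = IP_MAX_PACKET - IP_HEADER - UDP_HEADER


def fits_in_udp(block_size, rpc_overhead=0):
    """Return True if block_size data bytes plus rpc_overhead header bytes
    fit in one UDP datagram.  rpc_overhead is a hypothetical parameter."""
    return block_size + rpc_overhead <= MAX_UDP_PAYLOAD


if __name__ == "__main__":
    print("max UDP payload:", MAX_UDP_PAYLOAD)
    for size in (8192, 16384, 32768, 65536):
        print(size, fits_in_udp(size))
```

    Even with zero header overhead, 65536 bytes of data does not fit, while
    8K, 16K and 32K do.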

    The second problem is related to the network driver.  The packet MTU
    is 1500, which means, typically, a limit of around 1460-1480 payload
    bytes per packet.  A large UDP packet that is, say, 48KB will be
    broken down into over 33 IP packet fragments.  The network stack could
    very well drop some of these fragments, and losing any one of them
    loses the whole UDP packet, which makes delivery unreliable.
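
    The fragment count and its effect on delivery can be estimated with a
    short sketch.  It assumes IPv4 with a 20-byte header and, as a
    simplification, that fragments are lost independently at a fixed rate;
    the 1% loss figure is purely illustrative, not a measurement.

```python
import math


def fragment_count(udp_payload, mtu=1500, ip_header=20, udp_header=8):
    """Number of IPv4 fragments needed for one UDP datagram."""
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((udp_payload + udp_header) / per_fragment)


def delivery_probability(fragments, fragment_loss):
    """Chance the whole datagram arrives, assuming independent losses.
    A datagram is only reassembled if every fragment arrives."""
    return (1.0 - fragment_loss) ** fragments


if __name__ == "__main__":
    for size_kb in (8, 16, 48):
        n = fragment_count(size_kb * 1024)
        p = delivery_probability(n, 0.01)  # assumed 1% fragment loss
        print(f"{size_kb}KB: {n} fragments, delivery chance {p:.2f}")
```

    Under these assumptions the chance of a complete datagram falls off
    quickly as the block size grows, since every extra fragment is another
    chance to lose the whole packet.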

    The NFS protocol itself does allow read and write packets to be
    truncated, provided that the read or write operation is either bounded
    by the file EOF or (for a read) the remaining data is all zeros.
    Typically the all-zeros case is only optimized by the NFS server when
    the underlying filesystem block itself is unallocated (i.e. a 'hole'
    in the file).  In all other cases the full NFS block size is passed
    between client and server.
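
    The two cases above, a read bounded by EOF and a read of a hole, can
    be seen on an ordinary local file.  This is only a local illustration
    of what a 'hole' looks like to a reader; it says nothing about what a
    particular NFS server puts on the wire.  The 8192 block size and the
    helper names are assumptions chosen for the example.

```python
import os
import tempfile

BLOCK = 8192  # illustrative block size, not tied to any mount option


def make_file_with_hole(path, block=BLOCK):
    """Write block 0 and block 2, leaving block 1 as a never-written hole."""
    with open(path, "wb") as f:
        f.write(b"A" * block)
        f.seek(2 * block)  # skip over block 1 without writing it
        f.write(b"B" * block)


def read_block(path, index, block=BLOCK):
    """Read one block-sized chunk at the given block index."""
    with open(path, "rb") as f:
        f.seek(index * block)
        return f.read(block)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.dat")
        make_file_with_hole(path)
        print("hole is all zeros:", read_block(path, 1) == bytes(BLOCK))
        print("bytes past EOF:", len(read_block(path, 3)))
```

    The hole reads back as a full block of zeros, and a read starting at
    EOF returns no data at all.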

    I would stick to an NFS block size of 8K or 16K.  Frankly, there is
    no real reason to use a larger block size.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>