From owner-freebsd-stable  Fri Jan 18 22:00:54 2002
Date: Fri, 18 Jan 2002 22:00:40 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200201190600.g0J60em46015@apollo.backplane.com>
To: Steve Shorter <steve@nomad.tor.lets.net>
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: "server not responding" / "is alive again" NFS tunables
References:  <20020116101212.A610@nomad.lets.net>

    It's probably the NFS retry code being a little too finicky again.
    It should be fairly harmless, but you can try mounting the
    clients with the 'dumbtimer' mount option to get rid of the 
    dynamic retransmit estimator.
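
To make the difference between the two timer modes concrete, here is a minimal sketch in C. It is a generic smoothed round-trip estimator of the Jacobson style, not the FreeBSD kernel code; the struct, the function names, the gains (1/8, 1/4) and the millisecond units are all assumptions chosen for illustration. It shows how an adaptive timeout that has settled on fast replies can fire on a single slow reply, while a fixed timeout does not.

```c
#include <assert.h>

/* Hypothetical per-mount estimator state (illustrative, not kernel code). */
struct rtt_est {
    int srtt;   /* smoothed round-trip time, in ms */
    int rttvar; /* smoothed mean deviation, in ms */
};

/* Fold one RTT sample in: srtt moves by err/8, rttvar by (|err| - rttvar)/4. */
static void rtt_update(struct rtt_est *e, int sample_ms)
{
    int err = sample_ms - e->srtt;

    e->srtt += err / 8;
    if (err < 0)
        err = -err;
    e->rttvar += (err - e->rttvar) / 4;
}

/* Adaptive retransmit timeout: smoothed RTT plus four deviations. */
static int rto_dynamic(const struct rtt_est *e)
{
    return e->srtt + 4 * e->rttvar;
}

/* Fixed retransmit timeout: ignores samples, as a "dumb" timer would. */
static int rto_fixed(int timeo_ms)
{
    return timeo_ms;
}

/* A request is retransmitted when the reply takes longer than the timeout. */
static int would_retransmit(int reply_ms, int rto_ms)
{
    return reply_ms > rto_ms;
}
```

With an estimator that has settled at srtt = 10 ms and rttvar = 2 ms, the adaptive timeout is 18 ms, so a single 50 ms reply (say, a write that had to wait on the disk) counts as a timeout, while a fixed 1000 ms timeout would not notice it.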

    If you are interested in helping me fix the dynamic retransmit
    estimator, send me an email sometime after we release 4.5
    and I'll give you some simple patch sets to try.  (We don't
    have time to fix it prior to the release and I'm busy with
    release stuff right now).

    I would also be interested in a tcpdump of the NFS traffic on the
    client side that catches it in the act (one should see a
    retransmitted NFS request in the dump at the same time the
    kernel logs the "not responding" / "is alive again" warnings).

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


:Howdy!
:
:	I have a dedicated NFS server with 16 nfsd's running, connected
:to SCSI RAID/softupdates and good network connectivity/switching. Under
:moderate or even sometimes light load the clients (7 of them) log messages
:
:     nfs server 192.168.10.2:/mnt: not responding
:     nfs server 192.168.10.2:/mnt: is alive again
:
: several times per minute. They always have the same timestamp. Performance
:is not noticeably impaired, but I am wondering if this situation will eventually
:become a performance barrier as the system ramps up to full production, if
:the above log messages mean that packets must be delayed or retransmitted.
:
:	I have experimented with both UDP and TCP mounts with various rw sizes
:ranging from 8192 to 32768. TCP with rw=32768 was the worst case wrt the above,
:and I am currently using UDP mounts with the default rw. Running 8 or 16 nfsd
:gives no noticeable difference wrt the above either.
:
:	I can find no issues on the server side, and was wondering if
:there is a timeout threshold on the client side that is triggering the
:messages/condition and whether adjusting the compile time tunables in
:sys/nfs/nfs.h can alleviate the problem (if it even is a problem).
:
:	The server is 4.5-PRE, the clients are 4.4-RELEASE. I know
:that there have been a lot of NFS changes in 4.5, but testing 4.5 on
:the clients is problematic. Do any of those changes affect the above?
:
:	So which variable(s) in nfs.h affect the client side and contribute
:to the server timeout situation? Or are there any other suggestions?
:
:	Here are the variables in nfs.h (4.5-RC) for your amusement. Thanks.
:
:/*
: * Tunable constants for nfs
: */
:
:#define	NFS_MAXIOVEC	34
:#define NFS_TICKINTVL	5		/* Desired time for a tick (msec) */
:#define NFS_HZ		(hz / nfs_ticks) /* Ticks/sec */
:#define	NFS_TIMEO	(1 * NFS_HZ)	/* Default timeout = 1 second */
:#define	NFS_MINTIMEO	(1 * NFS_HZ)	/* Min timeout to use */
:#define	NFS_MAXTIMEO	(60 * NFS_HZ)	/* Max timeout to backoff to */
:#define	NFS_MINIDEMTIMEO (5 * NFS_HZ)	/* Min timeout for non-idempotent ops*/
:#define	NFS_MAXREXMIT	100		/* Stop counting after this many */
:#define	NFS_MAXWINDOW	1024		/* Max number of outstanding requests */
:#define	NFS_RETRANS	10		/* Num of retrans for soft mounts */
:#define	NFS_MAXGRPS	16		/* Max. size of groups list */
:#ifndef NFS_MINATTRTIMO
:#define	NFS_MINATTRTIMO 3		/* VREG attrib cache timeout in sec */
:#endif
:#ifndef NFS_MAXATTRTIMO
:#define	NFS_MAXATTRTIMO 60
:#endif
:#ifndef NFS_MINDIRATTRTIMO
:#define	NFS_MINDIRATTRTIMO 30		/* VDIR attrib cache timeout in sec */
:#endif
:#ifndef NFS_MAXDIRATTRTIMO
:#define	NFS_MAXDIRATTRTIMO 60
:#endif
:#define	NFS_WSIZE	8192		/* Def. write data size <= 8192 */
:#define	NFS_RSIZE	8192		/* Def. read data size <= 8192 */
:#define NFS_READDIRSIZE	8192		/* Def. readdir size */
:#define	NFS_DEFRAHEAD	1		/* Def. read ahead # blocks */
:#define	NFS_MAXRAHEAD	4		/* Max. read ahead # blocks */
:#define	NFS_MAXUIDHASH	64		/* Max. # of hashed uid entries/mp */
:#define	NFS_MAXASYNCDAEMON 	20	/* Max. number async_daemons runnable */
:#define NFS_MAXGATHERDELAY	100	/* Max. write gather delay (msec) */
:#ifndef NFS_GATHERDELAY
:#define NFS_GATHERDELAY		10	/* Default write gather delay (msec) */
:#endif
:#define	NFS_DIRBLKSIZ	4096		/* Must be a multiple of DIRBLKSIZ */
:#ifdef _KERNEL
:#define	DIRBLKSIZ	512		/* XXX we used to use ufs's DIRBLKSIZ */
:#endif
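
The timeout constants quoted above are expressed in NFS ticks rather than seconds, so it can help to work the arithmetic through. The following is a minimal sketch in C, assuming that nfs_ticks is derived from hz and NFS_TICKINTVL by rounding to the nearest whole clock tick with a floor of 1; that derivation and the helper names are assumptions, not taken from the header excerpt. Only the NFS_HZ formula and the constant values come from nfs.h.

```c
#include <assert.h>

#define NFS_TICKINTVL 5 /* desired NFS tick length in msec (from nfs.h) */

/* Clock ticks per NFS tick; the rounding and the floor of 1 are assumptions. */
static int nfs_ticks_for(int hz)
{
    int t;

    assert(hz > 0);
    t = (hz * NFS_TICKINTVL + 500) / 1000;
    return t < 1 ? 1 : t;
}

/* NFS_HZ as defined in nfs.h: hz / nfs_ticks. */
static int nfs_hz_for(int hz)
{
    return hz / nfs_ticks_for(hz);
}

/* Convert a count of NFS ticks to milliseconds for a given clock rate. */
static int nfs_ticks_to_ms(int nticks, int hz)
{
    return nticks * 1000 / nfs_hz_for(hz);
}
```

Under these assumptions, with hz = 100 one NFS tick is one clock tick (10 ms) and NFS_HZ is 100, so NFS_TIMEO and NFS_MINTIMEO (1 * NFS_HZ) work out to 1000 ms, NFS_MINIDEMTIMEO to 5000 ms, and NFS_MAXTIMEO to 60000 ms. With hz = 1000 the tick count changes (nfs_ticks = 5, NFS_HZ = 200) but the wall-clock values stay the same.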
