Date: Fri, 18 Jan 2002 22:00:40 -0800 (PST)
From: Matthew Dillon
Message-Id: <200201190600.g0J60em46015@apollo.backplane.com>
To: Steve Shorter
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: "server not responding" / "is alive again" NFS tunables
References: <20020116101212.A610@nomad.lets.net>

    It's probably the NFS retry code being a little too finicky again.
    It should be fairly harmless, but you can try mounting the clients
    with the 'dumbtimer' mount option to get rid of the dynamic
    retransmit estimator.

    If you are interested in helping me fix the dynamic retransmit
    estimator, send me an email sometime after we release 4.5 and I'll
    give you some simple patch sets to try. (We don't have time to fix
    it prior to the release and I'm busy with release stuff right now.)

    I would also be interested in a tcpdump of the NFS traffic on the
    client side that catches it in the act (one should see a
    retransmitted NFS request in the dump at the same time the kernel
    logs the responding/alive-again warning).

                                        -Matt
                                        Matthew Dillon

:Howdy!
:
:    I have a dedicated NFS server with 16 nfsd's running, connected
:to SCSI raid/softupdates and good network connectivity/switching. Under
:moderate or even sometimes light load the clients (7 of them) log messages
:
:    nfs server 192.168.10.2:/mnt: not responding
:    nfs server 192.168.10.2:/mnt: is alive again
:
:several times per minute.
:They always have the same timestamp. Performance
:is not noticeably impaired, but I am wondering if this situation will eventually
:become a performance barrier as the system ramps up to full production, if
:the above log messages mean that packets must be delayed or retransmitted.
:
:    I have experimented with both udp/tcp mounts with various rw sizes
:ranging from 8192 - 32768. tcp with rw=32768 was the worst case wrt above, and
:I am currently using udp mounts with default rw. Running 8 or 16 nfsd gives no
:noticeable difference wrt the above either.
:
:    I can find no issues on the server side, and was wondering if
:there is a timeout threshold on the client side that is triggering the
:messages/condition and whether adjusting the compile time tunables in
:sys/nfs/nfs.h can alleviate the problem (if it even is a problem).
:
:    The server is 4.5-PRE, the clients are 4.4-RELEASE. I know
:that there have been a lot of NFS changes in 4.5, but testing 4.5 on
:the clients is problematic. Do any of those changes affect the above?
:
:    So which variable(s) in nfs.h affect the client side and contribute
:to the server timeout situation? Or are there any other suggestions?
:
:    Here are the variables in nfs.h (4.5-RC) for your amusement. Thanks
:
:/*
: * Tunable constants for nfs
: */
:
:#define NFS_MAXIOVEC       34
:#define NFS_TICKINTVL      5               /* Desired time for a tick (msec) */
:#define NFS_HZ             (hz / nfs_ticks) /* Ticks/sec */
:#define NFS_TIMEO          (1 * NFS_HZ)    /* Default timeout = 1 second */
:#define NFS_MINTIMEO       (1 * NFS_HZ)    /* Min timeout to use */
:#define NFS_MAXTIMEO       (60 * NFS_HZ)   /* Max timeout to backoff to */
:#define NFS_MINIDEMTIMEO   (5 * NFS_HZ)    /* Min timeout for non-idempotent ops*/
:#define NFS_MAXREXMIT      100             /* Stop counting after this many */
:#define NFS_MAXWINDOW      1024            /* Max number of outstanding requests */
:#define NFS_RETRANS        10              /* Num of retrans for soft mounts */
:#define NFS_MAXGRPS        16              /* Max. size of groups list */
:#ifndef NFS_MINATTRTIMO
:#define NFS_MINATTRTIMO    3               /* VREG attrib cache timeout in sec */
:#endif
:#ifndef NFS_MAXATTRTIMO
:#define NFS_MAXATTRTIMO    60
:#endif
:#ifndef NFS_MINDIRATTRTIMO
:#define NFS_MINDIRATTRTIMO 30              /* VDIR attrib cache timeout in sec */
:#endif
:#ifndef NFS_MAXDIRATTRTIMO
:#define NFS_MAXDIRATTRTIMO 60
:#endif
:#define NFS_WSIZE          8192            /* Def. write data size <= 8192 */
:#define NFS_RSIZE          8192            /* Def. read data size <= 8192 */
:#define NFS_READDIRSIZE    8192            /* Def. readdir size */
:#define NFS_DEFRAHEAD      1               /* Def. read ahead # blocks */
:#define NFS_MAXRAHEAD      4               /* Max. read ahead # blocks */
:#define NFS_MAXUIDHASH     64              /* Max. # of hashed uid entries/mp */
:#define NFS_MAXASYNCDAEMON 20              /* Max. number async_daemons runnable */
:#define NFS_MAXGATHERDELAY 100             /* Max. write gather delay (msec) */
:#ifndef NFS_GATHERDELAY
:#define NFS_GATHERDELAY    10              /* Default write gather delay (msec) */
:#endif
:#define NFS_DIRBLKSIZ      4096            /* Must be a multiple of DIRBLKSIZ */
:#ifdef _KERNEL
:#define DIRBLKSIZ          512             /* XXX we used to use ufs's DIRBLKSIZ */
:#endif
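
For a rough sense of what the quoted timeout constants work out to, here is a minimal sketch in C of the tick arithmetic. It is not kernel code. NFS_TICKINTVL and the hz / nfs_ticks relation come from the quoted header; the rounding used to derive nfs_ticks from hz is an assumption about how the kernel initializes it, and the function names are made up for illustration.

```c
#include <assert.h>

/* Copied from the quoted nfs.h. */
#define NFS_TICKINTVL 5 /* Desired time for a tick (msec) */

/*
 * Clock ticks per NFS timer tick for a given kernel hz.
 * ASSUMPTION: round to the nearest clock tick, with a minimum of 1.
 * This is believed to match the kernel's setup but is not confirmed
 * by the quoted header.
 */
int
nfs_ticks_for(int hz)
{
	int ticks;

	assert(hz > 0);
	ticks = (hz * NFS_TICKINTVL + 500) / 1000;
	return (ticks < 1 ? 1 : ticks);
}

/* NFS_HZ as defined in the header: hz / nfs_ticks (NFS ticks per second). */
int
nfs_hz(int hz)
{
	return (hz / nfs_ticks_for(hz));
}

/* A timeout of `seconds` expressed in NFS ticks, e.g. NFS_TIMEO = 1 * NFS_HZ. */
int
nfs_timeo_ticks(int hz, int seconds)
{
	return (seconds * nfs_hz(hz));
}

/* Length of one NFS tick in milliseconds. */
int
nfs_tick_msec(int hz)
{
	return (1000 * nfs_ticks_for(hz) / hz);
}
```

Under those assumptions, hz = 100 gives nfs_ticks = 1 and NFS_HZ = 100, so NFS_TIMEO is 100 ticks of 10 ms each (1 second) and NFS_MAXTIMEO is 6000 ticks (60 seconds). The desired 5 ms tick is only reached at higher hz values, for example hz = 1000.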