From owner-freebsd-performance@FreeBSD.ORG Mon Jan 19 08:34:48 2004
Date: Mon, 19 Jan 2004 10:34:15 -0600
From: Eric Anderson <anderson@centtech.com>
To: Steve Francis
Cc: performance@freebsd.org, Willem Jan Withagen
Subject: Re: Old SUN NFS performance papers.
Message-ID: <400C0707.7050805@centtech.com>
In-Reply-To: <400C039B.6080403@expertcity.com>

Steve Francis wrote:
> Benchmarking seems like the best thing to do; however, I have some
> info I've collected from prior posts.
>
> These are from the thread "Slow disk write speeds over network" on
> freebsd-performance@freebsd.org, all written by Terry Lambert:
>
> You should definitely use TCP with FreeBSD NFS servers; it's
> also just generally a good idea, since UDP frags act as a fixed
> non-sliding window: NFS over UDP sucks.
>
> Also, you haven't said whether you are using aliases on your
> network cards; aliases and NFS tend to interact badly.
>
> Finally, you probably want to tweak some sysctls, e.g.:
>
>     net.inet.ip.check_interface=0
>     net.inet.tcp.inflight_enable=1
>     net.inet.tcp.inflight_debug=0
>     net.inet.tcp.msl=3000
>     net.inet.tcp.inflight_min=6100
>     net.isr.enable=1
>
> Given your overloading of your bus, that last one is probably
> the most important one: it enables direct dispatch.
>
> You'll also want to enable DEVICE_POLLING in your kernel
> config file (assuming you have a good ethernet card whose
> driver supports it):
>
>     options DEVICE_POLLING
>     options HZ=2000
>
> ...and yet more sysctls for this:
>
>     kern.polling.enable=1
>     kern.polling.user_frac=50    # 0..100; whatever works best
>
> If you've got a really terrible Gigabit Ethernet card, then
> you may be copying all your packets over again (e.g. m_pullup()),
> and that could be eating your bus, too.
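For reference, here is a rough sketch of how the settings quoted above
could be collected into /etc/sysctl.conf and a kernel config. This is
only an illustration of Terry's suggestions, not something tested in
this thread, and the polling pieces assume a NIC driver that actually
supports DEVICE_POLLING:

    # /etc/sysctl.conf -- applied at boot by rc(8), or set one at a time with sysctl(8)
    net.inet.ip.check_interface=0    # don't insist packets arrive on the interface owning the dest address
    net.inet.tcp.inflight_enable=1   # limit in-flight data to the estimated bandwidth-delay product
    net.inet.tcp.inflight_debug=0
    net.inet.tcp.msl=3000            # shorter MSL -> shorter TIME_WAIT (value is in milliseconds)
    net.inet.tcp.inflight_min=6100   # lower bound for the inflight window estimate, in bytes
    net.isr.enable=1                 # direct dispatch of inbound packets instead of netisr queueing
    kern.polling.enable=1            # only useful if the kernel was built with DEVICE_POLLING
    kern.polling.user_frac=50        # share of CPU reserved for userland under polling, 0..100

    # kernel config additions (rebuild and install the kernel afterwards)
    options DEVICE_POLLING
    options HZ=2000                  # polling runs from the clock interrupt, so a higher HZ polls more often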
>> Huh. I thought that the conventional wisdom was that on a local
>> network with no packet loss (and therefore no re-transmission
>> penalties), UDP was way faster because the overhead was so much
>> less.
>>
>> Sorry if this seems like a pretty basic question, but can you
>> explain this?
>
> Sure:
>
> 1) There is no such thing as no packet loss.
>
> 2) The UDP packets are reassembled in a reassembly queue on the
>    receiver. While this is happening, you can only have one
>    datagram outstanding at a time. With TCP, you get a sliding
>    window; with UDP, you stall waiting for the reassembly,
>    effectively giving you a non-sliding window (request/response,
>    with round-trip latencies per packet, instead of two of them
>    amortized across a 100MB file transfer).
>
> 3) When a packet is lost, the UDP retransmit code is rather
>    crufty. It resends the whole series of packets, and you eat
>    the overhead for that. TCP, on the other hand, can do
>    selective acknowledgement, or, if it's not supported by both
>    ends, it can at least acknowledge the packets that did get
>    through, saving you a retransmit.
>
> 4) FreeBSD's UDP fragment reassembly buffer code is well known to
>    pretty much suck. This is true of most UDP fragment reassembly
>    code in the universe, however, and is not specific to FreeBSD.
>    So sending UDP packets that get fragged because they're larger
>    than your MTU is not a very clever way of achieving a fixed
>    window size larger than the MTU (see also #2, above, for why
>    you do not want to use an effectively fixed-window protocol
>    anyway).
>
> Even if there were no packet loss at all with UDP, unless all
> your data is around the size of one rsize/wsize/packet, the
> combined RTT overhead for even a moderately large number of
> packets in a single run is enough to make the amortized cost of
> the additional TCP overhead lower than the UDP overhead from the
> latency. Depending on your hardware (switch latency, half duplex,
> etc.), you could also be talking about a significant combined
> bandwidth-delay product.
>
> Now add to all this that you have to send explicit ACKs with UDP,
> while you can use piggy-back ACKs on the return payloads for TCP.
>
> I think the idea that UDP was OK for nearly-lossless short-haul
> came about from people who couldn't code a working TCP NFS client.
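To make the TCP suggestion concrete, a client-side mount might look
something like the sketch below. The host name, export path, and mount
point are made-up placeholders, and the 32k rsize/wsize is only an
example; the point is the "tcp" option:

    # /etc/fstab entry (hypothetical server and paths)
    fileserver:/export/work  /mnt/work  nfs  rw,tcp,nfsv3,rsize=32768,wsize=32768  0  0

    # or by hand:
    mount -t nfs -o tcp,nfsv3,rsize=32768,wsize=32768 fileserver:/export/work /mnt/work

As a rough illustration of point 4: over UDP, a 32768-byte wsize means
each write RPC travels as a single UDP datagram that is split into
roughly 23 IP fragments on a 1500-byte MTU (about 1480 bytes of payload
per fragment), and losing any one of those fragments means the whole
request has to be retransmitted; over TCP the same write is just a
stream of segments covered by the normal retransmit machinery.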
> Eric Anderson wrote:
>
>> Willem Jan Withagen wrote:
>>
>>> Hi,
>>>
>>> I had no responses to my recent question on the difference
>>> between NFS over UDP and TCP. So perhaps nobody cares?
>>>
>>> So I tried searching, but have not found much yet. Does anybody
>>> know where to find the white papers Sun once wrote about tuning
>>> NFS? They should be at Sun, but where? All other suggestions for
>>> reading are welcome as well.
>>>
>>> Given my last posting, I'm building two machines to do some NFS
>>> benchmark testing on. Suggestions on what people "always wanted
>>> to know (tm)" are also welcome, and I'll see if I can get them
>>> integrated. I've found the benchmarks in /usr/ports; some of
>>> those might do some nice work as well.
>>>
>>> If people are interested, I'll keep them posted in performance@.
>>
>> I'm definitely interested in what you find. I run a few heavily
>> used FreeBSD NFS servers, and am therefore always looking for
>> tweaks and knobs to turn to make things better. In my experience,
>> UDP has always been faster than TCP for NFS performance. Prior to
>> 5.2, I have also seen mbuf-related issues (all pretty much
>> solvable with the right sysctls). Let me know if I can help.
>>
>> Eric

I wasn't even sure where to start or stop snipping on this mail, since
it is all good stuff - so I didn't. :)

Thanks for the great info and the good explanations. NFS+TCP is very
nice, but I do believe the UDP transport was faster in a handful of
tests (though I typically force the use of TCP when I can). One
question: what does net.inet.ip.check_interface=0 do?

Eric

--
------------------------------------------------------------------
Eric Anderson        Systems Administrator      Centaur Technology
All generalizations are false, including this one.
------------------------------------------------------------------