Date:      Mon, 19 Jan 2004 10:34:15 -0600
From:      Eric Anderson <anderson@centtech.com>
To:        Steve Francis <steve@expertcity.com>
Cc:        Willem Jan Withagen <wjw@withagen.nl>
Subject:   Re: Old SUN NFS performance papers.
Message-ID:  <400C0707.7050805@centtech.com>
In-Reply-To: <400C039B.6080403@expertcity.com>
References:  <003c01c3de8d$d569edb0$471b3dd4@dual> <400BE749.2030009@centtech.com> <400C039B.6080403@expertcity.com>

Steve Francis wrote:

> Benchmarking seems like the best thing to do, however I have some info 
> I've collected from prior posts:
>
> These are from the thread "Slow disk write speeds over network" on 
> freebsd-performance@freebsd.org, all written by Terry Lambert.
>
> you should definitely use TCP with FreeBSD NFS servers; it's
> also just generally a good idea, since UDP frags act as a fixed
> non-sliding window: NFS over UDP sucks.
>
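(For reference, forcing TCP on the client side is just a mount option.
A sketch - the server name "fileserver" and the paths below are made up:

    mount -t nfs -o tcp fileserver:/export /mnt

or the equivalent /etc/fstab line:

    fileserver:/export   /mnt   nfs   rw,tcp   0   0

mount_nfs(8)'s -T flag does the same thing; rsize/wsize are worth
experimenting with once TCP is in place.)
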
> Also, you haven't said whether you are using aliases on your
> network cards; aliases and NFS tend to interact badly.
>
> Finally, you probably want to tweak some sysctl's, e.g.
>
>     net.inet.ip.check_interface=0
>     net.inet.tcp.inflight_enable=1
>     net.inet.tcp.inflight_debug=0
>     net.inet.tcp.msl=3000
>     net.inet.tcp.inflight_min=6100
>     net.isr.enable=1
>
> Given your overloading of your bus, that last one is probably
> the most important one: it enables direct dispatch.
>
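(These can be tried on the fly with sysctl(8) and then made permanent in
/etc/sysctl.conf once you know which ones actually help - roughly:

    sysctl net.inet.tcp.inflight_enable=1
    sysctl net.isr.enable=1

and then the same "name=value" lines in /etc/sysctl.conf for the next
boot.  If one of them turns out not to be writable at runtime, it
belongs in /boot/loader.conf instead.)
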
> You'll also want to enable DEVICE_POLLING in your kernel
> config file (assuming you have a good ethernet card whose
> driver supports it):
>
>     options DEVICE_POLLING
>     options HZ=2000
>
>
> ...and yet more sysctl's for this:
>
>     kern.polling.enable=1
>     kern.polling.user_frac=50    # 0..100; whatever works best
>
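(Those two "options" lines go into a custom kernel config, so this means
a kernel rebuild.  The rough recipe on 5.x, assuming a config called
MYKERNEL copied from GENERIC with the options added:

    cd /usr/src
    make buildkernel KERNCONF=MYKERNEL
    make installkernel KERNCONF=MYKERNEL

then reboot and set kern.polling.enable=1.  It only pays off if the NIC
driver actually supports polling, as noted above.)
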
> If you've got a really terrible Gigabit Ethernet card, then
> you may be copying all your packets over again (e.g. m_pullup()),
> and that could be eating your bus, too.
>
>
>> Huh.  I thought that the conventional wisdom was that on a local
>> network with no packet loss (and therefore no re-transmission
>> penalties), UDP was way faster because the overhead was so much less.
>>
>> Sorry if this seems like a pretty basic question, but can you explain
>> this?
>
>
> Sure:
>
> 1)    There is no such thing as no packet loss.
>
> 2)    The UDP packets are reassembled in a reassembly queue
>     on the receiver.  While this is happening, you can only
>     have one datagram outstanding at a time.  With TCP, you
>     get a sliding window; with UDP, you stall waiting for
>     the reassembly, effectively giving you a non-sliding
>     window (request/response, with round trip latencies per
>     packet, instead of two of them amortized across a 100M
>     file transfer).
>
> 3)    When a packet is lost, the UDP retransmit code is rather
>     crufty.  It resends the whole series of packets, and you
>     eat the overhead for that.  TCP, on the other hand, can
>     do selective acknowledgement, or, if it's not supported
>     by both ends, it can at least acknowledge the packets
>     that did get through, saving you a retransmit.
>
> 4)    FreeBSD's UDP fragment reassembly buffer code is well
>     known to pretty much suck.  This is true of most UDP
>     fragment reassembly code in the universe, however, and
>     is not that specific to FreeBSD.  So sending UDP packets
>     that get fragged because they're larger than your MTU is
>     not a very clever way of achieving a fixed window size
>     larger than the MTU (see also #2, above, for why you do
>     not want to use an effectively fixed window protocol
>     anyway).
>
> Even if there were no packet loss at all with UDP, unless all
> your data is around the size of one rsize/wsize/packet, the
> combined RTT overhead for even a moderately large number of
> packets in a single run is enough that the amortized cost of the
> additional TCP overhead ends up lower than the latency overhead
> of UDP.  Depending on your hardware (switch latency,
> half duplex, etc.), you could also be talking about a significant
> combined bandwidth delay product.
>
> Now add to all this that you have to send explicit ACKs with UDP,
> while you can use piggy-back ACKs on the return payloads for TCP.
>
> I think the idea that UDP was OK for nearly-lossless short-haul
> came about from people who couldn't code a working TCP NFS client.
>
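(Back-of-envelope numbers, just to make the latency point concrete - the
exact figures depend entirely on your rsize/wsize and network, so treat
them as an illustration only:

    100 MB file at 32 KB per request   = ~3200 requests
    3200 requests x 0.2 ms RTT each    = ~0.64 s of pure round-trip wait

If each request has to wait for the previous reply, as in the
non-sliding window case above, that wait is added on top of the actual
transfer time; with a sliding window most of those round trips overlap
with data already in flight.)
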
>
> Eric Anderson wrote:
>
>> Willem Jan Withagen wrote:
>>
>>> Hi,
>>>
>>> I had no responses to my recent question on the difference between 
>>> NFS over UDP
>>> and TCP. So perhaps nobody cares??
>>>
>>> So I tried searching but have not found much yet.
>>> Does anybody know where to find the white papers SUN once wrote 
>>> about tuning
>>> NFS??? They should be at sun, but where??
>>> All other suggestions to read are welcomed as well.
>>>
>>> Given my last posting I'm building two machines to do some NFS 
>>> benchmark testing
>>> on.
>>> Suggestions on what people "always wanted to know (tm)" are also 
>>> welcome, and
>>> I'll see if I can get them integrated.
>>> I've found the benchmarks in /usr/ports; some of those might do some 
>>> nice work as well.
>>>
>>> If people are interested I'll keep them posted in performance@
>>>
>>
>> I'm definitely interested in what you find.  I run a few heavily used 
>> FreeBSD NFS servers, and am therefore always looking for tweaks and 
>> knobs to turn to make things better.  In my experience, UDP has always 
>> been faster than TCP for NFS performance.  Prior to 5.2, I have also 
>> seen mbuf-related issues (all pretty much solvable with the right 
>> sysctl's).
>> Let me know if I can help.
>>
>> Eric
>>
>>
>

I wasn't even sure where to start or stop snipping on this mail, since 
it is all good stuff - so I didn't. :)  Thanks for the great info and 
the good explanations.  NFS+TCP is very nice, though I do believe the 
UDP transport was faster in a handful of my tests (I still typically 
force the use of TCP when I can).
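
(On the mbuf issues mentioned above: the usual suspect is the mbuf
cluster count, which is a boot-time tunable - e.g. in /boot/loader.conf:

    kern.ipc.nmbclusters="32768"

"netstat -m" shows whether the pool is actually being exhausted.)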

One question - what does net.inet.ip.check_interface=0 do?

Eric



-- 
------------------------------------------------------------------
Eric Anderson	   Systems Administrator      Centaur Technology
All generalizations are false, including this one.
------------------------------------------------------------------



