Date:      Sat, 18 May 2002 15:45:10 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Attila Nagy <bra@fsn.hu>
Cc:        freebsd-arch@freebsd.org, freebsd-net@freebsd.org
Subject:   Re: HEADS UP: ALTQ integration developer preview
Message-ID:  <3CE6D976.3264DE53@mindspring.com>
References:  <Pine.BSF.4.10.10205170216500.29826-100000@ady.warpnet.ro>  <Pine.LNX.4.44.0205171056200.2091-100000@scribble.fsn.hu>  <3CE55A9B.73EA3DE4@mindspring.com> <Pine.LNX.4.44.0205181018300.10011-100000@scribble.fsn.hu> <3CE61675.BCE2A9E1@mindspring.com> <Pine.LNX.4.44.0205181336380.10862-100000@scribble.fsn.hu>

Attila Nagy wrote:
> > Sending datagrams bigger than the MTU is a bad idea.
> 
> It depends on what you want to do with that NFS server :)

Sure.  Maybe you want to use up its mbufs by jamming the IP fragment
reassembly queue full of N-1 frags from 64K UDP packets.


> I want to get several hundred megabits per second out of it, so I
> can't use a <1500-byte MTU.

NFS works over TCP for a reason.  TCP has sliding windows for a
reason.


> Just for comparison:
> when using a <1500-byte MTU (as close as possible to the limit) I can
> achieve about 1-1.5 MBps throughput.
> When using a 32768-byte MTU I can get around 190 Mbps out of a PIII 450
> (and only 190 Mbps because the two frontends have fast ethernet cards).
> So why is this so bad? If the other end can keep up, it will increase
> throughput.

And you could do even better by getting rid of the request/response
turnaround stall entirely, by using TCP instead of UDP.  Rather than
paying a round-trip latency per request, you pay two packet latencies
over an arbitrary number of packets (unless you hit the window size,
in which case you probably needed a larger window).
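
Back-of-the-envelope, the stall costs something like this (a sketch
only; the 0.2 ms LAN round trip and fast ethernet link are assumed
numbers, not measurements from this thread):

#include <stdio.h>

/*
 * Rough model of the turnaround stall: a protocol with one RPC
 * outstanding waits a full round trip per RPC, while a windowed
 * stream keeps the link full.  RTT and link speed are assumptions.
 */
static double
stop_and_wait_mbps(double rpc_bytes, double link_bps, double rtt_s)
{
	double wire_s = rpc_bytes * 8.0 / link_bps;	/* serialization time */

	return rpc_bytes * 8.0 / (wire_s + rtt_s) / 1e6;
}

int
main(void)
{
	double link = 100e6, rtt = 0.0002;

	printf("1472 byte RPCs:  %5.1f Mbit/s\n",
	    stop_and_wait_mbps(1472.0, link, rtt));
	printf("32768 byte RPCs: %5.1f Mbit/s\n",
	    stop_and_wait_mbps(32768.0, link, rtt));
	printf("windowed stream: %5.1f Mbit/s (link limited)\n", link / 1e6);
	return 0;
}

That is roughly 37 Mbit/s with MTU-sized RPCs versus about 93 Mbit/s
with 32K RPCs, which is why the big datagrams look attractive, and why
a windowed protocol is the real fix.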

> BTW, the default UDP readsize is above 1500 bytes, so I couldn't use the
> server with a simple NFS mount.
> When I replaced the gx driver with em it just started to work. Now I am
> using TCP and a 32k readsize with 64k tcp.sendspace and recvspace (I could
> nearly double the performance by raising these from the default values).
> 
> So I am happy with it, I just took a note that the gx driver has some
> problems in these cases.

Most traffic is supposed to be at the MTU.  You want to avoid
fragging.  The only reason you want fragging in the UDP case is
so you can pretend you have a window, without having to use a
windowing protocol: you use the fragment reassembly queue as a
window buffer.

This really only gives you an amortization of 32K/MTU (maximum),
and you still have stalls every N packets, which you would not
get with a windowed protocol.
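
Putting numbers on that amortization (plain arithmetic, using the 32K
size and 1500-byte MTU already mentioned in this thread):

#include <math.h>
#include <stdio.h>

/*
 * A 32K UDP datagram over a 1500-byte MTU goes out as IP fragments,
 * and the whole burst still stalls for one RPC turnaround, so the
 * "window" is only 32K/MTU deep.  Compile with -lm.
 */
int
main(void)
{
	double mtu_payload = 1480.0;	/* 1500 minus the 20-byte IP header */
	double dgram = 32768.0;
	double frags = ceil(dgram / mtu_payload);

	printf("fragments per 32K datagram: %.0f\n", frags);
	printf("i.e. one turnaround stall every %.0f wire packets\n", frags);
	return 0;
}

So at best you hide the stall behind a couple of dozen wire packets;
a TCP window does the same job without tying up the reassembly queue.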


> > I would be real tempted to drop the packets and send "don't fragment"
> > ICMP responses to beat up anyone who abused UDP by sending larger than
> > the MTU.
> 
> That's another point. I want performance :)

Then don't add the fragment reassembly code to the code path for
packets you send to the server.  That way you'll have less overhead.

8-).


> > I guess this is about Linux UDP NFS clients, in particular.
> 
> Nope, both the server and the client side is FreeBSD stable.

I run all my NFS over TCP.  If I avoid intentionally triggering
fragmentation, it works out to a little over 100 machine
instructions in the fast path.  Done any cycle counting on your
use of UDP yet?
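
If you want to try, a crude user-land starting point on x86 looks like
this (a sketch only; the timed work is a placeholder you would replace
with your own RPC round trip):

#include <stdint.h>
#include <stdio.h>

/*
 * Crude cycle counting on x86: read the CPU timestamp counter around
 * the code path of interest.  The TSC needs more care on SMP or
 * frequency-scaling hardware than this shows.
 */
static inline uint64_t
rdtsc(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}

int
main(void)
{
	uint64_t before, after;

	before = rdtsc();
	/* ... e.g. one sendto()/recvfrom() round trip to the NFS server ... */
	after = rdtsc();

	printf("cycles: %llu\n", (unsigned long long)(after - before));
	return 0;
}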

-- Terry
