Date:      Fri, 13 Jul 2001 12:00:04 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Leo Bicknell <bicknell@ufp.org>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: Network performance roadmap.
Message-ID:  <3B4F4534.37D8FC3E@mindspring.com>
References:  <20010713101107.B9559@ussenterprise.ufp.org>

Leo Bicknell wrote:
> 1) FreeBSD's TCP windows cannot grow large enough to allow for
>    optimum performance.  The primary obstacle to raising them is
>    that if you do so, the system can run out of MBUF's.  Schemes
>    need to be put in place to limit MBUF usage, and better allocate
>    buffers per connection.

Not quite true.  They are administratively limited, because of
the artificial fixed ratio of mbufs to clusters.  This is a
design problem, not a physical limitation.


> 2) Windows are currently 16k.  It seems a wide number of people
>    think 32k would not cause major issues, and is in fact in use
>    by many other OS's at this time.

The main reason for this is that other OS's use system buffers
for jumbograms.  Please check the Tigon II and Intel Gigabit
drivers, and you will see that FreeBSD does not do this.  Jumbo
buffers are separate.  People having performance issues with
jumbo frames should consider setting their MTU to 8k instead of
9k, so that frames become an even multiple of the mbuf cluster size.

> There are a few other observations that have been made that are
> important.
> 
> A) The receive buffers are hardly used.  In fact, data generally
>    only sits in a receive buffer for one of two reasons.  First,
>    the data has not yet been passed to the application.  This amount of
>    data is generally very small.  Second, data for unacknowledged
>    segments will sit in the buffer waiting for a retransmit.  It is of
>    course possible that the buffers could be completely full from either
>    case, but several research papers indicate that receive buffers
>    rarely use much space at all.

You need to read the WRL and Rice University papers, then,
and pay particular attention to "livelock".


> B) When the system runs out of MBUF's, really bad things happen.  It
>    would be nice to make the system handle MBUF exhaustion in a nicer
>    way, or avoid it.

The easiest way to do this is to know ahead of time how many
you _really_ have.  Then bad things don't happen.


> C) Many people think TCP_EXTENSIONS="YES" gives them windows > 64k.
>    It does, in the sense that it allows the window scale option, but
>    it doesn't in that socket buffers aren't changed.

Socket buffers are set at boot time.  Read the code.  Same for
maximum number of connections: you can hop around until you
are blue in the face from typing "sysctl", but it will not
change the number of tcpcb's and inpcb's, etc.  This is an
artifact of the allocator.

> From all of this, I propose the following short term road map:
> 
> a - Commit higher socket buffer sizes:
> 
>     -current:  64k receive  (based on observation A)
>                32k send     (based on detail 2)
> 
>     -stable:   32k receive  (based on detail 2)
>                32k send     (based on detail 2)
> 
>     I think this can be done more or less immediately.

This would suck.  It would halve your maximum number of
concurrent connections on servers with differential rates
on the connections (e.g. my local connection is 1Gbit, but
the other end is on a 28K modem).  Your send windows will
always remain full.

Having larger transmit windows is really dependent on the
type of traffic you expect to serve; in the HTTP case, the
studies indicate that the majority of objects served are
less than 8k in size.  Most browsers (except Opera) do
not support pipelining.

You would be well served to do this on a test system, and
do watermark connection counting on a lot of traffic (e.g.
how many connections get to 1k, 2k, 4k, 8k, 16k, 32k of
data buffered in the send window; do the count when the
connection closes, based on the high watermark -- you can
put it in the socket struct, which is bloated to 192 bytes
by the allocator's 64 byte alignment property anyway, so
you have some headroom for keeping these stats).

Only after you have proven that some significant fraction
of traffic actually ends up hitting the window size limits,
should you make this change to FreeBSD proper.

If anyone is interested in doing this and writing a paper,
you can probably build a nice Master's Thesis on the study
as a fast-track to getting your Master's, since it would
probably take less than two weeks to do the whole thing,
and most of that would be in waiting for the traffic data
to get collected.


> b - Allow larger receive windows in some cases.  In -current
>     only, if TCP_EXTENSIONS="YES" is configured (turn on RFC1323
>     extensions) change the settings to:
> 
>     1M kernel limit  (based on observation C)
>     256k receive     (based on observation A, C)
>     64k send         (based on observation C)
> 
>     Note, 64k send is most likely aggressive with the current MBUF
>     problems.  Some later points will address that.  For now, the
>     basic assumption is that people configuring TCP_EXTENSIONS are
>     clueful people with larger memory machines who also tune things like
>     MAXUSERS up, so they will probably be ok.

You can bump the default max, but should not bump the
default itself, unless it is requested by a program
(e.g. maintain a soft limit based on socket options
for experimental programs).

I think you will find that it's a bad idea.

> c - Prevent MBUF exhaustion.  Today, when you run out of MBUF's, bad
>     things start to happen.  It would be nice to prevent that from
>     happening, and also to provide sysadmins some warning when it is
>     about to happen.

One good way to prevent this is to not unreasonably set
your window size... 8-p.


>     This change sounds easy, but I don't know where in the code to start
>     looking.  Basically, there is a bit of code somewhere that decides
>     if a sending TCP process should block or not.  Today this code only
>     looks to see if that socket's TCP send buffer is full.  What I
>     propose is that it should also check if less than 10% of the MBUF's
>     are free, and if so also block the sender.

Ugh.  If you don't know where to start looking in the code,
this is definitely research that should not be done in the
context of committing changes to the main FreeBSD tree,
until you have your answers on whether the changes actually
improve or degrade performance.


I think you need to do a separate research project; start
with the SCALA Server papers, and the WRL papers as references.

-- Terry
