Date:      Wed, 11 Jul 2001 19:50:21 -0400
From:      Leo Bicknell <bicknell@ufp.org>
To:        freebsd-hackers@freebsd.org
Subject:   Network performance tuning.
Message-ID:  <20010711195021.A89324@ussenterprise.ufp.org>


I'm going to bring up a topic that is sure to spark a great debate
(read: flamefest), but I think it's an important issue.  I've put
my Nomex on; let's see where this goes.

I work for an international ISP.  One of the customer complaints
that has been on the rise is poor transfer rates across our network.
When these come up, I'll often get called in to investigate.  Over
the past 2-3 years there has been an alarming increase in these
complaints, and what disturbs me more is that there is a simple
solution 99% of the time: increase the TCP window size.

Admittedly, my environment is a bit rare.  This generally comes
from colo customers who have two beefy 100Mbps-connected servers on
opposite coasts and can't understand why around 100k/sec is the
best transfer rate they can get.  If only we all had uncongested
100Mbps connections!  Anyway, after having them up the window size
on their machines, we can, if necessary, get them up to full 100Mbps
across the country (I have logs of 9.98MB/sec FTP's coast to coast,
if anyone wants them).

So, I decided it was time to pick on FreeBSD.  There are a number
of reasons, chief among them that virtually all other OS's now
have larger default window sizes (and thus offer better performance)
than FreeBSD out of the box.  A secondary reason is that, for the
first time, real end users, in the form of cable modem subscribers,
are being hit by this same issue.

Let's cut to the nitty gritty.  This is all governed by the
bandwidth * delay product: you can ship one window per RTT, and
all that.  If you don't understand this already, go read about TCP
and then come back to this message. :-)  FreeBSD's current default
is 16384 bytes for the window, giving us the following limits on
performance:

Lan                    1ms rtt =  15 MB/sec
Coast to Coast        65ms rtt = 246 KB/sec
Coast to Coast        85ms rtt = 188 KB/sec
East Coast to Japan  155ms rtt = 103 KB/sec
London to Japan      225ms rtt =  71 KB/sec
T1 Satellite Link    500ms rtt =  32 KB/sec
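
For the curious, the ceilings above are just window / RTT.  A quick
C sketch (nothing FreeBSD-specific; the RTT list mirrors the table)
that reproduces them:

    #include <stdio.h>

    int
    main(void)
    {
        const double window = 16384.0;      /* FreeBSD's default, bytes */
        const double rtts[] = { 0.001, 0.065, 0.085, 0.155, 0.225, 0.500 };
        int i;

        /* One window per round trip: throughput <= window / rtt. */
        for (i = 0; i < 6; i++)
            printf("%5.0f ms rtt -> %8.1f KB/sec\n",
                rtts[i] * 1000.0, window / rtts[i] / 1024.0);
        return (0);
    }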

So, inside the US, the current 16k window lets a single connection
just fill a T1, more or less.  Note, these numbers assume optimal
conditions; you may see a degradation of up to 50% from these
numbers when bandwidth is available but there is high jitter or
packets are reordered.

I wonder how many people are discontinuing DirectPC service because
they can't get over 32 KB/sec downloads from their "T1 speed"
satellite service.

One of the first responses I often get to this issue is "so what,
system administrators can increase the values".  This is true,
however I think it's time to address the defaults.  There are a
number of reasons for this:

* BOTH ends of a TCP connection must be increased.  All the server
  admins in the world can do this, but if end users don't it is
  useless.  Conversely, end users who do this now won't see a speed
  up unless all the server admins change the settings.

* FreeBSD is at the middle-bottom of the pack when it comes to
  defaults.  http://www.psc.edu/networking/perf_tune.html

* Users are slowly getting faster connections (T1 DSL, T1 Satellite,
  10 Mbps cable modems) that need larger values.

* The way to get around this limit from a user's point of view is
  to write custom apps that raise the values using the socket calls
  (see the sketch after this list).  Hard-coding window sizes into
  apps is a poor solution.
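
For reference, the workaround looks like this.  A minimal sketch
using the standard setsockopt() calls; the function name and the
1 Meg figure are mine, for illustration only:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <stdio.h>

    int
    make_big_buffered_socket(void)
    {
        int s, bufsize = 1024 * 1024;   /* 1 Meg, illustrative only */

        if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
            return (-1);
        /*
         * Raise both buffers before connect()/listen() so the
         * window scale option is negotiated in the handshake.
         */
        if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bufsize,
                sizeof(bufsize)) < 0 ||
            setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bufsize,
                sizeof(bufsize)) < 0)
            perror("setsockopt");
        return (s);
    }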

Unfortunately, this is where things get really interesting.  If
you want to, say, support a 100Mbps transfer over a single TCP
connection coast to coast, you need a buffer of around 1 Meg (100
Mbps * 85ms RTT is just over 1 MByte of data in flight).  That's
a lot of buffer.  That said, most large servers, and even end user
workstations, could devote 1 Meg to the network if it meant 100Mbps
performance.  Sadly, this has unintended consequences.

If you dig down in the TCP stack, you find a problem.  When a
socket is created in FreeBSD (and I presume many other BSD's as
well), its buffer limits are set (soreserve).  The behavior today
is to set them to the system default values at socket creation
time.  So, what happens is a dial-up user connects to a web server
to download an MP3 file.  The socket sets aside a 1 Meg buffer,
the web server dumps 1 Meg into it, and then the kernel has to
keep that 1 Meg around in MBUF's until it can dribble out to the
end user.  No surprise, you run out of MBUF's in a hurry.
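
The defaults soreserve applies are visible from userland as the
net.inet.tcp.sendspace and net.inet.tcp.recvspace sysctls.  A
minimal sketch to read them, assuming the kernel values fit in a
u_long (as on -STABLE):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
        const char *names[] = { "net.inet.tcp.sendspace",
                                "net.inet.tcp.recvspace" };
        u_long val;
        size_t len;
        int i;

        for (i = 0; i < 2; i++) {
            val = 0;
            len = sizeof(val);
            if (sysctlbyname(names[i], &val, &len, NULL, 0) == 0)
                printf("%s = %lu\n", names[i], val);
            else
                perror(names[i]);
        }
        return (0);
    }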

There are a number of issues that come out of this:

* MBUF's are currently allocated based on NMBCLUSTERS, which is
  based on MAXUSERS (unless overridden).  NMBCLUSTERS is found
  using the formula 512 + MAXUSERS * 16.  This formula has been in
  use for a long time, and it may be time to consider allocating
  a few more clusters per user.  The number of MBUF's is 4 *
  NMBCLUSTERS, which is a fine number, but testing shows it gives
  you too many MBUF's in many cases.  (Or, put another way, most
  every system I've seen shows a trend of running out of clusters
  way before MBUF's.)  The sketch after this list puts numbers on
  the formula.

* The socket layer needs to be more intelligent about its buffering.
  Simply always allocating the largest buffer is easy to code, but
  wastes considerable resources, particularly on machines with lots
  of connections.
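
To put numbers on that formula, here's the arithmetic for a
hypothetical MAXUSERS=128 kernel (128 is just an example value):

    #include <stdio.h>

    int
    main(void)
    {
        int maxusers = 128;                     /* example config */
        int nmbclusters = 512 + maxusers * 16;  /* = 2560 */
        int nmbufs = 4 * nmbclusters;           /* = 10240 */

        /*
         * Clusters are 2K (MCLBYTES), so that is only ~5 Meg of
         * cluster space; a handful of 1 Meg socket buffers eats it.
         */
        printf("NMBCLUSTERS = %d (%d KB)\n", nmbclusters, nmbclusters * 2);
        printf("MBUF's      = %d\n", nmbufs);
        return (0);
    }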

So, I'd like to propose some fixes to get people thinking.  I have
listed them in the order I think they should be done:

1) The per-socket defaults should be raised to 32k in the next
   release, giving 2x today's performance in general and putting
   FreeBSD on par with at least most Linux distros.  I think the
   memory consequences here are quite minor, and this provides a
   good place to study the effects on real world users.

2) The socket layer needs to be modified to not use the maximum
   buffer as the default.  Imagine if disk drivers allocated 4 Meg
   for every process writing to disk, just because the disk has a
   4 Meg cache.  The buffer clearly needs to hold all unacknowledged
   data, and should therefore grow as the window size grows, plus
   some overhead so that some unsent data can be buffered in the
   kernel (to avoid context switches and the like).  This way
   connections to slow hosts (e.g. dial up users) would not buffer
   much more than the window size, using only a small amount of
   memory.  This would allow admins to set the sizes much larger
   without wasting memory on connections that will never use it.
   (A rough sketch of this growth policy follows this list.)

   Note, from looking at soreserve and related code, it appears it
   just sets maximums, and raising them midstream would have no
   ill effects.  (Reducing them would.)  So a good first stab might
   be to have a new "initial socket buffer" size passed to soreserve
   when a new socket is created; if the TCP window could be
   increased past that value at any point, soreserve could be
   recalled (or a resize function created) to raise the limit to
   2 * maxwin, or 1.1 * maxwin, or maxwin + buffer, or whatever is
   appropriate, up to the hard limit set by the system administrator.

3) The number of MBUF's needs to be increased.  Ideally this should
   be dynamically changeable, which it is not today.  As the net
   gets faster, users need more network resources per user, hence
   more MBUF's.  Also, I wonder if it should be determined from
   MAXUSERS at all.  It is in fact related to the maximum number
   of simultaneous network connections, and it might make more
   sense to base it off that, with a default based on MAXUSERS (but
   larger).
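
To make the growth policy in #2 concrete, here is a rough sketch.
This is not FreeBSD code; the struct, the function name, and the
2 * maxwin factor are all made up for illustration:

    #define ADMIN_MAX_SOCKBUF   (1024 * 1024)   /* hard cap, sysctl'able */

    struct sockbuf_stub {                   /* stand-in for struct sockbuf */
        unsigned long sb_hiwat;             /* current buffer limit */
    };

    /*
     * Start small; when the peer's advertised window would exceed
     * the current reservation, raise the limit toward the cap.
     * Raising the maximum midstream is safe; shrinking it is not.
     */
    static void
    maybe_grow_sockbuf(struct sockbuf_stub *sb, unsigned long maxwin)
    {
        /* 2 * maxwin leaves headroom for unsent data in the kernel. */
        unsigned long target = 2 * maxwin;

        if (target > ADMIN_MAX_SOCKBUF)
            target = ADMIN_MAX_SOCKBUF;
        if (target > sb->sb_hiwat)
            sb->sb_hiwat = target;
    }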

Point #2 is very critical.  Right now someone who runs a web server
must leave the values fairly low (probably ok for serving dial up
and DSL users) to avoid running out of MBUF's, but then can't,
without much hackery, get high speed transfers on the nightly
backup run or the content distribution run across the network.
Buffers need to be more dynamically scaled to individual connections.

So, bottom line: I would like a FreeBSD host that out of the box
can get 2-4 MBytes/sec across the country (or better), but that
manages its buffers in such a way that your standard web server
running on a FreeBSD box doesn't fall over.  Is that just a pipe
dream, or can we make it happen with a little effort?

-- 
Leo Bicknell - bicknell@ufp.org
Systems Engineer - Internetworking Engineer - CCIE 3440
Read TMBG List - tmbg-list-request@tmbg.org, www.tmbg.org
