From owner-freebsd-hackers  Fri Jul 13 13:27:49 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from ussenterprise.ufp.org (ussenterprise.ufp.org [208.185.30.210])
	by hub.freebsd.org (Postfix) with ESMTP id 4B14137B403
	for <freebsd-hackers@FreeBSD.ORG>; Fri, 13 Jul 2001 13:27:44 -0700 (PDT)
	(envelope-from bicknell@ussenterprise.ufp.org)
Received: (from bicknell@localhost)
	by ussenterprise.ufp.org (8.11.1/8.11.1) id f6DKRga32864;
	Fri, 13 Jul 2001 16:27:42 -0400 (EDT)
	(envelope-from bicknell)
Date: Fri, 13 Jul 2001 16:27:42 -0400
From: Leo Bicknell <bicknell@ufp.org>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Leo Bicknell <bicknell@ufp.org>, freebsd-hackers@FreeBSD.ORG
Subject: Re: Network performance roadmap.
Message-ID: <20010713162742.A31883@ussenterprise.ufp.org>
Mail-Followup-To: Leo Bicknell <bicknell@ussenterprise.ufp.org>,
	Terry Lambert <tlambert2@mindspring.com>,
	Leo Bicknell <bicknell@ufp.org>, freebsd-hackers@FreeBSD.ORG
References: <20010713101107.B9559@ussenterprise.ufp.org> <3B4F4534.37D8FC3E@mindspring.com> <20010713151257.A27664@ussenterprise.ufp.org> <3B4F542F.D0D0E0BA@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3B4F542F.D0D0E0BA@mindspring.com>; from tlambert2@mindspring.com on Fri, Jul 13, 2001 at 01:03:59PM -0700
Organization: United Federation of Planets
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

On Fri, Jul 13, 2001 at 01:03:59PM -0700, Terry Lambert wrote:
> When I run out of mbufs, I do not have "bad things happen".
> 
> The "bad things" are an artifact of memory overcommit; if you

One thing that is clear is that we're talking about two different
levels of problems.

On the systems I run (admittedly with < 1,000,000 connections)
there is no memory overcommit, and the bad things that come from
running out of mbufs are not a matter of running out of memory.

The problem I would like to solve is the overbuffering of
individual sockets.  That's not to say there aren't other
issues involving allocating more mbufs, or doing other things.
I think there is enough evidence from papers like the ones
at psc.edu to show that solving the overbuffering problem
one way or another provides a huge increase in performance
across a wide range of conditions.
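
As a rough sketch of what "overbuffering" means here: a connection
can only keep about one bandwidth-delay product (BDP) of data in
flight, so socket buffer space beyond that cannot raise throughput.
The link speed, round-trip time, and 2M buffer below are made-up
illustrative numbers, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return bandwidth_bits_per_s / 8 * rtt_s


def excess_buffer_bytes(buffer_bytes, bandwidth_bits_per_s, rtt_s):
    """Buffer space beyond the BDP, which cannot improve throughput."""
    return max(0.0, buffer_bytes - bdp_bytes(bandwidth_bits_per_s, rtt_s))


# Assumed example: 1.5 Mbit/s DSL line with a 50 ms round trip.
dsl_bdp = bdp_bytes(1.5e6, 0.050)  # about 9375 bytes

# A 2M socket buffer on that path is almost entirely excess.
wasted = excess_buffer_bytes(2 * 1024 * 1024, 1.5e6, 0.050)
print(round(dsl_bdp), round(wasted))
```

The same arithmetic cuts the other way on a fast, long path: there
the BDP can exceed a small fixed buffer, and the socket is starved
rather than overbuffered.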

> By changing them.  I have servers that can support 1,000,000
> concurrent connections.  They are based on FreeBSD running on
> 4GB memory systems with 2 1Gbit NICs.
> 
> This is why all the hand-waving and suggestions for substantial
> (and unnecessary, from empirical practice) changes in the
> FreeBSD stack is making me so leery.

Well, you can consider it hand waving.  That said, on the low
end you can easily prove mathematically, and demonstrate
empirically, that end users (of the high speed DSL and cable
modem variety) are being limited on a day-to-day basis.  While
I won't prevent anyone from fixing larger and more looming
issues, I will be satisfied when that is no longer the case.
I think fixing that does not require rewriting the kernel
memory allocator, as you seem to suggest.
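
The math in question is short: a TCP connection can have at most
one window of data outstanding per round trip, so throughput is
bounded by window / RTT no matter how fast the link is.  A minimal
sketch, assuming a 16K window and an 80 ms round trip (both chosen
for illustration, not measured values):

```python
def max_throughput_bytes_per_s(window_bytes, rtt_s):
    """Upper bound on TCP throughput: one window per round trip."""
    return window_bytes / rtt_s


def window_needed_bytes(target_bytes_per_s, rtt_s):
    """Smallest window that can sustain the target rate at this RTT."""
    return target_bytes_per_s * rtt_s


# Assumed values: 16K socket buffer (window), 80 ms round trip.
ceiling = max_throughput_bytes_per_s(16 * 1024, 0.080)

# Window required to reach 1,000,000 bytes/sec over the same path.
needed = window_needed_bytes(1_000_000, 0.080)
print(round(ceiling), round(needed))
```

Under those assumptions the ceiling is roughly 200K bytes/sec, and
it takes a window of about 80K to reach 1M bytes/sec, regardless of
the capacity of the DSL or cable link underneath.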

> That would be nice; first of all, you will need to get
> over your aversion to working on kernel memory allocators
> (;-)), since the only way to set things up for variable
> loads is to take away the fixed nature of the allocations
> which are needed to tune for those loads.  You can't apply

The PSC autotune (which has some other issues) seems to address
a large segment of the variable load problem without taking
the steps you suggest.  It gets far enough to solve the
problems I am interested in solving.  It also seems to me
that even if someone takes on the project you seem interested
in, something along the lines of the autotuning code will be
necessary to take advantage of it.

> parameters.  This is no good.  You need the empirical data,
> but it should not be applied to tuning parameters globally,
> it should be applied to them on a case by case basis on
> server installations.

Today I believe (in very round figures) the defaults are "right"
for 25% of the users, "right enough" for 50% of the users,
and "wrong" for 25% of the users.  It's the "right enough"
category that worries me the most.  I can provide lots of
evidence, in the form of my employer's customers' experiences,
where they were getting "200k/s" across the net and figured
"that's all the Internet can provide".  After a few tweaks
they were getting 1M/sec and their eyes lit up with "I never
knew the network could support that".

It's sad the users don't have higher expectations, but at the
same time the fact that the operating system, and not the
network, is limiting their performance is completely hidden.

So, I'd be happy if we can move to 75% right, 25% "wrong"
(note, that's no improvement in the number of wrong cases).

> The only way around this is to bite the bullet, and do the
> right thing.  Failure to do that means that you are subject
> to denial of service attacks based on your tuning parameters,
> so while you may run OK in the case of needing a lot of HTTP
> connections with small windows, someone can panic your system
> by advertising very large windows and then giving you many
> 2MB HTTP requests.  Normal HTTP requests are not that large,

The PSC fair share (note, I do not recommend that we use it,
but just use it as a reference) would seem to mitigate the DoS
potential with appropriate settings.  The whole point of
this is that the OS should not be buffering 2M per request
because the _LIMIT_ is 2M; it should be buffering 2M because
the window and actual transfer rate suggest that it might be
necessary.
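
To make the idea concrete, here is a minimal sketch of sizing
buffers by demand with a fair-share cap.  This is NOT the PSC
code; it is plain max-min fair sharing, and the function name, the
pool size, and the demand figures are all hypothetical.

```python
def fair_share_buffers(demands, pool_bytes, per_socket_limit):
    """Max-min fair split of pool_bytes among sockets.

    demands: bytes each socket could actually use, e.g. estimated
    from its window and measured transfer rate.  No socket receives
    more than its demand or more than per_socket_limit.
    """
    capped = [min(d, per_socket_limit) for d in demands]
    alloc = [0] * len(capped)
    remaining = pool_bytes
    # Serve the smallest demands first; each socket gets at most an
    # equal share of whatever is still unallocated.
    order = sorted(range(len(capped)), key=lambda i: capped[i])
    for pos, i in enumerate(order):
        share = remaining // (len(order) - pos)
        alloc[i] = min(capped[i], share)
        remaining -= alloc[i]
    return alloc


# Two modest sockets and one peer advertising a huge window,
# sharing an assumed 1M pool under a 2M per-socket limit.
print(fair_share_buffers([8192, 65536, 4 * 1024 * 1024],
                         1024 * 1024, 2 * 1024 * 1024))
```

The modest sockets get everything they ask for, and the greedy one
gets only what is left over, so the 2M limit acts as a ceiling
rather than as the amount handed to every request.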

> > Ah, I see, so to prevent MBUF exhaustion I should not let
> > my socket buffers get large.  Sort of like to prevent serious
> > injury in a car crash I should drive at 10MPH on the freeway.
> 
> Or 55MPH.  Or 65MPH.  Whatever your local limit is, is also
> administrative, and quite arbitrary.  Many cars are safe at
> much, much faster speeds, as long as someone doesn't decide
> to drive at 50MPH in the fast lane, so your rate of closure
> is 70MPH+.

Yes, and most cars have speed limiters built in these days;
you'll find they don't kick in until 120-150mph.  FreeBSD
seems to have its speed limiter set at 25mph because in
the past most users lived in the city where they couldn't
drive fast, and as a result we didn't bother to make the
throttle variable: it was 0mph or 25mph.

It's time for the variable throttle, if nothing else.

-- 
Leo Bicknell - bicknell@ufp.org
Systems Engineer - Internetworking Engineer - CCIE 3440
Read TMBG List - tmbg-list-request@tmbg.org, www.tmbg.org
