From owner-freebsd-current Sun Mar 16 15:50:07 2003
Message-ID: <3E750D52.FFA28DA2@mindspring.com>
Date: Sun, 16 Mar 2003 15:48:34 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Petri Helenius
Cc: freebsd-current@FreeBSD.ORG
Subject: Re: mbuf cache

Petri Helenius wrote:
> Terry Lambert wrote:
> > Ah.  You are receiver livelocked.  Try enabling polling; it will
> > help up to the first stall barrier (NETISR not getting a chance to
> > run protocol processing to completion because of interrupt
> > overhead); there are two other stall barriers after that, and
> > another in user space is possible, depending on whether the
> > application layer is request/response.
>
> Are you sure that polling would help, since the em driver is using
> interrupt regulation by default?

You mean hardware interrupt coalescing, not regulation.  Regulation
is where you prevent the card from generating interrupts during a
livelock situation, to permit the host to process the data it already
has in the pipeline.

It will help some.  Instead of livelocking because the interrupt load
never lets NETISR run, it will livelock where NETISR attempts to push
data to user space, which is never read by the user space process,
because the user space process never gets to run, since interrupts,
and now NETISR processing, are taking all the CPU time.

You can get to this same point in -CURRENT, if you are using up to
date sources, by enabling direct dispatch, which disables NETISR.
This will help somewhat more than polling, since it removes the
normal timer latency between receipt of a packet and processing of
the packet through the network stack.  That should reduce overall
pool retention time for individual mbufs that don't end up on a
socket so_rcv queue.  Because interrupts on the card are not
acknowledged until the code runs to completion, this also tends to
regulate interrupt load.
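If you want to try the polling route first, the knobs look roughly
like this; I am writing them from memory, so check polling(4) and
your driver before trusting the exact names:

    # kernel config: polling has to be compiled in, and it wants a
    # fast clock to get reasonable granularity
    options DEVICE_POLLING
    options HZ=1000

    # at runtime
    sysctl kern.polling.enable=1
    # percentage of each tick reserved for user space processing
    sysctl kern.polling.user_frac=50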
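For direct dispatch on a recent -CURRENT, I believe the switch is the
netisr sysctl below, but verify it against your own sources, since
this code has been changing:

    # run protocol processing to completion from the interrupt
    # context, instead of queueing packets for the NETISR software
    # interrupt
    sysctl net.isr.enable=1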
Direct dispatch also has the desirable side effect that stack
processing will occur on the same CPU as the interrupt processing
occurred on.  This avoids inter-CPU memory bus arbitration cycles,
and ensures that you won't engage in a lot of unnecessary L1 cache
busting.  Hence I prefer this method to polling.

> It might solve the livelock, but it probably does not increase the
> performance of the mbuf allocator?

No, it does not increase the performance of the mbuf allocator.  The
main problem with the mbuf allocator as it stands today is that there
is a tradeoff between how fast you can make it and whether or not
it's SMP safe.

There is a researcher at the University of Kentucky, to whom I have
explained a number of obscure details of the VM system, who has
implemented a freelist allocator and gotten a 5 times performance
increase on his TCP stack.  I'm not sure if he'd be willing to share
his research with you or anyone else, but if you read back over my
own postings regarding mbuf allocators, you should be able to repeat
the development work he has done; there is a sketch of the idea at
the end of this message.  Note that his allocator is not SMP safe,
and is probably antithetical to the idea altogether.

Personally, I'm coming to the conclusion that SMP systems should be
treated as NUMA machines, with separately allocated resources and,
potentially, even separate OS images.  Until the memory and I/O bus
speeds catch up with the CPU speeds again, the cost of resource
contention stalls is so incredibly high, because of the speed
multipliers, as to make it not really worth running SMP systems.  You
will get much better load capacity scaling out of two cheaper boxes,
if you implement correctly, IMO.

-- Terry
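P.S.: Here is a minimal sketch of what I mean by a freelist
allocator.  This is my own illustration, not his code: carve a slab
into fixed-size buffers once, thread the free ones onto a singly
linked list, and make alloc and free a couple of pointer operations
each.  It is deliberately not SMP safe (no locks), which is exactly
where the speed comes from; you would have to use it per-CPU with
preemption disabled, or on a UP machine.

    #include <stdlib.h>

    #define FL_NBUFS   4096
    #define FL_BUFSIZE 2048            /* e.g. one mbuf cluster */

    struct fl_buf {
            struct fl_buf *fl_next;    /* overlays the buffer while free */
    };

    static struct fl_buf *fl_head;

    /* Allocate the slab once; thread every buffer onto the list. */
    static int
    fl_init(void)
    {
            char *slab;
            int i;

            slab = malloc((size_t)FL_NBUFS * FL_BUFSIZE);
            if (slab == NULL)
                    return (-1);
            for (i = 0; i < FL_NBUFS; i++) {
                    struct fl_buf *b;

                    b = (struct fl_buf *)(slab + i * FL_BUFSIZE);
                    b->fl_next = fl_head;
                    fl_head = b;
            }
            return (0);
    }

    /* Pop: two memory references, no lock, no size computation. */
    static void *
    fl_alloc(void)
    {
            struct fl_buf *b;

            b = fl_head;
            if (b != NULL)
                    fl_head = b->fl_next;
            return (b);
    }

    /* Push: the free buffer itself holds the link, no bookkeeping. */
    static void
    fl_free(void *p)
    {
            struct fl_buf *b;

            b = p;
            b->fl_next = fl_head;
            fl_head = b;
    }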