Date: Mon, 17 Mar 2003 00:06:37 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Petri Helenius
Cc: freebsd-current@FreeBSD.ORG
Subject: Re: mbuf cache

Petri Helenius wrote:
> > This also has the desirable side effect that stack processing will
> > occur on the same CPU as the interrupt processing occurred.  This
> > avoids inter-CPU memory bus arbitration cycles, and ensures that
> > you won't engage in a lot of unnecessary L1 cache busting.  Hence
> > I prefer this method to polling.
>
> Is there anywhere I could read up on the associated overhead, and on
> how the whole thing works out in the worst case, where data is DMAd
> into memory, read up to CPU1 and then to CPU2, and then discarded,
> and on whether there are any roads that can be taken to optimize this?

Not really.  If there were a good resource on this, people would have
read it already, and some of the code that has been rewritten or
replaced would never have been written the way it was in the first
place.  8-).

You can read technical papers on a lot of topics.  Some contain
information that has been "known" to the academic community since their
publication, but has yet to make it into a commercial OS, let alone a
free one like Linux or FreeBSD.

Basically, it's an experience thing.  Jonathan Lemon, who did the direct
dispatch work, is a Cisco employee.  He knows what he knows because he
has a lot of experience.  Luigi Rizzo, who did the polling code, is a
tenured university professor in Italy.  He knows what he knows because
he has a lot of experience.  I did the soft interrupt coalescing code,
and did a couple of the patches to add polling support to some of the
ethernet drivers, etc.
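For illustration, here is a minimal sketch of the soft interrupt
coalescing idea: instead of handling exactly one frame per hardware
interrupt, the handler keeps draining the receive ring until it comes up
empty or hits a budget, so one interrupt amortizes over a burst of
packets.  Everything in it (the ring, the names, the budget) is
hypothetical; it shows the shape of the change, not any real driver's
code.

/*
 * Userland sketch of soft interrupt coalescing in an RX handler.
 */
#include <stdio.h>

#define RX_RING_SIZE 256
#define RX_BUDGET     64            /* bound the work done per interrupt */

static int rx_head, rx_tail;        /* stand-in for the DMA descriptor ring */

static int
rx_ready(void)
{
    return (rx_head != rx_tail);
}

static void
rx_one_frame(void)
{
    /* A real driver would hand the mbuf to ether_input() here. */
    rx_head = (rx_head + 1) % RX_RING_SIZE;
}

static void
rx_intr(void)
{
    int processed;

    /*
     * The coalescing loop: after draining a full budget, look again.
     * Frames that arrived while we were working get picked up in the
     * same interrupt instead of generating another one.
     */
    do {
        processed = 0;
        while (processed < RX_BUDGET && rx_ready()) {
            rx_one_frame();
            processed++;
        }
    } while (processed == RX_BUDGET);
}

int
main(void)
{
    rx_tail = 100;                  /* pretend 100 frames are pending */
    rx_intr();
    printf("ring drained: head=%d tail=%d\n", rx_head, rx_tail);
    return 0;
}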
I'm a voracious reader of research papers, and I've been a Novell,
Artisoft, Whistle Communications, IBM, ClickArray, etc. employee.  Bill
Paul, who did most of the network drivers, did nothing but eat, breathe,
and sleep network drivers for years.  Etc., etc.

If you are asking for paper references, then I can at least tell you
where to start; go to:

    http://citeseer.nj.nec.com/cs

and look for "Jeff Mogul", "DEC Western Research Laboratories", "Mohit
Aron", "Peter Druschel", "Sally Floyd", "Van Jacobson", "SCALA", "TCP
Rate halving", "Receiver Livelock", "Rice University", "Duke
University", and "University of Utah".  That will at least get you most
of the papers.  Then follow the references to the other papers.

> > You will get much better load capacity scaling out of two cheaper
> > boxes, if you implement correctly, IMO.
>
> Synchronization of the unformatted data can probably never get as good
> as it gets if you optimize the system for your case.  But I agree it
> should be better than it is now; however, it does not really seem to be
> getting any better.  (Unless you consider the EV7 and Opteron
> approaches better than the current Intel approach.)

It's a lot of work to do it right.  SVR4.2 and up don't do it right,
despite the indirect claims of scaling to 32 CPUs in the recently filed
SCO vs. IBM suit.

The secret recipe, if there is one, is probably "lock avoidance through
algorithm choice", rather than "better locking" or "finer grained
locking", etc.

Even then, you are usually talking about a penalty factor of almost 10
on any stall that goes external to the CPU chip itself, because of bus
speeds, and that's on the best hardware.  For a 3GHz CPU with a 133MHz
front side bus, that's more like 23 times (3000 MHz / 133 MHz is about
23).  If the stall is to the I/O bus, then you are talking 46 times.
It's really, really ugly once you get out of the L1 cache...

--
Terry
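To make "lock avoidance through algorithm choice" a little more
concrete, here is a minimal sketch of the idea in the allocator setting
this thread is about: each CPU keeps a private free list and only
touches a shared, locked pool when its list runs dry, so the common
allocation path takes no lock and causes no cross-CPU bus traffic.  The
names (pcpu_cache, curcpu_id(), and so on) are hypothetical; this is an
illustration of the technique, not the FreeBSD mbuf allocator.

/*
 * Per-CPU free lists as a lock-avoidance strategy (userland sketch).
 */
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

#define MAXCPU 16

struct item { struct item *next; };

static struct item *pcpu_cache[MAXCPU];   /* touched only by the owning CPU */
static struct item *global_pool;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for a real "current CPU" primitive; a kernel would also keep
 * the fast path pinned (preemption disabled) while it manipulates the
 * local list.
 */
static int curcpu_id(void) { return 0; }

static struct item *
item_alloc(void)
{
    struct item **head = &pcpu_cache[curcpu_id()];
    struct item *it = *head;

    if (it != NULL) {                     /* fast path: no lock at all */
        *head = it->next;
        return (it);
    }

    pthread_mutex_lock(&global_lock);     /* slow path: shared pool */
    it = global_pool;
    if (it != NULL)
        global_pool = it->next;
    pthread_mutex_unlock(&global_lock);
    return (it);
}

static void
item_free(struct item *it)
{
    struct item **head = &pcpu_cache[curcpu_id()];

    it->next = *head;                     /* back onto the local list */
    *head = it;
}

int
main(void)
{
    /* Seed the shared pool with a few objects, then exercise the cache. */
    for (int i = 0; i < 4; i++) {
        struct item *it = malloc(sizeof(*it));
        it->next = global_pool;
        global_pool = it;
    }
    struct item *a = item_alloc();        /* first hit takes the slow path  */
    item_free(a);                         /* ...and lands on the local list */
    struct item *b = item_alloc();        /* second hit is lock-free        */
    printf("reused locally: %s\n", a == b ? "yes" : "no");
    return 0;
}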