Date: Mon, 17 Mar 2003 00:06:37 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Petri Helenius
Cc: freebsd-current@FreeBSD.ORG
Subject: Re: mbuf cache

Petri Helenius wrote:
> > This also has the desirable side effect that stack processing will
> > occur on the same CPU as the interrupt processing occurred.  This
> > avoids inter-CPU memory bus arbitration cycles, and ensures that
> > you won't engage in a lot of unnecessary L1 cache busting.  Hence
> > I prefer this method to polling.
>
> Is there anywhere I could read up on the associated overhead, and on
> how the whole thing works out in the worst case, where data is DMAd
> into memory, read up to CPU1 and then to CPU2, and then discarded,
> and on whether there are any roads that can be taken to optimize this?

Not really.  If there were a good resource on this, people would have
read it already, and some of the code that has been rewritten or
replaced would never have been written the way it was in the first
place.  8-).

You can read technical papers on a lot of topics.  Some contain
information that has been "known" to the academic community since their
publication, but has yet to make it into a commercial OS, let alone a
free one like Linux or FreeBSD.

Basically, it's an experience thing.  Jonathan Lemon, who did the direct
dispatch work, is a Cisco employee.  He knows what he knows because he
has a lot of experience.  Luigi Rizzo, who did the polling code, is a
tenured university professor in Italy.  He knows what he knows because
he has a lot of experience.  I did the soft interrupt coalescing code,
and did a couple of the patches to add polling support to some of the
ethernet drivers, etc.
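For illustration, here is a minimal sketch of the soft interrupt
coalescing idea: instead of handling exactly one frame per hardware
interrupt, the handler keeps draining the receive ring until it comes up
empty or hits a budget, so one interrupt amortizes over a burst of
packets.  Everything in it (the ring, the names, the budget) is
hypothetical; it shows the shape of the change, not any real driver's
code.

/*
 * Userland sketch of soft interrupt coalescing in an RX handler.
 */
#include <stdio.h>

#define RX_RING_SIZE 256
#define RX_BUDGET     64            /* bound the work done per interrupt */

static int rx_head, rx_tail;        /* stand-in for the DMA descriptor ring */

static int
rx_ready(void)
{
    return (rx_head != rx_tail);
}

static void
rx_one_frame(void)
{
    /* A real driver would hand the mbuf to ether_input() here. */
    rx_head = (rx_head + 1) % RX_RING_SIZE;
}

static void
rx_intr(void)
{
    int processed;

    /*
     * The coalescing loop: after draining a full budget, look again.
     * Frames that arrived while we were working get picked up in the
     * same interrupt instead of generating another one.
     */
    do {
        processed = 0;
        while (processed < RX_BUDGET && rx_ready()) {
            rx_one_frame();
            processed++;
        }
    } while (processed == RX_BUDGET);
}

int
main(void)
{
    rx_tail = 100;                  /* pretend 100 frames are pending */
    rx_intr();
    printf("ring drained: head=%d tail=%d\n", rx_head, rx_tail);
    return 0;
}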
I'm a voracious reader of research papers, and I've been a Novell,
Artisoft, Whistle Communications, IBM, ClickArray, etc. employee.  Bill
Paul, who did most of the network drivers, did nothing but eat, breathe,
and sleep network drivers for years.  Etc., etc.

If you are asking for paper references, then I can at least tell you
where to start; go to:

    http://citeseer.nj.nec.com/cs

and look for "Jeff Mogul", "DEC Western Research Laboratories", "Mohit
Aron", "Peter Druschel", "Sally Floyd", "Van Jacobson", "SCALA", "TCP
Rate halving", "Receiver Livelock", "Rice University", "Duke
University", and "University of Utah".  That will at least get you most
of the papers.  Then follow the references to the other papers.

> > You will get much better load capacity scaling out of two cheaper
> > boxes, if you implement correctly, IMO.
>
> Synchronization of the unformatted data can probably never get as good
> as it gets if you optimize the system for your case.  But I agree it
> should be better than it is now; however, it does not really seem to be
> getting any better.  (Unless you consider the EV7 and Opteron
> approaches better than the current Intel approach.)

It's a lot of work to do it right.  SVR4.2 and up don't do it right,
despite the indirect claims of scaling to 32 CPUs in the recently filed
SCO vs. IBM suit.

The secret recipe, if there is one, is probably "lock avoidance through
algorithm choice", rather than "better locking" or "finer grained
locking", etc.

Even then, you are usually talking about a penalty factor of almost 10
on any stall that goes external to the CPU chip itself, because of bus
speeds, and that's on the best hardware.  For a 3GHz CPU with a 133MHz
front side bus, that's more like 23 times (3000 MHz / 133 MHz is about
23).  If the stall is to the I/O bus, then you are talking 46 times.
It's really, really ugly once you get out of the L1 cache...

--
Terry
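To make "lock avoidance through algorithm choice" a little more
concrete, here is a minimal sketch of the idea in the allocator setting
this thread is about: each CPU keeps a private free list and only
touches a shared, locked pool when its list runs dry, so the common
allocation path takes no lock and causes no cross-CPU bus traffic.  The
names (pcpu_cache, curcpu_id(), and so on) are hypothetical; this is an
illustration of the technique, not the FreeBSD mbuf allocator.

/*
 * Per-CPU free lists as a lock-avoidance strategy (userland sketch).
 */
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

#define MAXCPU 16

struct item { struct item *next; };

static struct item *pcpu_cache[MAXCPU];   /* touched only by the owning CPU */
static struct item *global_pool;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for a real "current CPU" primitive; a kernel would also keep
 * the fast path pinned (preemption disabled) while it manipulates the
 * local list.
 */
static int curcpu_id(void) { return 0; }

static struct item *
item_alloc(void)
{
    struct item **head = &pcpu_cache[curcpu_id()];
    struct item *it = *head;

    if (it != NULL) {                     /* fast path: no lock at all */
        *head = it->next;
        return (it);
    }

    pthread_mutex_lock(&global_lock);     /* slow path: shared pool */
    it = global_pool;
    if (it != NULL)
        global_pool = it->next;
    pthread_mutex_unlock(&global_lock);
    return (it);
}

static void
item_free(struct item *it)
{
    struct item **head = &pcpu_cache[curcpu_id()];

    it->next = *head;                     /* back onto the local list */
    *head = it;
}

int
main(void)
{
    /* Seed the shared pool with a few objects, then exercise the cache. */
    for (int i = 0; i < 4; i++) {
        struct item *it = malloc(sizeof(*it));
        it->next = global_pool;
        global_pool = it;
    }
    struct item *a = item_alloc();        /* first hit takes the slow path  */
    item_free(a);                         /* ...and lands on the local list */
    struct item *b = item_alloc();        /* second hit is lock-free        */
    printf("reused locally: %s\n", a == b ? "yes" : "no");
    return 0;
}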