From: Stephan Uphoff
To: Peter Wemm
Cc: src-committers@freebsd.org, John Baldwin, Alan Cox, cvs-src@freebsd.org,
    Mike Silbersack, cvs-all@freebsd.org, Robert Watson, Julian Elischer
Subject: Re: cvs commit: src/sys/i386/i386 pmap.c
Date: Tue, 09 Nov 2004 14:34:23 -0500
Message-Id: <1100028863.29384.111.camel@palm.tree.com>
In-Reply-To: <200411091057.54867.peter@wemm.org>
References: <1100024464.29384.30.camel@palm.tree.com> <200411091057.54867.peter@wemm.org>

On Tue, 2004-11-09 at 13:57, Peter Wemm wrote:
> On Tuesday 09 November 2004 10:21 am, Stephan Uphoff
wrote:
> > On Tue, 2004-11-09 at 13:02, Julian Elischer wrote:
> > > Robert Watson wrote:
> > > >This change made a large difference, and eliminates the
> > > > unexplained costs. Here's a revised table as compared to the
> > > > above:
> > > >
> > > >       sleep mutex  crit section  spin mutex  new spin mutex
> > > >        UP    SMP     UP    SMP    UP    SMP     UP    SMP
> > > >PIII    21     81     83     81   112    141     95    141
> > > >P4      39    260    120    119   274    342    132    231
> > > >
> > > >So it basically cut 140 cycles off the P4 UP spin lock, 15 off the
> > > > PIII UP spin lock, and 110 cycles off the P4 SMP spin lock. The
> > > > PIII SMP spin lock looks the same. Keep in mind that all of
> > > > these measurements have a standard deviation of between 0 and 3
> > > > cycles, most in the 1 range. Also keep in mind that these are
> > > > entirely uncontended measurements.
> > > >
> > > >Assuming that these changes are correct, and pass whatever tests
> > > > people have in mind, this would be a very strong merge candidate
> > > > for performance reasons. The difference is visible in packet
> > > > send tests from user space as a percentage or two improvement on
> > > > UP on my P4, although it's a little hard to tell due to the noise.
> > >
> > > Can you explain why a spin mutex is more expensive than a sleep
> > > mutex (I assume this is uncontested)?
> >
> > cli() and sti() used for the critical section are expensive.
>
> ... on INTEL cpus! Don't make the mistake of assuming that all x86 cpus
> are as slow as Intel's P4 family on this stuff. Other cpus don't have
> the same massive microcode penalty. My recollection is that athlon
> (and athlon64 cpus in 32 bit mode) take about 8-12 clocks to do a cli
> or sti, compared to 300+ for a P4 cpu. And things like 50-90 clocks
> for an invlpg vs 1200-1600 clocks for a P4.
>
> Please don't accidentally penalize those of us with cpus that were
> designed for good all-round performance. The P4 family was designed
> for games and 3d graphics, not all-round performance.
>
> (This isn't aimed at anybody in particular.. I just wanted to remind
> people that the P4 code is a particularly pathological case (and the
> writing is on the wall for that core). Other cpus, including intel's
> newer non-P4 cores, don't have the same pathological problems.)

Good points.
This seems to lead to the same choices as in my last email
(non-optimal code, lots of compile options, or self-modifying code).

Is there any reason not to implement self-modifying code, as used for
example in Linux for memory barriers?
(Andi Kleen, [PATCH] Runtime memory barrier patching -
http://lkml.org/lkml/2003/4/21/168)

Maybe this would even allow shipping SMP-capable kernels by default
again.

Stephan
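
For anyone who has not looked at the runtime patching approach, here is a
rough sketch of the idea: the kernel is built with a generic instruction
sequence at each site, records those sites in a table, and overwrites them
once at boot if the CPU supports something cheaper. This is not the Linux
code and not any existing FreeBSD interface. The names (alt_site,
apply_alternatives, FEAT_SSE2) are made up for illustration, and the sketch
patches a plain byte buffer rather than live kernel text, so it ignores
write-protected pages and cross-CPU synchronization. The byte sequences are
the usual encodings of "lock; addl $0,0(%esp)" and "mfence", which as I
understand it is the substitution the Linux patch makes for mb().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP 0x90u	/* single-byte x86 NOP, used to pad shorter replacements */

/* Hypothetical CPU feature bit, for illustration only. */
enum { FEAT_SSE2 = 1u << 0 };

/* One patch site: where the generic code lives and what may replace it. */
struct alt_site {
	size_t offset;		/* offset of the generic sequence in the text */
	size_t orig_len;	/* length of the generic sequence */
	const uint8_t *repl;	/* replacement bytes */
	size_t repl_len;	/* must not exceed orig_len */
	unsigned feature;	/* feature bit required for the replacement */
};

/* Generic full barrier: lock; addl $0,0(%esp). */
const uint8_t generic_mb[] = { 0xf0, 0x83, 0x04, 0x24, 0x00 };
/* Cheaper replacement on CPUs with SSE2: mfence. */
const uint8_t mfence_mb[] = { 0x0f, 0xae, 0xf0 };

/*
 * Walk the site table once and overwrite every generic sequence whose
 * required feature is present, padding the remainder with NOPs.
 * Returns the number of sites that were patched.
 */
size_t
apply_alternatives(uint8_t *text, size_t text_len,
    const struct alt_site *sites, size_t nsites, unsigned cpu_features)
{
	size_t patched = 0;

	for (size_t i = 0; i < nsites; i++) {
		const struct alt_site *s = &sites[i];

		/* A replacement must fit inside the site it overwrites. */
		assert(s->repl_len <= s->orig_len);
		assert(s->offset <= text_len &&
		    s->orig_len <= text_len - s->offset);

		if ((cpu_features & s->feature) == 0)
			continue;	/* keep the generic code */

		memcpy(text + s->offset, s->repl, s->repl_len);
		memset(text + s->offset + s->repl_len, NOP,
		    s->orig_len - s->repl_len);
		patched++;
	}
	return (patched);
}
```

The same table-driven scheme could in principle cover the UP versus SMP
case (for example, patching out lock prefixes when only one CPU is
present), which is why it might make a single default kernel practical.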