From: Andrew Gallatin
Date: Wed, 5 May 2004 17:23:30 -0400 (EDT)
To: Bruce Evans
cc: freebsd-current@freebsd.org
cc: Gerrit Nagelhout
Subject: RE: 4.7 vs 5.2.1 SMP/UP bridging performance
Message-ID: <16537.23378.375946.857908@grasshopper.cs.duke.edu>
In-Reply-To: <20040505222636.H15444@gamplex.bde.org>

Bruce Evans writes:
 >
 > Athlon XP2600 UP system:  !SMP case: 22 cycles   SMP case: 37 cycles
 > Celeron 366 SMP system:              35                    48
 >
 > The extra cycles for the SMP case are just the extra cost of a one lock
 > instruction.  Note that SMP should cost twice as much extra, but the
 > non-SMP atomic_store_rel_int(&slock, 0) is pessimized by using xchgl
 > which always locks the bus.  After fixing this:
 >
 > Athlon XP2600 UP system:  !SMP case:  6 cycles   SMP case: 37 cycles
 > Celeron 366 SMP system:              10                    48
 >
 > Mutexes take longer than simple locks, but not much longer unless the
 > lock is contested.  In particular, they don't lock the bus any more
 > and the extra cycles for locking dominate (even in the !SMP case due
 > to the pessimization).
 >
 > So there seems to be something wrong with your benchmark.  Locking the
 > bus for the SMP case always costs about 20+ cycles, but this hasn't
 > changed since RELENG_4 and mutexes can't be made much faster in the
 > uncontested case since their overhead is dominated by the bus lock
 > time.
 >

Actually, I think his tests are accurate and bus-locked instructions
take an eternity on P4.  See
http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html

For example, with your test above, I see 212 cycles for the UP case on
a 2.53GHz P4.  Replacing the atomic_store_rel_int(&slock, 0) with a
simple slock = 0; reduces that count to 18 cycles.

If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
then I think you should do it.

Of course, that still leaves mutexes as very expensive on SMP (253
cycles on the 2.53GHz from above).

Drew
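
P.S. For anyone who wants to reproduce the shape of the comparison above, here is a rough userland sketch. It is not the test Bruce ran and not the kernel's atomic.h: it assumes C11 <stdatomic.h>, reports nanoseconds per iteration from clock_gettime() rather than cycles, and "takes" the lock with a plain relaxed store instead of a real acquire. The names slock-style helpers (release_with_xchg, release_with_store, ns_per_release) are made up for illustration. On x86 the exchange should compile to xchg with a memory operand, which is implicitly locked, while the release store is typically a plain mov.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict -std=c11 */

#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define ITERATIONS 1000000UL

/* Hypothetical stand-in for the lock word used in the test. */
static atomic_int slock;

/* Monotonic time in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Release via atomic exchange (xchg on x86, so the bus/cache lock is paid). */
void release_with_xchg(void)
{
    (void)atomic_exchange_explicit(&slock, 0, memory_order_release);
}

/* Release via a release store (on x86 this is typically a plain mov). */
void release_with_store(void)
{
    atomic_store_explicit(&slock, 0, memory_order_release);
}

/*
 * Average nanoseconds per set/release pair.  The figure includes loop and
 * indirect-call overhead, so only the difference between the two release
 * variants is meaningful.
 */
double ns_per_release(void (*release)(void))
{
    uint64_t start;
    unsigned long i;

    assert(release != NULL);
    start = now_ns();
    for (i = 0; i < ITERATIONS; i++) {
        /* Simplified "acquire": just mark the lock word as held. */
        atomic_store_explicit(&slock, 1, memory_order_relaxed);
        release();
    }
    return (double)(now_ns() - start) / (double)ITERATIONS;
}
```

Calling ns_per_release(release_with_xchg) and ns_per_release(release_with_store) from a small main() and printing both values should show whether the locked exchange is as expensive on a given CPU as the P4 numbers suggest. The absolute values will not match the cycle counts quoted above.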