From: Don Bowman
To: 'Bruce Evans', Andrew Gallatin
Date: Thu, 6 May 2004 09:52:27 -0400
cc: freebsd-current@FreeBSD.org
cc: Gerrit Nagelhout
Subject: RE: 4.7 vs 5.2.1 SMP/UP bridging performance

From: Bruce Evans [mailto:bde@zeta.org.au]
> On Wed, 5 May 2004, Andrew Gallatin wrote:
> ...
> >
> > Actually, I think his tests are accurate and bus locked instructions
> > take an eternity on P4. See
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html
> >
> > For example, with your test above, I see 212 cycles for the UP case on
> > a 2.53GHz P4. Replacing the atomic_store_rel_int(&slock, 0) with a
> > simple slock = 0; reduces that count to 18 cycles.
>
> This seems to be right, unfortunately. I wonder if this has anything to
> do with freebsd.org having no P4 machines.
>
> > If its really safe to remove the xchg* from non-SMP atomic_store_rel*,
> > then I think you should do it. Of course, that still leaves mutexes
> > as very expensive on SMP (253 cycles on the 2.53GHz from above).
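The two release stores compared above can be sketched in C with GCC-style inline assembly. This is a hypothetical illustration, not the code in FreeBSD's i386/include/atomic.h: the names store_rel_xchg, store_rel_plain and release_store_demo are made up for the example, and the plain variant assumes a uniprocessor, where the only other observer of the lock word is the same CPU (including its interrupt handlers). An xchg with a memory operand is implicitly locked on x86, which is where the bus-lock cost comes from; the plain store only needs a compiler barrier so that earlier accesses are not moved past it.

```c
#include <assert.h>

/*
 * Hypothetical helpers, not FreeBSD's atomic.h: two ways to release a
 * lock word such as the "slock" in the quoted test.
 */

/* xchg with a memory operand is implicitly locked on x86: a full
 * barrier, but it pays the bus-lock cost. */
static inline void store_rel_xchg(volatile unsigned int *p, unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("xchgl %0, %1" : "+r"(v), "+m"(*p) : : "memory");
#else
    (void)__atomic_exchange_n(p, v, __ATOMIC_SEQ_CST); /* non-x86 fallback */
#endif
}

/* Plain store ("slock = 0;") preceded by a compiler barrier.  Assumes a
 * uniprocessor, where the only other observer of *p is this CPU itself
 * (including its interrupt handlers). */
static inline void store_rel_plain(volatile unsigned int *p, unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("" : : : "memory"); /* stops compiler reordering only */
    *p = v;
#else
    __atomic_store_n(p, v, __ATOMIC_RELEASE); /* non-x86 fallback */
#endif
}

/* Usage: release a held lock word with each variant. */
void release_store_demo(void)
{
    volatile unsigned int slock = 1;

    store_rel_xchg(&slock, 0);
    assert(slock == 0);

    slock = 1;
    store_rel_plain(&slock, 0);
    assert(slock == 0);
}
```

Both leave the lock word at 0; the difference is only in the ordering guarantee and its cost, which is the trade-off being discussed.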
>
> I forgot (again) that there are memory access ordering issues. A lock
> may be needed to get everything synced. See the comment before the i386
> versions in i386/include/atomic.h. A single lock may be enough. The
> best example I could think of easily is:

On the P4, there are mfence, lfence and sfence instructions to enforce
memory ordering. These are cheaper than "lock; andl" or "cpuid", which
are the traditional 'sync' instructions.
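As a hypothetical illustration of the ordering issue (the helper names below are made up and this is not FreeBSD code), the case that needs a full barrier on x86 is a store followed by a load of a different location, as in the flag handshake of Dekker's algorithm: x86 allows a load to complete before an older store to another address is visible, so without a fence both CPUs could read the other's flag as 0. The sketch shows two ways to get that barrier, mfence (an SSE2 instruction, so P4 and later) and a locked no-op on a dummy word. The usage function only exercises the logic on one thread; it is not an SMP test, and which barrier is cheaper is best measured on the target CPU, as with the cycle counts quoted above.

```c
#include <assert.h>

/* Hypothetical helper names; not FreeBSD's atomic.h. */

/* Full barrier with mfence (SSE2, so Pentium 4 and later). */
static inline void full_barrier_mfence(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("mfence" : : : "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* non-x86 fallback */
#endif
}

/* Full barrier with a locked read-modify-write of a dummy word; loads
 * and stores are not reordered across lock-prefixed instructions. */
static inline void full_barrier_locked(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int dummy = 0;
    __asm__ __volatile__("lock; addl $0, %0" : "+m"(dummy) : : "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* non-x86 fallback */
#endif
}

/* Flag handshake from Dekker's algorithm (not a complete lock). */
static volatile int flag[2];

/* Returns 1 if CPU `self` may enter, 0 if it backed off. */
static int try_enter(int self)
{
    flag[self] = 1;
    /* x86 may let the load below pass this store; the fence prevents it. */
    full_barrier_mfence();
    if (flag[1 - self] == 0)
        return 1;
    flag[self] = 0; /* other side is interested: withdraw */
    return 0;
}

/* Release: a release store, which GCC emits as a plain mov on x86. */
static void leave(int self)
{
    __atomic_store_n(&flag[self], 0, __ATOMIC_RELEASE);
}

/* Usage: single-threaded walk through the handshake. */
void dekker_demo(void)
{
    assert(try_enter(0) == 1); /* CPU 1 not interested */
    assert(try_enter(1) == 0); /* CPU 0 is in; CPU 1 backs off */
    leave(0);
    assert(try_enter(1) == 1);
    leave(1);
    full_barrier_locked();
    assert(flag[0] == 0 && flag[1] == 0);
}
```

Swapping full_barrier_locked() in for full_barrier_mfence() in try_enter gives the traditional locked-instruction form of the same barrier.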