From: Jilles Tjoelker
To: Ricardo Nabinger Sanchez
Cc: freebsd-threads@freebsd.org
Date: Fri, 27 Apr 2012 00:43:48 +0200
Subject: Re: About the memory barrier in BSD libc
Message-ID: <20120426224348.GA58463@stack.nl>

On Wed, Apr 25, 2012 at 10:51:08PM +0000, Ricardo Nabinger Sanchez wrote:
> On Mon, 23 Apr 2012 12:41:20 +0400, Slawa Olhovchenkov wrote:
> > /usr/include/machine/atomic.h:
> > #define mb() __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > #define wmb() __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > #define rmb() __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")

> Somewhat late on this topic, but I'd like to understand why issue a write
> on %esp, which would invalidate (%esp) on other cores --- thus forcing a
> miss on them?

The stack is usually private to a thread and is written a lot regardless.

> Instead, why not issue "mfence" (mb), "sfence" (wmb), and "lfence" (rmb)?

Apart from the fact that those are SSE or SSE2 instructions, the
uncontended locked instructions may be faster. For example, MFENCE is a
serializing instruction on AMD family 10h processors and the AMD
optimization manual recommends using an uncontended locked instruction
instead (though preferably one that performs a useful store).

SFENCE is only useful with non-temporal stores such as MOVNTPS or stores
to write-combining memory because regular stores are not reordered with
respect to one another.

-- 
Jilles Tjoelker
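
For illustration only, the two candidate full barriers discussed in this
message can be sketched side by side as below. This is a rough sketch, not
the contents of <machine/atomic.h>: the names SKETCH_MB_LOCKED,
SKETCH_MB_MFENCE, publish() and consume() are hypothetical, the code
assumes a GCC-compatible compiler, and on non-x86 targets it simply falls
back to the compiler's sequentially consistent fence builtin.

```c
#include <assert.h>

/*
 * Hypothetical macro names for this sketch; they are not the mb()/wmb()/rmb()
 * macros from <machine/atomic.h>.
 */
#if defined(__x86_64__)
/* Locked add of 0 to the top of the stack: a full barrier, data unchanged. */
#define SKETCH_MB_LOCKED() \
	__asm__ __volatile__("lock; addl $0,(%%rsp)" : : : "memory", "cc")
/* MFENCE is part of SSE2, which every x86-64 CPU provides. */
#define SKETCH_MB_MFENCE() \
	__asm__ __volatile__("mfence" : : : "memory")
#elif defined(__i386__)
#define SKETCH_MB_LOCKED() \
	__asm__ __volatile__("lock; addl $0,(%%esp)" : : : "memory", "cc")
#if defined(__SSE2__)
#define SKETCH_MB_MFENCE() \
	__asm__ __volatile__("mfence" : : : "memory")
#else
/* Without SSE2 there is no MFENCE, so reuse the locked instruction. */
#define SKETCH_MB_MFENCE() SKETCH_MB_LOCKED()
#endif
#else
/* Assumption: on other architectures, defer to the compiler builtin. */
#define SKETCH_MB_LOCKED() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define SKETCH_MB_MFENCE() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

static int payload;
static volatile int ready;

/* Store the data, issue a full barrier, then set the flag. */
void
publish(int value)
{
	payload = value;
	SKETCH_MB_LOCKED();
	ready = 1;
}

/* Return the data if the flag is set, or -1 if nothing was published. */
int
consume(void)
{
	if (!ready)
		return (-1);
	SKETCH_MB_MFENCE();
	return (payload);
}
```

A full barrier is stronger than this flag-and-data pattern strictly needs
on x86; it is used here only to show where either form of mb() would sit.
Which of the two is faster depends on the processor, as noted above, so
any choice between them should be measured on the target hardware.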