Date: Fri, 27 Apr 2012 00:43:48 +0200
From: Jilles Tjoelker <jilles@stack.nl>
To: Ricardo Nabinger Sanchez <rnsanchez@wait4.org>
Cc: freebsd-threads@freebsd.org
Subject: Re: About the memory barrier in BSD libc
Message-ID: <20120426224348.GA58463@stack.nl>
In-Reply-To: <jn9v4s$hij$1@dough.gmane.org>
References: <CAPHpMu=DOGQ=TuFeYH7bH8hVwteT4Q3k67-mvoOFob6P3Y506w@mail.gmail.com> <20120423084120.GD76983@zxy.spb.ru> <jn9v4s$hij$1@dough.gmane.org>
On Wed, Apr 25, 2012 at 10:51:08PM +0000, Ricardo Nabinger Sanchez wrote:
> On Mon, 23 Apr 2012 12:41:20 +0400, Slawa Olhovchenkov wrote:

> > /usr/include/machine/atomic.h:
> > #define mb()  __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > #define wmb() __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > #define rmb() __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")

> Somewhat late on this topic, but I'd like to understand why issue a write
> on %esp, which would invalidate (%esp) on other cores --- thus forcing a
> miss on them?

The stack is usually private to a thread and is written a lot
regardless.

> Instead, why not issue "mfence" (mb), "sfence" (wmb), and "lfence" (rmb)?

Apart from the fact that those are SSE or SSE2 instructions, the
uncontended locked instructions may be faster. For example, MFENCE is a
serializing instruction on AMD family 10h processors and the AMD
optimization manual recommends using an uncontended locked instruction
instead (though preferably one that performs a useful store).

SFENCE is only useful with non-temporal stores such as MOVNTPS or stores
to write-combining memory because regular stores are not reordered with
respect to one another.

-- 
Jilles Tjoelker
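
To make the discussion above concrete, here is a minimal sketch of a full barrier built on the same locked no-op add to the top of the stack, used to order a data store before a flag store. The names full_barrier(), publish() and consume() are hypothetical and are not part of atomic.h. The %rsp variant, the added "cc" clobber and the C11 atomic_thread_fence() fallback for non-x86 targets are assumptions of this sketch, not something taken from the FreeBSD header.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Full memory barrier. On x86 this is the locked no-op add to the top of
 * the stack quoted above; the stack line is normally private to the thread,
 * so the locked access is uncontended. On other targets the sketch falls
 * back to the portable C11 fence (an assumption, not from atomic.h).
 */
static inline void full_barrier(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,(%%rsp)" : : : "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,(%%esp)" : : : "memory", "cc");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

static int payload;          /* data being handed over */
static volatile int ready;   /* set once payload is valid */

/* Writer side: make the payload visible before the flag. */
void publish(int value)
{
    payload = value;
    full_barrier();
    ready = 1;
}

/* Reader side: wait for the flag, then read the payload. */
int consume(void)
{
    while (!ready)
        ;                    /* spin until the writer sets the flag */
    full_barrier();
    assert(ready == 1);
    return payload;
}
```

Calling publish(42) and then consume() returns 42. The sketch only shows where the barrier sits; it says nothing about whether the locked add or MFENCE is faster on a given processor, which would have to be measured.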
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120426224348.GA58463>