Date:      Sun, 5 Jan 2014 03:29:10 +0400
From:      Oleg Bulyzhin <oleg@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-hackers@freebsd.org, Oleg Bulyzhin <oleg@freebsd.org>
Subject:   Re: atomic_load_acq @ i386/amd64
Message-ID:  <20140104232910.GA12331@lath.rinet.ru>
In-Reply-To: <20140104172923.GY59496@kib.kiev.ua>
References:  <20140103205159.GA99722@lath.rinet.ru> <20140104172923.GY59496@kib.kiev.ua>

On Sat, Jan 04, 2014 at 07:29:23PM +0200, Konstantin Belousov wrote:
> On Sat, Jan 04, 2014 at 12:51:59AM +0400, Oleg Bulyzhin wrote:
> > 
> > Hello.
> > 
> > I've got a question: why atomic_load_acq_* implemented on i386/amd64 archs
> > with locked cmpxchg instruction? Comment about this
> > (in /sys/(amd64|i386)/include/atomic.h) looks wrong for me. I believe
> > acquire/release semantics does not require StoreLoad barrier so simple aligned
> > load should be enough. (because acquire/release semantics does not guarantee
> > sequential consistency).
> 
> You did not explicitly write which statement in the comment is false, in
> your opinion.

> 
> FreeBSD assumes a property of _acq/_rel stuff which is sometimes called
> 'total lock ordering'. It is indeed sort of sequential consistency, but
> only for atomic+membar ops. Would atomic_load_acq()  implemented as plain
> load, it can pass stores, in particular stores from the _rel op, which
> breaks the guarantee.
> 
> For x86, there are indeed two possible schemes for implementing critical
> section, one is lock cmpxchg for get(), and plain store for release(),
> which is what we use. Another is plain load for get(), and xchg for
> release().  Then, the load_acq() must be adapted to not break the acq/rel
> consistency, and since we use plain store for release(), load_acq must
> use a serializing instruction.

Perhaps I was not clear enough; I am talking about this statement:
"However, loads may pass stores, so for atomic_load_acq we have to
 ensure a Store/Load barrier to do the load in SMP kernels."

As far as I know, acquire/release semantics guarantee the following.
If we have this code:
<prev_code>
_acq
<some code>
_rel
<post_code>

then the following statements are true:
1) <some code> cannot leave the acq/rel block (due to reordering)
2) <prev_code> may leak past _acq (into the block)
3) <post_code> may leak before _rel (into the block)
So neither _acq nor _rel requires a full membar. I.e.
op_acq is:
<op>
<one way membar, down->up reordering is prohibited>
op_rel is:
<one way membar, up->down reordering is prohibited>
<op>

The Intel documentation names only one reordering that can happen (for simple
loads/stores): "Reads may be reordered with older writes to different locations
but not with older writes to the same location."

So if an older store can pass our load_acq(), it would not break the
requirements above. And I do not understand how the load from load_acq() can
pass the store from store_rel(); the Intel doc says: "Writes are not reordered
with older reads".
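
The one reordering in question is the classic "store buffering" litmus test.
The sketch below is again a userland approximation with C11 atomics and
pthreads, not the kernel primitives; x, y, r0, r1 and run_store_buffering()
are names invented for this example. With release stores and acquire loads
the C11 model permits both threads to read 0, which matches the Intel rule
that a read may be reordered with an older write to a different location.
Only sequentially consistent operations (a full StoreLoad barrier) forbid
that outcome.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y;
static int r0, r1;

static void *thread0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_release);
    /* This load may be satisfied before the store to x becomes visible. */
    r0 = atomic_load_explicit(&y, memory_order_acquire);
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_release);
    /* Same here: the load of x may pass the older store to y. */
    r1 = atomic_load_explicit(&x, memory_order_acquire);
    return NULL;
}

/*
 * Hypothetical helper: runs the test once and returns r0 + 2 * r1.
 * A result of 0 means both loads saw 0, i.e. each load passed the
 * older store in its own thread.
 */
int run_store_buffering(void)
{
    pthread_t a, b;

    atomic_store(&x, 0);
    atomic_store(&y, 0);
    r0 = r1 = -1;

    pthread_create(&a, NULL, thread0, NULL);
    pthread_create(&b, NULL, thread1, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    return r0 + 2 * r1;
}
```

Any of the results 0, 1, 2 or 3 is legal here, and which one shows up depends
on timing. A result of 0 is rare in a single run and usually needs many
iterations to observe.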

Well, while writing this email I realized what is bothering me: it is the
"Multiple Processors" section of atomic(9). It claims atomics are not atomic in
the common MP case, and says atomics are atomic on i386. That looks strange to me:
1) I guess it is not "atomic" even for i386/MP without proper membar pairing.
2) If we have acq/rel modifiers for atomics, why can we not guarantee
   "atomicity" for any MP arch?

P.S. Please correct me if I am wrong in my statements; I am spending my New Year
holidays on ignorance elimination. ;)

-- 
Oleg.

================================================================
=== Oleg Bulyzhin -- OBUL-RIPN -- OBUL-RIPE -- oleg@rinet.ru ===
================================================================



