Date:      Tue, 28 Oct 2014 15:33:06 +0100
From:      Attilio Rao <attilio@freebsd.org>
To:        Andrew Turner <andrew@fubar.geek.nz>
Cc:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>, Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>, Konstantin Belousov <kib@freebsd.org>, Alan Cox <alc@rice.edu>
Subject:   Re: atomic ops
Message-ID:  <CAJ-FndD=9MgK608ra8%2BeMy=cAdq%2BA0xRp9u3xFrwtPEk8eH4CA@mail.gmail.com>
In-Reply-To: <20141028142510.10a9d3cb@bender.lan>
References:  <20141028025222.GA19223@dft-labs.eu> <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com> <20141028142510.10a9d3cb@bender.lan>

On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner <andrew@fubar.geek.nz> wrote:
> On Tue, 28 Oct 2014 14:18:41 +0100
> Attilio Rao <attilio@freebsd.org> wrote:
>
>> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com>
>> wrote:
>> > As was mentioned some time ago, our situation related to atomic ops
>> > is not ideal.
>> >
>> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
>> > full memory barriers, which is stronger than needed.
>> >
>> > Moreover, load is implemented as lock cmpxchg on the variable's address,
>> > so it is additionally slower, especially when CPUs compete.
>>
>> I already explained this once privately: full memory barriers are not
>> stronger than needed.
>> FreeBSD has different semantics than Linux. We historically enforce a
>> full barrier on _acq() and _rel() rather than just a read and write
>> barrier, hence we need a different implementation than Linux.
>> There is code that relies on this property, like the locking
>> primitives (release a mutex, for instance).
>
> On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
> there are only full barriers. On both 32- and 64-bit ARMv8, ARM has added
> support for load-acquire and store-release atomic instructions. For their
> use in atomic instructions we can assume these only operate on the
> address passed to them.
>
> It is unlikely we will use them in the 32-bit port however I would like
> to know the expected semantics of these atomic functions to make sure
> we get them correct in the arm64 port. I have been advised by one of
> the ARM Linux kernel maintainers on the problems they have found using
> these instructions but have yet to determine what our atomic functions
> guarantee.

For FreeBSD the "reference doc" is atomic(9).
It clearly states:

The second variant of each operation includes a read memory barrier.
This barrier ensures that the effects of this operation are completed
before the effects of any later data accesses.  As a result, the
operation is said to have acquire semantics as it acquires a pseudo-lock
requiring further operations to wait until it has completed.  To denote
this, the suffix ``_acq'' is inserted into the function name immediately
prior to the ``_<type>'' suffix.  For example, to subtract two integers
ensuring that any later writes will happen after the subtraction is
performed, use atomic_subtract_acq_int().

The third variant of each operation includes a write memory barrier.
This ensures that all effects of all previous data accesses are completed
before this operation takes place. As a result, the operation is said to
have release semantics as it releases any pending data accesses to be
completed before its operation is performed.  To denote this, the suffix
``_rel'' is inserted into the function name immediately prior to the
``_<type>'' suffix.  For example, to add two long integers ensuring that
all previous writes will happen first, use atomic_add_rel_long().

The bottom line of all this is that a read memory barrier ensures that
the effects of the operation you are performing (the load, in the case
of atomic_load_acq_int(), for example) are completed before any later
data accesses. "Data accesses" covers *all* operations, including
reads, writes, etc. This is very different from what Linux assumes for
its rmb() barrier, for example, which only orders loads. So for FreeBSD
there is no _acq -> rmb() analogy and there is no _rel -> wmb() analogy.

This must be kept firmly in mind when trying to optimize the atomic_*()
operations.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


