Date:      Tue, 28 Oct 2014 17:53:18 +0000
From:      Andrew Turner <andrew@fubar.geek.nz>
To:        Attilio Rao <attilio@freebsd.org>
Cc:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>, Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>, Konstantin Belousov <kib@freebsd.org>, Alan Cox <alc@rice.edu>
Subject:   Re: atomic ops
Message-ID:  <20141028175318.709d2ef6@bender.lan>
In-Reply-To: <CAJ-FndD=9MgK608ra8%2BeMy=cAdq%2BA0xRp9u3xFrwtPEk8eH4CA@mail.gmail.com>
References:  <20141028025222.GA19223@dft-labs.eu> <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com> <20141028142510.10a9d3cb@bender.lan> <CAJ-FndD=9MgK608ra8%2BeMy=cAdq%2BA0xRp9u3xFrwtPEk8eH4CA@mail.gmail.com>

On Tue, 28 Oct 2014 15:33:06 +0100
Attilio Rao <attilio@freebsd.org> wrote:
> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner <andrew@fubar.geek.nz>
> wrote:
> > On Tue, 28 Oct 2014 14:18:41 +0100
> > Attilio Rao <attilio@freebsd.org> wrote:
> >
> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com>
> >> wrote:
> >> > As was mentioned sometime ago, our situation related to atomic
> >> > ops is not ideal.
> >> >
> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64)
> >> > provide full memory barriers, which is stronger than needed.
> >> >
> >> > Moreover, load is implemented as lock cmpxchg on var address, so
> >> > it is additionally slower especially when cpus compete.
> >>
> >> I already explained this once privately: full memory barriers are
> >> not stronger than needed.
> >> FreeBSD has different semantics than Linux. We historically
> >> enforce a full barrier on _acq() and _rel() rather than just a
> >> read and write barrier, hence we need a different implementation
> >> than Linux. There is code that relies on this property, like the
> >> locking primitives (release a mutex, for instance).
> >
> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
> > there are only full barriers. On both 32 and 64-bit ARMv8 ARM has
> > added support for load-acquire and store-release atomic
> > instructions. For the use in atomic instructions we can assume
> > these only operate on the address passed to them.
> >
> > It is unlikely we will use them in the 32-bit port; however, I would
> > like to know the expected semantics of these atomic functions to
> > make sure we get them correct in the arm64 port. I have been
> > advised by one of the ARM Linux kernel maintainers on the problems
> > they have found using these instructions but have yet to determine
> > what our atomic functions guarantee.
> 
> For FreeBSD the "reference doc" is atomic(9).
> It clearly states:

There may also be a difference between what it states, how they are
implemented, and what developers assume they do. I'm trying to make
sure I get them correct.

> The second variant of each operation includes a read memory barrier.
> This barrier ensures that the effects of this operation are completed
> before the effects of any later data accesses.  As a result, the
> operation is said to have acquire semantics as it acquires a
> pseudo-lock requiring further operations to wait until it has
> completed.  To denote this, the suffix ``_acq'' is inserted into the
> function name immediately prior to the ``_<type>'' suffix.  For
> example, to subtract two integers ensuring that any later writes will
> happen after the subtraction is performed, use
> atomic_subtract_acq_int().

It depends on the point at which we guarantee the acquire barrier takes
effect. On ARMv8 the function will be a load/modify/write sequence. If we
use a load-acquire operation for atomic_subtract_acq_int, for example,
with a pointer P and a value to subtract X:

loop:
 load-acquire *P to N
 perform N = N - X
 store-exclusive N to *P
 if the store failed goto loop

where N and X are both registers.
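
As a rough illustration of the loop above, here is a minimal sketch in
portable C11. It assumes <stdatomic.h> rather than our atomic(9)
functions, and sketch_subtract_acq is a made-up name, not the arm64
implementation. The compare-exchange only stands in for the
store-exclusive step: it fails and retries if *P changed since the
load.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper, not the FreeBSD atomic_subtract_acq_int().
 * The compare-exchange loop stands in for the load-acquire /
 * store-exclusive pair from the pseudo-code.
 */
static void
sketch_subtract_acq(_Atomic unsigned int *p, unsigned int x)
{
	/* load-acquire *P to N */
	unsigned int n = atomic_load_explicit(p, memory_order_acquire);

	/*
	 * Try to store N - X to *P. On failure n is refreshed with the
	 * current value of *P (again with acquire ordering) and we loop.
	 */
	while (!atomic_compare_exchange_weak_explicit(p, &n, n - x,
	    memory_order_acquire, memory_order_acquire))
		;
}
```

The C11 model does not let me express "later accesses may land between
the load and the store", so this only shows the shape of the loop, not
the exact ordering question discussed here.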

This means no access after this loop will happen before it, but
later accesses may happen within it, e.g. if there was a later access A
the following order may be possible:

Load P
Access A
Store P

We know the store will eventually happen: if it fails, e.g. because
another processor accessed *P, we go around the loop again.

The other point is we can guarantee any store-release, and therefore
any prior access, has happened before a later load-acquire even if it's
on another processor.
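
A minimal sketch of that store-release / load-acquire pairing, again
assuming C11 <stdatomic.h> rather than atomic(9). The names payload,
ready, sketch_publish and sketch_try_consume are made up for the
example. The idea is that the plain write to payload is a "prior
access" that must be visible once the flag is seen.

```c
#include <stdatomic.h>

static int payload;		/* plain data, written before publishing */
static atomic_int ready;	/* flag, zero-initialized (static storage) */

/* Producer: the store-release orders the payload write before the flag. */
static void
sketch_publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: returns 1 and fills *out once the flag has been observed. */
static int
sketch_try_consume(int *out)
{
	if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		return (0);
	*out = payload;	/* cannot be satisfied before the load-acquire */
	return (1);
}
```

In real use the two functions would run on different processors; the
guarantee is that a consumer which sees ready set also sees the value
written to payload.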

...

> The bottom-side of all this is that read memory barriers ensure that
> the effects of the operations you are making (load in case of
> atomic_load_acq_int(), for example) are completed before any later
> data accesses. "Data accesses" qualifies for *all* the operations
> including reads, writes, etc. This is very different from what Linux
> assumes for its rmb() barrier, for example, which just orders loads. So
> for FreeBSD there is no _acq -> rmb() analogy and there is no _rel ->
> wmb() analogy.

On ARMv8, using the above pseudo-code, later operations will not be
moved before the load-acquire, but they may happen before its store.
Having discussed this with John Baldwin, I don't think this is a
problem, because the store operation is allowed to fail if another
processor has written to its memory.

> 
> This must be kept well in mind when trying to optimize the atomic_*()
> operations.

At this point I'm more interested in getting them correct as they will
be important when I start on SMP support.

Andrew


