Date:      Sat, 4 Oct 2014 07:28:51 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        alc@freebsd.org
Cc:        Konstantin Belousov <kostikbel@gmail.com>, attilio@freebsd.org, Johan Schuijt <johan@transip.nl>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: [PATCH 1/2] Implement simple sequence counters with memory barriers.
Message-ID:  <20141004052851.GA27891@dft-labs.eu>
In-Reply-To: <CAJUyCcPA7ZDNbwyfx3fT7mq3SE7M-mL5he=eXZ8bY3z-xUCJ-g@mail.gmail.com>
References:  <1408064112-573-1-git-send-email-mjguzik@gmail.com> <1408064112-573-2-git-send-email-mjguzik@gmail.com> <20140816093811.GX2737@kib.kiev.ua> <20140816185406.GD2737@kib.kiev.ua> <20140817012646.GA21025@dft-labs.eu> <CAJUyCcPA7ZDNbwyfx3fT7mq3SE7M-mL5he=eXZ8bY3z-xUCJ-g@mail.gmail.com>

Reviving. Sorry everyone for such a big delay, $life.

On Tue, Aug 19, 2014 at 02:24:16PM -0500, Alan Cox wrote:
> On Sat, Aug 16, 2014 at 8:26 PM, Mateusz Guzik <mjguzik@gmail.com> wrote:
> > Well, my memory-barrier-and-so-on-fu is rather weak.
> >
> > I had another look at the issue. At least on amd64, it looks like only a
> > compiler barrier is required for both reads and writes.
> >
> > According to the AMD64 Architecture Programmer’s Manual Volume 2: System
> > Programming, section 7.2, "Multiprocessor Memory Access Ordering":
> >
> > "Loads do not pass previous loads (loads are not reordered). Stores do
> > not pass previous stores (stores are not reordered)"
> >
> > Since the modifying code only performs a series of writes and we expect
> > writers to be exclusive, I find this applicable to our scenario.
> >
> > I checked the Linux sources and the generated assembly; they indeed issue
> > only a compiler barrier on amd64 (for Intel processors as well).
> >
> > atomic_store_rel_int on amd64 seems fine in this regard, but the
> > corresponding load function issues lock cmpxchg, which kills performance
> > (median 55693659 -> 12789232 ops in a microbenchmark) for no gain.
> >
> > Additionally, release and acquire semantics seem to be a stronger
> > guarantee than we actually need.
> >
> >
> 
> This statement left me puzzled and got me to look at our x86 atomic.h for
> the first time in years.  It appears that our implementation of
> atomic_load_acq_int() on x86 is, umm ..., unconventional.  That is, it is
> enforcing a constraint that simple acquire loads don't normally enforce.
> For example, the C11 stdatomic.h simple acquire load doesn't enforce this
> constraint.  Moreover, our own implementation of atomic_load_acq_int() on
> ia64, where the mapping from atomic_load_acq_int() to machine instructions
> is straightforward, doesn't enforce this constraint either.
> 

By 'this constraint' I presume you mean the full memory barrier.

It is unclear to me whether we can just get rid of it at this point.
Doing so would definitely be beneficial.
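
For comparison, here is an illustrative sketch (not from the patch; the
name is mine) of what a plain C11 acquire load looks like. On amd64 it
compiles to a plain mov, with no lock prefix and no fence:

#include <stdatomic.h>

static inline unsigned int
c11_load_acquire(volatile _Atomic unsigned int *p)
{

	/* Compiles to a plain mov on amd64; the hardware already orders loads. */
	return (atomic_load_explicit(p, memory_order_acquire));
}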

In the meantime, if for some reason a full barrier is still needed, we can
speed up concurrent load_acq of the same variable considerably. There is no
need to lock cmpxchg on the shared address itself. We should be able to
replace it with roughly:
lock add $0,(%rsp);
movl ...;

I believe it is possible that the CPU will perform some writes before doing
the read listed here, but this should be fine.
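
To illustrate, a hypothetical sketch (amd64 only; the name and exact form
are mine, untested):

#include <sys/types.h>

static inline u_int
load_acq_fence_rsp(volatile u_int *p)
{

	/*
	 * A locked no-op on the thread's own stack acts as a full barrier
	 * without touching the possibly contended cache line of *p.
	 */
	__asm __volatile("lock; addl $0,(%%rsp)" : : : "memory", "cc");
	return (*p);
}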

If this is considered too risky to hit 10.1, I would like to implement
it within seq as a temporary hack to be fixed up later.

Something along these lines:
static inline u_int
atomic_load_acq_rmb(volatile u_int *p)
{
	volatile u_int v;

	/* Plain load of the shared variable; no locked instruction here. */
	v = *p;
	/*
	 * Issue the acquire barrier on the local copy instead, so any
	 * lock cmpxchg hits a per-thread address rather than bouncing
	 * the shared cache line.
	 */
	atomic_load_acq_int(&v);
	return (v);
}

This hack fixes the aforementioned performance degradation and covers all
architectures.
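
For reference, the reader side in seq would then look roughly like this
(a sketch only; field and function names are illustrative and may not
match the patch exactly):

struct data {
	u_int	seq;	/* sequence counter; odd while a writer is active */
	int	a, b;	/* example protected fields */
};

static void
read_consistent(struct data *d, int *a, int *b)
{
	u_int seq;

	for (;;) {
		seq = atomic_load_acq_rmb(&d->seq);
		if (seq & 1)
			continue;	/* writer in progress, retry */
		*a = d->a;
		*b = d->b;
		if (atomic_load_acq_rmb(&d->seq) == seq)
			break;		/* counter unchanged, snapshot is consistent */
	}
}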

> Give us a chance to sort this out before you do anything further.  As
> Kostik said, but in different words, we've always written our
> machine-independent layer code using acquires and releases to express the
> required ordering constraints and not {r,w}mb() primitives.
> 

-- 
Mateusz Guzik <mjguzik gmail.com>


