Date:      Mon, 18 Aug 2014 11:26:46 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        Johan Schuijt <johan@transip.nl>, freebsd-arch@freebsd.org
Subject:   Re: [PATCH 1/2] Implement simple sequence counters with memory barriers.
Message-ID:  <20140818082646.GL2737@kib.kiev.ua>
In-Reply-To: <20140817012646.GA21025@dft-labs.eu>
References:  <1408064112-573-1-git-send-email-mjguzik@gmail.com> <1408064112-573-2-git-send-email-mjguzik@gmail.com> <20140816093811.GX2737@kib.kiev.ua> <20140816185406.GD2737@kib.kiev.ua> <20140817012646.GA21025@dft-labs.eu>

On Sun, Aug 17, 2014 at 03:26:47AM +0200, Mateusz Guzik wrote:
> On Sat, Aug 16, 2014 at 09:54:06PM +0300, Konstantin Belousov wrote:
> > On Sat, Aug 16, 2014 at 12:38:11PM +0300, Konstantin Belousov wrote:
> > > On Fri, Aug 15, 2014 at 02:55:11AM +0200, Mateusz Guzik wrote:
> > > > ---
> > > >  sys/sys/seq.h | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 126 insertions(+)
> > > >  create mode 100644 sys/sys/seq.h
> > > >
> > > > diff --git a/sys/sys/seq.h b/sys/sys/seq.h
> > > > new file mode 100644
> > > > index 0000000..0971aef
> > > > --- /dev/null
> > > > +++ b/sys/sys/seq.h
> [..]
> > > > +#ifndef _SYS_SEQ_H_
> > > > +#define _SYS_SEQ_H_
> > > > +
> > > > +#ifdef _KERNEL
> > > > +
> > > > +/*
> > > > + * Typical usage:
> > > > + *
> > > > + * writers:
> > > > + * 	lock_exclusive(&obj->lock);
> > > > + * 	seq_write_begin(&obj->seq);
> > > > + * 	.....
> > > > + * 	seq_write_end(&obj->seq);
> > > > + * 	unlock_exclusive(&obj->lock);
> > > > + *
> > > > + * readers:
> > > > + * 	obj_t lobj;
> > > > + * 	seq_t seq;
> > > > + *
> > > > + * 	for (;;) {
> > > > + * 		seq = seq_read(&gobj->seq);
> > > > + * 		lobj = gobj;
> > > > + * 		if (seq_consistent(&gobj->seq, seq))
> > > > + * 			break;
> > > > + * 		cpu_spinwait();
> > > > + * 	}
> > > > + * 	foo(lobj);
> > > > + */
> > > > +
> > > > +typedef uint32_t seq_t;
> > > > +
> > > > +/* A hack to get MPASS macro */
> > > > +#include <sys/systm.h>
> > > > +#include <sys/lock.h>
> > > > +
> > > > +#include <machine/cpu.h>
> > > > +
> > > > +static __inline bool
> > > > +seq_in_modify(seq_t seqp)
> > > > +{
> > > > +
> > > > +	return (seqp & 1);
> > > > +}
> > > > +
> > > > +static __inline void
> > > > +seq_write_begin(seq_t *seqp)
> > > > +{
> > > > +
> > > > +	MPASS(!seq_in_modify(*seqp));
> > > > +	(*seqp)++;
> > > > +	wmb();
> > > This probably ought to be written as atomic_add_rel_int(seqp, 1);
> > Alan Cox rightfully pointed out that a better expression is
> > v = *seqp + 1;
> > atomic_store_rel_int(seqp, v);
> > which also takes care of TSO on x86.
> >
>
> Well, my memory-barrier-and-so-on-fu is rather weak.
>
> I had another look at the issue. At least on amd64, it looks like only
> a compiler barrier is required for both reads and writes.
>
> The AMD64 Architecture Programmer's Manual Volume 2: System
> Programming, section 7.2 "Multiprocessor Memory Access Ordering", states:
>
> "Loads do not pass previous loads (loads are not reordered). Stores do
> not pass previous stores (stores are not reordered)"
>
> Since the code modifying stuff only performs a series of writes and we
> expect exclusive writers, I find it applicable to this scenario.
I agree.
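
To make the point concrete, here is a minimal sketch of the counter with
nothing but compiler barriers. It assumes amd64 ordering as quoted above,
a single writer at a time, and GCC/Clang inline-asm syntax. The
compiler_barrier() macro and seq_load() helper are hypothetical names used
only for illustration, not existing kernel interfaces, and assert() stands
in for MPASS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t seq_t;

/* Assumption: GCC/Clang syntax. Emits no instruction, only stops the
 * compiler from moving memory accesses across this point. */
#define	compiler_barrier()	__asm__ __volatile__("" ::: "memory")

/* Hypothetical helper: the volatile access forces one real load. */
static inline seq_t
seq_load(const volatile seq_t *seqp)
{

	return (*seqp);
}

static inline bool
seq_in_modify(seq_t seq)
{

	return ((seq & 1) != 0);
}

static inline void
seq_write_begin(volatile seq_t *seqp)
{

	assert(!seq_in_modify(*seqp));
	(*seqp)++;		/* odd value: modification in progress */
	compiler_barrier();	/* data stores stay after the bump */
}

static inline void
seq_write_end(volatile seq_t *seqp)
{

	compiler_barrier();	/* data stores stay before the bump */
	(*seqp)++;		/* even value again: object is stable */
	assert(!seq_in_modify(*seqp));
}

static inline seq_t
seq_read(const volatile seq_t *seqp)
{
	seq_t seq;

	/* Wait until no writer is in the middle of an update. */
	while (seq_in_modify(seq = seq_load(seqp)))
		continue;
	compiler_barrier();	/* data loads stay after the counter load */
	return (seq);
}

static inline bool
seq_consistent(const volatile seq_t *seqp, seq_t oldseq)
{

	compiler_barrier();	/* data loads stay before the re-check */
	return (seq_load(seqp) == oldseq);
}
```

On architectures with weaker ordering than amd64 the barriers in this
sketch would not be sufficient; real fences or acq/rel operations would
be needed there.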

>
> I checked the Linux sources and the generated assembly; they indeed issue
> only a compiler barrier on amd64 (and for Intel processors as well).
>
> atomic_store_rel_int on amd64 seems fine in this regard, but the only
> function for loads issues lock cmpxchg, which kills performance
> (median 55693659 -> 12789232 ops in a microbenchmark) for no gain.
>
> Additionally, release and acquire semantics seem to be a stronger
> guarantee than needed.
>
> As far as sequence counters go, we should be able to get away with
> guaranteeing the following:
> - all relevant reads are performed between given points
> - all relevant writes are performed between given points
>
> As such, I propose introducing additional atomic_* function variants
> (or stealing the smp_{w,r,}mb idea from Linux) which provide just that.
>
> So on amd64 both the read guarantee and the write guarantee could be
> provided in the same way, with a compiler barrier.
I think even this could be nicely done in the ia64 style of acq/rel.
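
For comparison, a rough sketch of what an acq/rel formulation could look
like when written with portable C11 <stdatomic.h> rather than the kernel's
atomic(9) functions. The fence placement follows the usual C11 seqlock
pattern and is an assumption of this sketch, not something taken from the
patch. The protected data itself would also have to be accessed with
relaxed atomics for the C11 model to apply.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef _Atomic uint32_t seq_t;

static inline void
seq_write_begin(seq_t *seqp)
{
	uint32_t v;

	v = atomic_load_explicit(seqp, memory_order_relaxed);
	assert((v & 1) == 0);
	atomic_store_explicit(seqp, v + 1, memory_order_relaxed);
	/* Order the counter bump before the data stores that follow. */
	atomic_thread_fence(memory_order_release);
}

static inline void
seq_write_end(seq_t *seqp)
{
	uint32_t v;

	v = atomic_load_explicit(seqp, memory_order_relaxed);
	assert((v & 1) == 1);
	/* Release: earlier data stores are visible before the new value. */
	atomic_store_explicit(seqp, v + 1, memory_order_release);
}

static inline uint32_t
seq_read(seq_t *seqp)
{
	uint32_t v;

	/* Acquire: data loads that follow cannot move above this load. */
	while (((v = atomic_load_explicit(seqp,
	    memory_order_acquire)) & 1) != 0)
		continue;
	return (v);
}

static inline bool
seq_consistent(seq_t *seqp, uint32_t oldseq)
{

	/* Order the preceding data loads before the re-check. */
	atomic_thread_fence(memory_order_acquire);
	return (atomic_load_explicit(seqp, memory_order_relaxed) == oldseq);
}
```

On amd64 a compiler would be expected to lower these acquire and release
operations to plain loads and stores, which is the cost profile argued for
above, but that should be checked in the generated assembly.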

>
> > > Same note for all other linux-style barriers.  In fact, on x86
> > > wmb() is sfence and it serves no useful purpose in seq_write*.
> > >
> > > Overall, it feels too alien and linux-ish for my taste.
> > > Since we have the sequence bound to some lock anyway, could we introduce
> > > some sort of generation-aware lock variants, which extend existing
> > > locks, and where lock/unlock bump a generation number?
> > Still, merging it into the guts of the lock implementation is the right
> > approach, IMO.
> >
>
> Current usage would be alongside the filedesc (sx) lock. The lock protects
> writes to the entire fd table (and lock holders can block in malloc), while
> each file descriptor has its own counter. Also, areas covered by seq are
> short and cannot block.
>
> As such, I don't really see any way to merge the lock with the counter.
Ok, I withdraw my proposal.
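
For readers following the thread, a rough sketch of the layout described
in the quoted text: one table-wide lock serializing writers, and a
separate counter per entry that lockless readers check. Everything here
is hypothetical and simplified: struct table, struct slot, NSLOTS and the
helper functions are made-up names, an atomic_flag spin lock stands in
for the sx lock, and two int fields stand in for whatever a real fd entry
holds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define	NSLOTS	4		/* hypothetical table size */

struct slot {			/* hypothetical per-descriptor entry */
	_Atomic uint32_t seq;	/* per-entry sequence counter */
	_Atomic int	file;	/* stand-ins for the protected fields */
	_Atomic int	flags;
};

struct table {
	atomic_flag	lock;	/* stand-in for the table-wide sx lock */
	struct slot	slots[NSLOTS];
};

static void
table_lock(struct table *t)
{

	while (atomic_flag_test_and_set_explicit(&t->lock,
	    memory_order_acquire))
		continue;
}

static void
table_unlock(struct table *t)
{

	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Writer: the lock covers the whole table, the counter only one slot. */
static void
slot_update(struct table *t, int idx, int file, int flags)
{
	struct slot *s = &t->slots[idx];
	uint32_t v;

	table_lock(t);
	v = atomic_load_explicit(&s->seq, memory_order_relaxed);
	atomic_store_explicit(&s->seq, v + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->file, file, memory_order_relaxed);
	atomic_store_explicit(&s->flags, flags, memory_order_relaxed);
	atomic_store_explicit(&s->seq, v + 2, memory_order_release);
	table_unlock(t);
}

/* Reader: takes no lock, retries until it sees a stable snapshot. */
static void
slot_snapshot(struct table *t, int idx, int *filep, int *flagsp)
{
	struct slot *s = &t->slots[idx];
	uint32_t v;

	for (;;) {
		v = atomic_load_explicit(&s->seq, memory_order_acquire);
		if ((v & 1) != 0)
			continue;	/* update in progress */
		*filep = atomic_load_explicit(&s->file,
		    memory_order_relaxed);
		*flagsp = atomic_load_explicit(&s->flags,
		    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&s->seq,
		    memory_order_relaxed) == v)
			return;
	}
}
```

The counter-protected section is short and never sleeps, while the lock
may be held across blocking operations elsewhere; that mismatch in scope
is what makes merging the two awkward.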

>
> I agree it would be useful, provided the area protected by the lock is
> the same as the one protected by the counter. If this code hits the tree
> and one day it turns out someone needs such functionality, there should not
> be any problems (apart from time and effort) in implementing this.
>
> -- 
> Mateusz Guzik <mjguzik gmail.com>
