From owner-freebsd-arch@FreeBSD.ORG Sun Aug 17 01:26:58 2014
Date: Sun, 17 Aug 2014 03:26:47 +0200
From: Mateusz Guzik
To: Konstantin Belousov
Cc: Johan Schuijt, freebsd-arch@freebsd.org
Subject: Re: [PATCH 1/2] Implement simple sequence counters with memory barriers.
Message-ID: <20140817012646.GA21025@dft-labs.eu>
References: <1408064112-573-1-git-send-email-mjguzik@gmail.com>
 <1408064112-573-2-git-send-email-mjguzik@gmail.com>
 <20140816093811.GX2737@kib.kiev.ua>
 <20140816185406.GD2737@kib.kiev.ua>
In-Reply-To: <20140816185406.GD2737@kib.kiev.ua>
User-Agent: Mutt/1.5.21 (2010-09-15)

On Sat, Aug 16, 2014 at 09:54:06PM +0300, Konstantin Belousov wrote:
> On Sat, Aug 16, 2014 at 12:38:11PM +0300, Konstantin Belousov wrote:
> > On Fri, Aug 15, 2014 at 02:55:11AM +0200, Mateusz Guzik wrote:
> > > ---
> > >  sys/sys/seq.h | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 126 insertions(+)
> > >  create mode 100644 sys/sys/seq.h
> > > 
> > > diff --git a/sys/sys/seq.h b/sys/sys/seq.h
> > > new file mode 100644
> > > index 0000000..0971aef
> > > --- /dev/null
> > > +++ b/sys/sys/seq.h
[..]
> > > +#ifndef _SYS_SEQ_H_
> > > +#define _SYS_SEQ_H_
> > > +
> > > +#ifdef _KERNEL
> > > +
> > > +/*
> > > + * Typical usage:
> > > + *
> > > + * writers:
> > > + * 	lock_exclusive(&obj->lock);
> > > + * 	seq_write_begin(&obj->seq);
> > > + * 	.....
> > > + * 	seq_write_end(&obj->seq);
> > > + * 	unlock_exclusive(&obj->unlock);
> > > + *
> > > + * readers:
> > > + * 	obj_t lobj;
> > > + * 	seq_t seq;
> > > + *
> > > + * 	for (;;) {
> > > + * 		seq = seq_read(&gobj->seq);
> > > + * 		lobj = gobj;
> > > + * 		if (seq_consistent(&gobj->seq, seq))
> > > + * 			break;
> > > + * 		cpu_spinwait();
> > > + * 	}
> > > + * 	foo(lobj);
> > > + */
> > > +
> > > +typedef uint32_t seq_t;
> > > +
> > > +/* A hack to get MPASS macro */
> > > +#include
> > > +#include
> > > +
> > > +#include
> > > +
> > > +static __inline bool
> > > +seq_in_modify(seq_t seqp)
> > > +{
> > > +
> > > +	return (seqp & 1);
> > > +}
> > > +
> > > +static __inline void
> > > +seq_write_begin(seq_t *seqp)
> > > +{
> > > +
> > > +	MPASS(!seq_in_modify(*seqp));
> > > +	(*seqp)++;
> > > +	wmb();
> This probably ought to be written as atomic_add_rel_int(seqp, 1);
> Alan Cox rightfully pointed out that better expression is
> 	v = *seqp + 1;
> 	atomic_store_rel_int(seqp, v);
> which also takes care of TSO on x86.
> 

Well, my memory-barrier-and-so-on-fu is rather weak.

I had another look at the issue. At least on amd64, it looks like only a
compiler barrier is required for both reads and writes.

The AMD64 Architecture Programmer's Manual Volume 2: System Programming,
section 7.2 Multiprocessor Memory Access Ordering, states:

"Loads do not pass previous loads (loads are not reordered). Stores do
not pass previous stores (stores are not reordered)"

Since the modifying code only performs a series of writes and we expect
exclusive writers, I find this applicable to the scenario at hand.

I checked the Linux sources and the generated assembly; they indeed
issue only a compiler barrier on amd64 (for Intel processors as well).

atomic_store_rel_int on amd64 seems fine in this regard, but the only
function for loads issues lock cmpxchg, which kills performance (median
55693659 -> 12789232 ops in a microbenchmark) for no gain.

Additionally, release and acquire semantics seem to be a stronger
guarantee than needed.

As far as sequence counters go, we should be able to get away with
guaranteeing only that:
- all relevant reads are performed between given points
- all relevant writes are performed between given points

As such, I propose introducing additional atomic_* function variants
(or stealing the smp_{w,r,}mb idea from Linux) which provide just that.
On amd64 both the read and the write guarantee could then be provided
the same way, with a compiler barrier.

> > Same note for all other linux-style barriers.  In fact, on x86
> > wmb() is sfence and it serves no useful purpose in seq_write*.
> > 
> > Overall, it feels too alien and linux-ish for my taste.
> > Since we have sequence bound to some lock anyway, could we introduce
> > some sort of generation-aware locks variants, which extend existing
> > locks, and where lock/unlock bump generation number ?
> Still, merging it to the guts of lock implementation is right
> approach, IMO.
> 

Current usage would be alongside the filedesc (sx) lock.

The lock protects writes to the entire fd table (and lock holders can
block in malloc), while each file descriptor has its own counter. Also,
the areas covered by seq are short and cannot block.

As such, I don't really see any way to merge the lock with the counter.

I agree it would be useful, provided the area protected by the lock
were the same as the one protected by the counter. If this code hits
the tree and it one day turns out that someone needs such functionality,
there should not be any problem (apart from the time and effort) in
implementing it.
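
Going back to the barrier question, a rough, untested amd64-only sketch
of what such compiler-barrier-only primitives could boil down to. The
reader-side functions are purely illustrative (they are not part of the
quoted hunk), and __compiler_membar() is the compiler-only barrier from
the amd64 machine/atomic.h:

static __inline void
seq_write_begin(seq_t *seqp)
{

	MPASS(!seq_in_modify(*seqp));
	(*seqp)++;
	/* Keep the protected stores after the counter bump. */
	__compiler_membar();
}

static __inline void
seq_write_end(seq_t *seqp)
{

	/* Keep the protected stores before the counter bump. */
	__compiler_membar();
	(*seqp)++;
	MPASS(!seq_in_modify(*seqp));
}

static __inline seq_t
seq_read(const volatile seq_t *seqp)
{
	seq_t ret;

	for (;;) {
		ret = *seqp;
		if (!seq_in_modify(ret))
			break;
		cpu_spinwait();
	}
	/* Keep the protected loads after the counter read. */
	__compiler_membar();
	return (ret);
}

static __inline bool
seq_consistent(const volatile seq_t *seqp, seq_t oldseq)
{

	/* Keep the protected loads before the counter re-read. */
	__compiler_membar();
	return (*seqp == oldseq);
}

Given that on amd64 loads do not pass loads and stores do not pass
stores, the compiler barrier alone should preserve the two guarantees
listed above; other architectures could keep mapping the same
primitives to the existing acq/rel atomics until cheaper variants show
up.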
-- 
Mateusz Guzik