Date: Wed, 24 Sep 2014 17:45:19 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: John Baldwin <jhb@freebsd.org> Cc: adrian@freebsd.org, kib@freebsd.org, freebsd-threads@freebsd.org Subject: Re: sem_post() performance Message-ID: <20140924144519.GL8870@kib.kiev.ua> In-Reply-To: <20140924143104.GK8870@kib.kiev.ua> References: <20140921213742.GA46868@stack.nl> <1531724.MPBlj40xOW@ralph.baldwin.cx> <20140923212000.GA78110@stack.nl> <2626215.ysfmRFbjz5@ralph.baldwin.cx> <20140924143104.GK8870@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Sep 24, 2014 at 05:31:04PM +0300, Konstantin Belousov wrote: > On Wed, Sep 24, 2014 at 10:11:51AM -0400, John Baldwin wrote: > > On Tuesday, September 23, 2014 11:20:00 PM Jilles Tjoelker wrote: > > > On Mon, Sep 22, 2014 at 03:53:13PM -0400, John Baldwin wrote: > > > > On Sunday, September 21, 2014 11:37:42 PM Jilles Tjoelker wrote: > > > > > It has been reported that POSIX semaphores are slow, in contexts such as > > > > > Python. Note that POSIX semaphores are the only synchronization objects > > > > > that support use by different processes in shared memory; this does not > > > > > work for mutexes and condition variables because they are pointers to > > > > > the actual data structure. > > > > > > > > > > In fact, sem_post() unconditionally performs an umtx system call. > > > > > > > > *sigh* I was worried that that might be the case. > > > > > > > > > To avoid both lost wakeups and possible writes to a destroyed semaphore, > > > > > an uncontested sem_post() must check the _has_waiters flag atomically > > > > > with incrementing _count. > > > > > > > > > > The proper way to do this would be to take one bit from _count and > > > > > use it for the _has_waiters flag; the definition of SEM_VALUE_MAX > > > > > permits this. However, this would require a new set of umtx > > > > > semaphore operations and will break ABI of process-shared semaphores > > > > > (things may break if an old and a new libc access the same semaphore > > > > > over shared memory). > > > > > > > > > > This diff only affects 32-bit aligned but 64-bit misaligned > > > > > semaphores on 64-bit systems, and changes _count and _has_waiters > > > > > atomically using a 64-bit atomic operation. It probably needs a > > > > > may_alias attribute for correctness, but <sys/cdefs.h> does not have > > > > > a wrapper for that. > > > > > > > > It wasn't clear on first reading, but you are using aliasing to get > > > > around the need for new umtx calls by using a 64-bit atomic op to > > > > adjust two ints at the same time, yes? Note that since a failing > > > > semaphore op calls into the kernel for the "hard" case, you might in > > > > fact be able to change the ABI without breaking process-shared > > > > semaphores. That is, suppose you left 'has_waiters' as always true > > > > and reused the high bit of count for has_waiters. > > > > > > > > Would old binaries always trap into the kernel? (Not sure they will, > > > > especially the case where an old binary creates the semaphore, a new > > > > binary would have to force has_waiters to true in every sem op, but > > > > even that might not be enough.) > > > > > > I think that everything will break when a binary linked to old and new > > > libcs use the same semaphore. If the new contested bit is set, the old > > > sem_getvalue() will return garbage, the old sem_trywait() will fail even > > > if the real count is greater than 0, the old sem_wait() and > > > sem_timedwait() may spin if the real count is greater than 0 and the old > > > sem_post() will fail with [EOVERFLOW]. > > > > Well, keep in mind you can't have a binary linked to both libc's generally > > (way too many things break if you do that, like passing the result of strdup() > > from one libc to free() of the other, etc.). The real problem case is > > multiple binaries. However, barring truly bizarre cases with people using > > funky LD_LIBRARY_PATH, at least one binary would have to be static to really > > get into trouble as in all but the bizarre cases all binaries will be sharing > > the same libc.so.7. > > > > However, it was more wishful thinking on my part that we could arrange things > > to force old binaries to still work (just slowly) while fixing things cleanly. > > > > > > I think this looks ok, but it's a shame we can't fix it more cleanly. > > > > > > I don't like at all that you have to either force an exotic alignment or > > > use unaligned atomic ops. > > > > > > I saw another interesting sem_ implementation in musl libc. It also uses > > > a separate waiters word (but an int instead of a boolean), and is based > > > on the futex calls (which we already have as wait_uint/wake umtx calls). > > > It looks rather complicated but the musl libc people are usually good at > > > getting this kind of stuff right. I suppose this will also break subtly > > > when interoperating with old libc, and adding new umtx ops will simplify > > > the algorithm. > > > > > > Consideration: just declare mixing process-shared semaphores with > > > sufficiently different libc unsupported, and change SEM_MAGIC to enforce > > > that? (This does not prevent running old binaries, as long as they're > > > dynamically linked to libc and you use a new libc.so.) > > > > This only breaks old static binaries (note that we only recently got pshared > > semaphores in 9.x or so). You could even allow partial compat for those if > > needed by recognizing the old SEM_MAGIC when opening an existing shared > > semaphore (so that compat would work so long as an old binary created the > > semaphore). That compat can even be added retroactively to libc if it turns > > out we need it. I probably think that is ok, but it would be an ABI breakage > > that would need to be documented. One thing that makes this slightly less > > painful is that new-style semaphores in 9.0+ are per-chroot, so if you have > > a jail that runs only old binaries, they will continue to work fine. > > > > I'm cc'ing Konstantin to see what he thinks. > You may also get in trouble if you upgrade libc without reboot, or use > static binary built against newer libc on older host. > > But SEM_MAGIC provides at least protection against mixing incompatible > implementations, and it seems that the checks are enough to ensure > that both bad cases (new/old and old/new) are handled. > > Do we have a real use case when the old and new implementations have to > be mixed ? Was it the same situation when the sem.c -> sem_new.c transition > was made ? I think that the answer to the questions are no/yes, and we > should just change magic and not worry about the issue until real use > case appear. I think it is even worse now. If the application linked against FBSD_1.0 (ksem) semaphores implementation runs on the modern host, it cannot share semaphore with modern binary. Since this is not considered significant problem, we can avoid compat code there as well. > > > > > > Consideration 2: use the new implementation only for process-private > > > semaphores and keep using the old slow one for process-shared > > > semaphores? > > > > This is certainly safe, just requires extra work in the implementation. > > > > -- > > John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140924144519.GL8870>