Date: Tue, 15 Jan 2013 09:21:25 +0000
From: David Chisnall <theraven@freebsd.org>
To: John Baldwin <jhb@freebsd.org>
Cc: toolchain@freebsd.org, Jilles Tjoelker <jilles@stack.nl>, freebsd-arch@freebsd.org
Subject: Re: Fast sigblock (AKA rtld speedup)
Message-ID: <B7D94E53-B39D-4E81-A1E0-0F8FC9ED1CEE@freebsd.org>
In-Reply-To: <201301141358.33216.jhb@freebsd.org>
References: <20130107182235.GA65279@kib.kiev.ua> <20130114174703.GB88220@stack.nl> <D6772A0E-FBA4-4168-B152-7E7694720A16@FreeBSD.org> <201301141358.33216.jhb@freebsd.org>
On 14 Jan 2013, at 18:58, John Baldwin wrote:

> I'm less certain. Note that you can't inline mutex ops until you expose
> the mutexes themselves to userland (that is, making pthread_mutex_t not
> be opaque).

This is one of the things that will be required anyway if we wish to support process-shared mutexes (they've been in POSIX since 1997, so it's probably getting on for time we did), as the current mutex-is-a-pointer implementation depends on the virtual address space of the creator, and so does not work if the mutex is created in a shared memory segment.

That said, even with the current implementation we wouldn't need to expose the entire mutex structure, just the word that is used as the fast path. The inline version would be something like:

    #include <stdatomic.h>

    struct pthread_mutex_header {
        _Atomic(long) lock_word;
        // other private fields not exposed in header
    };
    typedef struct pthread_mutex_header *pthread_mutex_t;

    // Implementation in libthr / libc, which calls into the kernel.
    int pthread_mutex_lock_slowly(pthread_mutex_t *);

    inline int pthread_mutex_lock(pthread_mutex_t *mutex)
    {
        long expected = 0;  // must have the same type as lock_word
        // Fast path: try to move the lock word from 0 (unlocked) to 1.
        if (atomic_compare_exchange_weak_explicit(&(*mutex)->lock_word,
                &expected, 1, memory_order_acquire, memory_order_relaxed))
            return 0;
        // Contended (or spurious CAS failure): take the out-of-line path.
        return pthread_mutex_lock_slowly(mutex);
    }

The slow path is only needed when the mutex can't be acquired trivially in userspace. On x86, the fast path adds 6 extra instructions, including a branch that can be statically hinted if we want (assume that we won't be going down the slow path, because a mispredicted branch doesn't add much to the cost of the syscall if we are).

The corresponding saving is that we get to delete a massive pile of conditionals that we currently have for __is_threaded. We'd also completely avoid the function call (which is actually two function calls, as we do some trampoline things in libc) in the fast-path case for threaded applications.
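To make the fast-path/slow-path split concrete, here is a minimal, self-contained C11 sketch. It is not libthr's code: the `fast_mutex` type, the 0/1 lock-word encoding, and `fast_mutex_lock_slowly` (which just retries and calls `sched_yield()` where a real slow path would sleep in the kernel) are all hypothetical. The `slow_path_calls` counter exists only so the split can be observed.

```c
#include <errno.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical type for this sketch, not libthr's real layout.
 * lock_word is 0 when unlocked and 1 when locked. A production version
 * would also encode a "waiters" state so unlock knows to wake sleepers. */
typedef struct {
    _Atomic(long) lock_word;
} fast_mutex;

#define FAST_MUTEX_INITIALIZER { 0 }

/* Counts slow-path entries so the fast/slow split can be observed. */
static atomic_long slow_path_calls;

/* Out-of-line slow path. Hypothetical stand-in: a real implementation
 * would block in the kernel; this one retries the CAS and yields. */
static int fast_mutex_lock_slowly(fast_mutex *m)
{
    atomic_fetch_add_explicit(&slow_path_calls, 1, memory_order_relaxed);
    for (;;) {
        long expected = 0;
        if (atomic_compare_exchange_weak_explicit(&m->lock_word, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return 0;
        sched_yield();
    }
}

/* Inline fast path: an uncontended lock is one CAS and no function call.
 * A strong CAS is used here so that a spurious failure can never send an
 * uncontended lock down the slow path. */
static inline int fast_mutex_lock(fast_mutex *m)
{
    long expected = 0;
    if (atomic_compare_exchange_strong_explicit(&m->lock_word, &expected, 1,
            memory_order_acquire, memory_order_relaxed))
        return 0;
    return fast_mutex_lock_slowly(m);
}

/* Never enters the slow path: reports EBUSY if the lock is held. */
static inline int fast_mutex_trylock(fast_mutex *m)
{
    long expected = 0;
    return atomic_compare_exchange_strong_explicit(&m->lock_word, &expected, 1,
               memory_order_acquire, memory_order_relaxed) ? 0 : EBUSY;
}

/* Release store pairs with the acquire CAS in the lock functions. */
static inline int fast_mutex_unlock(fast_mutex *m)
{
    atomic_store_explicit(&m->lock_word, 0, memory_order_release);
    return 0;
}
```

An uncontended `fast_mutex_lock()` followed by `fast_mutex_unlock()` never increments `slow_path_calls`, which is the property the inline header version is after. Because this unlock does not track waiters, it has nothing to wake; a kernel-backed version needs a contested state in the lock word so that unlock knows when to make the syscall.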
A similar saving is possible with read-write locks and possibly with condition variables, although our kernel interface for these is incredibly poorly documented (for once, Linux actually has better documentation: futexes are very well documented). Looking in umtx.h, it sort of exposes inline functions that look like this, but given the complete lack of documentation, I have no idea how usable they are.

David