From: David Chisnall <theraven@freebsd.org>
To: John Baldwin
Cc: toolchain@freebsd.org, Jilles Tjoelker, freebsd-arch@freebsd.org
Date: Tue, 15 Jan 2013 09:21:25 +0000
Subject: Re: Fast sigblock (AKA rtld speedup)

On 14 Jan 2013, at 18:58, John Baldwin wrote:

> I'm less certain. Note that you can't inline mutex ops until you expose
> the mutexes themselves to userland (that is, making pthread_mutex_t not
> be opaque).

This is one of the things that will be required anyway if we wish to support process-shared mutexes (they've been in POSIX since 1997, so it's probably about time we did). The current mutex-is-a-pointer implementation depends on the virtual address space of the creator, so it does not work if the mutex is created in a shared memory segment.

That said, even with the current implementation we wouldn't need to expose the entire mutex structure, just the word that is used for the fast path. The inline version would be something like:

#include <stdatomic.h>

struct pthread_mutex_header {
    _Atomic(long) lock_word;
    // other private fields are not exposed in the header
};
typedef struct pthread_mutex_header *pthread_mutex_t;

// Out-of-line implementation in libthr / libc, which calls into the kernel.
int pthread_mutex_lock_slowly(pthread_mutex_t *);

static inline int pthread_mutex_lock(pthread_mutex_t *mutex)
{
    // 'expected' must have lock_word's non-atomic type (long); it is
    // overwritten with the current value if the exchange fails.
    long expected = 0;
    // Only one attempt is made, so use the strong form: a spurious failure
    // of the weak form would send an uncontended lock down the slow path.
    if (atomic_compare_exchange_strong_explicit(&(*mutex)->lock_word,
            &expected, 1, memory_order_acquire, memory_order_relaxed))
        return 0;
    return pthread_mutex_lock_slowly(mutex);
}

The slow path is only needed when the mutex can't be acquired trivially in userspace. On x86, the fast path adds 6 extra instructions, including a branch that can be statically hinted if we want (assume that we won't be going down the slow path: if we are, a mispredicted branch adds little to the cost of the syscall).

The corresponding saving is that we get to delete a massive pile of conditionals that we currently have for __is_threaded. We'd also completely avoid the function call (which is actually two function calls, as we do some trampoline things in libc) in the fast-path case for threaded applications.

A similar saving is possible with read-write locks and possibly with condition variables, although our kernel interface for these is incredibly poorly documented (for once, Linux actually has better documentation: futexes are very well documented).
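To make the fast-path/slow-path split concrete, here is a minimal, self-contained sketch that can be compiled outside libthr. All `toy_` names are hypothetical and are not libthr's API. The slow path is an assumption for illustration: it just yields and retries, where a real implementation would sleep in the kernel (on FreeBSD, via the `_umtx_op` system call). For the same reason the unlock path does not wake any sleepers.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical toy lock, not libthr's struct pthread_mutex.
 * lock_word: 0 = unlocked, 1 = locked. */
struct toy_mutex {
    _Atomic(long) lock_word;
    _Atomic(long) slow_calls; /* how often the slow path was entered */
};

#define TOY_MUTEX_INITIALIZER { 0, 0 }

/* Out-of-line slow path. A real one would block in the kernel; this
 * stand-in only yields the CPU and retries. */
static int toy_mutex_lock_slowly(struct toy_mutex *m)
{
    atomic_fetch_add_explicit(&m->slow_calls, 1, memory_order_relaxed);
    for (;;) {
        long expected = 0;
        /* The weak CAS is fine here because we are already in a retry loop. */
        if (atomic_compare_exchange_weak_explicit(&m->lock_word, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return 0;
        sched_yield();
    }
}

/* Inline fast path: a single CAS from 0 to 1. Only fall back to the
 * slow path if the lock is already held. */
static inline int toy_mutex_lock(struct toy_mutex *m)
{
    long expected = 0; /* same non-atomic type as lock_word */
    if (atomic_compare_exchange_strong_explicit(&m->lock_word, &expected, 1,
            memory_order_acquire, memory_order_relaxed))
        return 0;
    return toy_mutex_lock_slowly(m);
}

static inline int toy_mutex_trylock(struct toy_mutex *m)
{
    long expected = 0;
    return atomic_compare_exchange_strong_explicit(&m->lock_word, &expected, 1,
            memory_order_acquire, memory_order_relaxed) ? 0 : EBUSY;
}

/* Release. A real unlock must also wake threads sleeping in the kernel.
 * The toy slow path never sleeps, so a release store is enough here. */
static inline void toy_mutex_unlock(struct toy_mutex *m)
{
    assert(atomic_load_explicit(&m->lock_word, memory_order_relaxed) == 1);
    atomic_store_explicit(&m->lock_word, 0, memory_order_release);
}
```

An uncontended lock/unlock pair never leaves the inline path, so `slow_calls` stays at 0. Only contention pays for the out-of-line call.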
Looking in umtx.h, it sort of exposes inline functions along the lines of the pthread_mutex_lock() fast path above, but given the complete lack of documentation, I have no idea how usable they are.

David
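The claim that read-write locks allow a similar saving can be illustrated with a toy lock. This is a minimal sketch that assumes a hypothetical `toy_rwlock` whose single state word holds a writer bit (the top bit) and a reader count (the remaining bits). That layout is an illustration only; it is not the layout of FreeBSD's `pthread_rwlock_t` or of the umtx rwlock structures. Like the mutex sketch, the slow paths here spin with `sched_yield()`, whereas a real implementation would block in the kernel. The sketch also lets readers starve writers and does not check for reader-count overflow.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical layout: top bit = a writer holds the lock,
 * remaining bits = number of readers. 0 means free. */
#define TOY_RW_WRITER (ULONG_MAX ^ (ULONG_MAX >> 1))

struct toy_rwlock {
    _Atomic(unsigned long) state;
};

#define TOY_RWLOCK_INITIALIZER { 0 }

/* Reader fast path: bump the reader count if no writer holds the lock. */
static inline int toy_rwlock_tryrdlock(struct toy_rwlock *rw)
{
    unsigned long old = atomic_load_explicit(&rw->state, memory_order_relaxed);
    while ((old & TOY_RW_WRITER) == 0) {
        /* On failure the CAS reloads 'old'; the loop re-checks the writer bit. */
        if (atomic_compare_exchange_weak_explicit(&rw->state, &old, old + 1,
                memory_order_acquire, memory_order_relaxed))
            return 0;
    }
    return EBUSY;
}

/* Writer fast path: succeeds only on a completely free lock. */
static inline int toy_rwlock_trywrlock(struct toy_rwlock *rw)
{
    unsigned long expected = 0;
    return atomic_compare_exchange_strong_explicit(&rw->state, &expected,
            TOY_RW_WRITER, memory_order_acquire, memory_order_relaxed) ? 0 : EBUSY;
}

/* Stand-in slow paths: a real implementation would block in the kernel. */
static int toy_rwlock_rdlock_slowly(struct toy_rwlock *rw)
{
    while (toy_rwlock_tryrdlock(rw) != 0)
        sched_yield();
    return 0;
}

static int toy_rwlock_wrlock_slowly(struct toy_rwlock *rw)
{
    while (toy_rwlock_trywrlock(rw) != 0)
        sched_yield();
    return 0;
}

static inline int toy_rwlock_rdlock(struct toy_rwlock *rw)
{
    return toy_rwlock_tryrdlock(rw) == 0 ? 0 : toy_rwlock_rdlock_slowly(rw);
}

static inline int toy_rwlock_wrlock(struct toy_rwlock *rw)
{
    return toy_rwlock_trywrlock(rw) == 0 ? 0 : toy_rwlock_wrlock_slowly(rw);
}

static inline void toy_rwlock_rdunlock(struct toy_rwlock *rw)
{
    unsigned long prev = atomic_fetch_sub_explicit(&rw->state, 1,
            memory_order_release);
    assert(prev != 0 && (prev & TOY_RW_WRITER) == 0);
    (void)prev; /* silence unused-variable warnings under NDEBUG */
}

static inline void toy_rwlock_wrunlock(struct toy_rwlock *rw)
{
    atomic_store_explicit(&rw->state, 0, memory_order_release);
}
```

As with the mutex, the common uncontended cases (any number of readers, or a single writer on a free lock) complete with one atomic operation and no function call into libthr or the kernel.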