Subject: Re: Fast sigblock (AKA rtld speedup)
Mime-Version: 1.0 (Apple Message framework v1278)
Content-Type: text/plain; charset=us-ascii
From: David Chisnall <theraven@freebsd.org>
In-Reply-To: <201301141358.33216.jhb@freebsd.org>
Date: Tue, 15 Jan 2013 09:21:25 +0000
Content-Transfer-Encoding: quoted-printable
Message-Id: <B7D94E53-B39D-4E81-A1E0-0F8FC9ED1CEE@freebsd.org>
References: <20130107182235.GA65279@kib.kiev.ua>
 <20130114174703.GB88220@stack.nl>
 <D6772A0E-FBA4-4168-B152-7E7694720A16@FreeBSD.org>
 <201301141358.33216.jhb@freebsd.org>
To: John Baldwin <jhb@freebsd.org>
X-Mailer: Apple Mail (2.1278)
Cc: toolchain@freebsd.org, Jilles Tjoelker <jilles@stack.nl>,
 freebsd-arch@freebsd.org

On 14 Jan 2013, at 18:58, John Baldwin wrote:

> I'm less certain.  Note that you can't inline mutex ops until you expose
> the mutexes themselves to userland (that is, making pthread_mutex_t not
> be opaque).

This is one of the things that will be required anyway if we wish to
support process-shared mutexes (they've been in POSIX since 1997, so
it's probably getting on for time we did), as the current
mutex-is-a-pointer implementation depends on the virtual address space
of the creator, and so does not work if the mutex is created in a shared
memory segment.
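
As a rough sketch of what the application side of this looks like, the
following uses the standard POSIX calls (pthread_mutexattr_setpshared,
mmap, fork) to place a mutex inside a shared mapping.  The struct and
function names are made up for illustration, and it assumes a libthr /
libc where pthread_mutex_t is a real structure rather than a pointer
into the creator's address space:

```c
#define _DEFAULT_SOURCE		/* exposes MAP_ANON on glibc; harmless elsewhere */
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout: the mutex lives inside the shared mapping itself. */
struct shared_counter {
	pthread_mutex_t lock;
	int value;
};

/* Map an anonymous shared region and initialise a process-shared mutex
 * in it.  Returns NULL on failure. */
static struct shared_counter *shared_counter_create(void)
{
	struct shared_counter *sc = mmap(NULL, sizeof(*sc),
	    PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
	if (sc == MAP_FAILED)
		return NULL;

	pthread_mutexattr_t attr;
	int rc = pthread_mutexattr_init(&attr);
	if (rc == 0) {
		rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
		if (rc == 0)
			rc = pthread_mutex_init(&sc->lock, &attr);
		pthread_mutexattr_destroy(&attr);
	}
	if (rc != 0) {
		munmap(sc, sizeof(*sc));
		return NULL;
	}
	sc->value = 0;
	return sc;
}

/* Parent and child each increment the counter once under the lock.
 * Returns the final value, or -1 on error. */
static int shared_counter_demo(void)
{
	struct shared_counter *sc = shared_counter_create();
	if (sc == NULL)
		return -1;

	pid_t pid = fork();
	if (pid < 0) {
		munmap(sc, sizeof(*sc));
		return -1;
	}
	if (pid == 0) {
		/* Child: same mapping, possibly a different virtual address
		 * in general, so the mutex must not rely on pointers. */
		pthread_mutex_lock(&sc->lock);
		sc->value++;
		pthread_mutex_unlock(&sc->lock);
		_exit(0);
	}

	pthread_mutex_lock(&sc->lock);
	sc->value++;
	pthread_mutex_unlock(&sc->lock);

	int status;
	waitpid(pid, &status, 0);

	int result = sc->value;
	pthread_mutex_destroy(&sc->lock);
	munmap(sc, sizeof(*sc));
	return result;
}
```

With a pointer-typed pthread_mutex_t, the child would be locking through
an address that only means something in the process that created the
mutex, which is exactly the problem described above.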

That said, even with the current implementation we wouldn't need to
expose the entire mutex structure, just the word that is used as the
fast path.  The inline version would be something like:

#include <stdatomic.h>

struct pthread_mutex_header
{
	_Atomic(long) lock_word;
	// other private fields not exposed in header;
};
typedef struct pthread_mutex_header *pthread_mutex_t;

// Implementation in libthr / libc, which calls into the kernel.
int pthread_mutex_lock_slowly(pthread_mutex_t*);

inline int pthread_mutex_lock(pthread_mutex_t *mutex)
{
	// The expected value must have the same type as lock_word.
	long desired = 0;
	// A weak CAS may fail spuriously; that just sends us down the slow path.
	if (atomic_compare_exchange_weak_explicit(&(*mutex)->lock_word,
	    &desired, 1, memory_order_acquire, memory_order_relaxed))
		return 0;
	return pthread_mutex_lock_slowly(mutex);
}

The slow path is only needed when the mutex can't be acquired trivially
in userspace. On x86, the fast path adds 6 extra instructions, including
a branch that can be statically hinted if we want (assume that we won't
be going down the slow path, because a mispredicted branch doesn't add
much to the cost of the syscall if we are).
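
To make the static hint concrete, here is a minimal self-contained
sketch.  Everything named demo_* is hypothetical and is not the libthr
implementation; __builtin_expect is a GCC/Clang extension, and the slow
path here is only a spinning stub where a real one would sleep in the
kernel:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the exposed first word of a mutex. */
struct demo_mutex {
	_Atomic(long) lock_word;	/* 0 = unlocked, 1 = locked */
};

/* Hypothetical slow path.  A real one would call into the kernel and
 * sleep; this stub just spins until the lock word becomes free. */
static int demo_mutex_lock_slowly(struct demo_mutex *m)
{
	long expected = 0;
	while (!atomic_compare_exchange_weak_explicit(&m->lock_word,
	    &expected, 1, memory_order_acquire, memory_order_relaxed))
		expected = 0;	/* a failed CAS overwrites expected */
	return 0;
}

static inline int demo_mutex_lock(struct demo_mutex *m)
{
	long expected = 0;
	/* Tell the compiler the uncontended case is the likely one, so
	 * the slow-path call is laid out off the hot path. */
	if (__builtin_expect(atomic_compare_exchange_strong_explicit(
	    &m->lock_word, &expected, 1,
	    memory_order_acquire, memory_order_relaxed), 1))
		return 0;
	return demo_mutex_lock_slowly(m);
}

/* Simplified unlock: a real implementation would also need to wake
 * any threads sleeping in the kernel. */
static inline void demo_mutex_unlock(struct demo_mutex *m)
{
	atomic_store_explicit(&m->lock_word, 0, memory_order_release);
}
```

In the uncontended case neither function leaves userspace, which is the
whole point of inlining the fast path.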

The corresponding saving is that we get to delete a massive pile of
conditionals that we currently have for __is_threaded.  We'd also
completely avoid the function call (which is actually two function
calls, as we do some trampoline things in libc) in the fast-path case
for threaded applications.

A similar saving is possible with read-write locks and possibly with
condition variables, although our kernel interface for these is
incredibly poorly documented (for once, Linux actually has better
documentation: futexes are very well documented).  Looking in umtx.h, it
sort of exposes inline functions that look like this, but given the
complete lack of documentation, I have no idea how usable they are.
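
For the read-write lock case, an inline reader fast path might look
roughly like the sketch below.  The state encoding (low bit = writer,
remaining bits = reader count) and all the demo_* names are assumptions
made up for illustration; this is not the umtx.h interface, and the slow
path is a spinning stub rather than a kernel sleep:

```c
#include <stdatomic.h>

#define DEMO_RW_WRITER	1UL	/* low bit: a writer holds the lock */
#define DEMO_RW_READER	2UL	/* each reader adds this to the state */

/* Hypothetical exposed state word of a read-write lock. */
struct demo_rwlock {
	_Atomic(unsigned long) state;
};

/* Hypothetical slow path: spin until no writer holds the lock, then
 * register as a reader.  A real one would sleep in the kernel. */
static int demo_rwlock_rdlock_slowly(struct demo_rwlock *rw)
{
	unsigned long old = atomic_load_explicit(&rw->state,
	    memory_order_relaxed);
	for (;;) {
		if (old & DEMO_RW_WRITER) {
			old = atomic_load_explicit(&rw->state,
			    memory_order_relaxed);
			continue;
		}
		/* On failure the CAS reloads old, so just retry. */
		if (atomic_compare_exchange_weak_explicit(&rw->state, &old,
		    old + DEMO_RW_READER,
		    memory_order_acquire, memory_order_relaxed))
			return 0;
	}
}

static inline int demo_rwlock_rdlock(struct demo_rwlock *rw)
{
	unsigned long old = atomic_load_explicit(&rw->state,
	    memory_order_relaxed);
	/* Fast path: no writer, and nobody raced us on the state word. */
	if (!(old & DEMO_RW_WRITER) &&
	    atomic_compare_exchange_weak_explicit(&rw->state, &old,
	    old + DEMO_RW_READER,
	    memory_order_acquire, memory_order_relaxed))
		return 0;
	return demo_rwlock_rdlock_slowly(rw);
}

/* Simplified unlock: a real implementation would wake a waiting
 * writer when the last reader leaves. */
static inline void demo_rwlock_rdunlock(struct demo_rwlock *rw)
{
	atomic_fetch_sub_explicit(&rw->state, DEMO_RW_READER,
	    memory_order_release);
}
```

The structure is the same as for the mutex: one atomic operation inline,
with everything that needs the kernel pushed into an out-of-line call.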

David