Date:      Tue, 15 Jan 2013 09:21:25 +0000
From:      David Chisnall <theraven@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        toolchain@freebsd.org, Jilles Tjoelker <jilles@stack.nl>, freebsd-arch@freebsd.org
Subject:   Re: Fast sigblock (AKA rtld speedup)
Message-ID:  <B7D94E53-B39D-4E81-A1E0-0F8FC9ED1CEE@freebsd.org>
In-Reply-To: <201301141358.33216.jhb@freebsd.org>
References:  <20130107182235.GA65279@kib.kiev.ua> <20130114174703.GB88220@stack.nl> <D6772A0E-FBA4-4168-B152-7E7694720A16@FreeBSD.org> <201301141358.33216.jhb@freebsd.org>

On 14 Jan 2013, at 18:58, John Baldwin wrote:

> I'm less certain.  Note that you can't inline mutex ops until you expose
> the mutexes themselves to userland (that is, making pthread_mutex_t not
> be opaque).

This is one of the things that will be required anyway if we wish to
support process-shared mutexes (they've been in POSIX since 1997, so
it's probably getting on for time we did), as the current
mutex-is-a-pointer implementation depends on the virtual address space
of the creator, and so does not work if the mutex is created in a shared
memory segment.

That said, even with the current implementation we wouldn't need to
expose the entire mutex structure, just the word that is used as the
fast path.  The inline version would be something like:

struct pthread_mutex_header
{
	_Atomic(long) lock_word;
	// other private fields not exposed in header;
};
typedef struct pthread_mutex_header *pthread_mutex_t;

// Implementation in libthr / libc, which calls into the kernel.
int pthread_mutex_lock_slowly(pthread_mutex_t*);

inline int pthread_mutex_lock(pthread_mutex_t *mutex)
{
	// The expected value must have the same type as lock_word.
	long desired = 0;
	if (atomic_compare_exchange_weak_explicit(&(*mutex)->lock_word,
	    &desired, 1, memory_order_acquire, memory_order_relaxed))
		return 0;
	return pthread_mutex_lock_slowly(mutex);
}

The slow path is only needed when the mutex can't be acquired trivially
in userspace. On x86, the fast path adds 6 extra instructions, including
a branch that can be statically hinted if we want (assume that we won't
be going down the slow path, because a mispredicted branch doesn't add
much to the cost of the syscall if we are).
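
To make the hinting concrete, here is a minimal standalone sketch of the same fast path / slow path split. Everything in it is an assumption for illustration: the sketch_ names are made up, the layout is not the libthr one, the hint uses the GCC/Clang __builtin_expect extension, and the slow path just spins where a real one would sleep in the kernel.

```c
#include <stdatomic.h>

/* Hypothetical type for illustration; not the real libthr layout. */
struct sketch_mutex {
	_Atomic(long) lock_word;	/* 0 = unlocked, 1 = locked */
};

/*
 * Stand-in for the out-of-line slow path.  A real implementation would
 * block in the kernel; this one simply retries until the CAS succeeds.
 */
static int
sketch_mutex_lock_slowly(struct sketch_mutex *m)
{
	long expected = 0;

	while (!atomic_compare_exchange_weak_explicit(&m->lock_word,
	    &expected, 1, memory_order_acquire, memory_order_relaxed))
		expected = 0;	/* A failed CAS overwrites expected. */
	return 0;
}

/* Inline fast path: a single CAS, hinted as the likely outcome. */
static inline int
sketch_mutex_lock(struct sketch_mutex *m)
{
	long expected = 0;

	/* __builtin_expect is a GCC/Clang extension, not standard C. */
	if (__builtin_expect(atomic_compare_exchange_weak_explicit(
	    &m->lock_word, &expected, 1, memory_order_acquire,
	    memory_order_relaxed), 1))
		return 0;
	return sketch_mutex_lock_slowly(m);
}

/* Uncontended unlock: a release store back to the unlocked value. */
static inline int
sketch_mutex_unlock(struct sketch_mutex *m)
{
	atomic_store_explicit(&m->lock_word, 0, memory_order_release);
	return 0;
}
```

The weak compare-exchange may fail spuriously, which is harmless here: the caller just falls through to the slow path and retries. The unlock shown ignores waiters entirely; a real one would need to check for them and wake one via the kernel.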

The corresponding saving is that we get to delete a massive pile of
conditionals that we currently have for __is_threaded.  We'd also
completely avoid the function call (which is actually two function
calls, as we do some trampoline things in libc) in the fast-path case
for threaded applications.

A similar saving is possible with read-write locks and possibly with
condition variables, although our kernel interface for these is
incredibly poorly documented (for once, Linux actually has better
documentation: futexes are very well documented).  Looking in umtx.h, it
sort-of exposes inline functions that look like this, but given the
complete lack of documentation, I have no idea how usable they are.

David


