Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 12 Jan 2013 17:25:47 +0100
From:      Jilles Tjoelker <jilles@stack.nl>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        arch@freebsd.org, toolchain@freebsd.org
Subject:   Re: Fast sigblock (AKA rtld speedup)
Message-ID:  <20130112162547.GA54954@stack.nl>
In-Reply-To: <20130112053147.GH2561@kib.kiev.ua>
References:  <20130107182235.GA65279@kib.kiev.ua> <20130111095459.GZ2561@kib.kiev.ua> <50EFE830.3030500@freebsd.org> <20130111204938.GD2561@kib.kiev.ua> <20130111232906.GA29017@stack.nl> <20130112053147.GH2561@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Jan 12, 2013 at 07:31:48AM +0200, Konstantin Belousov wrote:
> On Sat, Jan 12, 2013 at 12:29:06AM +0100, Jilles Tjoelker wrote:
> > On Fri, Jan 11, 2013 at 10:49:38PM +0200, Konstantin Belousov wrote:
> > > http://people.freebsd.org/~kib/misc/rtld-sigblock.3.patch

> > The new fields td_sigblock_ptr and td_sigblock_val are in the part that
> > is zeroed for new threads, while the code in rtld appears to expect them
> > to be copied (on fork, vfork and pthread_create). The fields are
> > correctly zeroed on exec.
> Thank you for noting this. Should be fixed in
> http://people.freebsd.org/~kib/misc/rtld-sigblock.4.patch

> > Sharing the magic variable between threads means that one thread holding
> > an rtld lock will block signals for all other threads as well. This is
> > different from how the normal signal mask works but I don't know whether
> > it may break things. However, the patch depends on it in some way
> > because sigtd() does not know about fast sigblock and will therefore
> > happily select a thread that is fast-sigblocking to handle a signal
> > directed at the process.

> Hm, I do not quite follow, at least not the first part of the paragraph.

> The fast sigblock pointer is per-thread, so it is not shared in the kernel.
> Regardless of the kernel side, rtld is only supposed to use the fast
> block in the default implementation of the rtld locks, which are overriden
> by the libthr implementation on libthr initialization. There is also an
> explicit hand-off from the default locks to the external (libthr), and
> rtld cares to turn off fast sigblock mechanism when the handoff is
> performed.

> The selection of the thread for the delivery of signal in the mt process
> should not matter then, due to the mechanism not supposed to be used
> in the mt process.

OK, you are right. If multiple threads exist, the patched code disables
fast sigblock and the current unpatched code already postpones signal
handlers without sigprocmask() syscalls.

This suggests a different rather simpler change. Libthr can always use
its rtld lock implementation instead of only when multiple threads
exist. This avoids the sigprocmask() syscalls and should not be much
slower than the default implementation in rtld because that also
contains atomic operations (both the unpatched and the patched version).
People that care about performance of exceptions can then link in libthr
(in many cases, it is already linked in to allow for (the possibility
of) threading).

I have tested this and exceptions were indeed more than twice as fast in
my test program if I created an extra thread doing nothing than in the
fully single-threaded version (with or without libthr).

With that, I think fast sigblock is too much code and complication for a
niche case.

Most of the extra atomics in multi-threaded applications are conditional
on __isthreaded (or can be made so); therefore, performance loss from
linking in libthr should be negligible in most cases.

> > Although libthr's postpone approach is somewhat ugly, it does not depend
> > on any non-standard kernel features and does not delay the default
> > action. Would it be possible to move that code to libc to make things
> > easier for rtld? It looks like this requires teaching libc about various
> > threading concepts, though.

> The concern there is that rtld would need to have a tight coupling
> with libc, and possibly with libthr too.

Libc could call _rtld_thread_init() during initialization, somewhat like
libthr does now. The problems here are in libc and libthr, not in rtld.
Before libc is ready, the current code can be used; if libc is always
initialized first, the sigprocmask() calls can be removed because there
is no way a signal handler can be installed that early.

-- 
Jilles Tjoelker



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130112162547.GA54954>