Date:      Mon, 14 Jan 2013 18:24:04 +0000
From:      David Chisnall <theraven@FreeBSD.org>
To:        Jilles Tjoelker <jilles@stack.nl>
Cc:        toolchain@FreeBSD.org, John Baldwin <jhb@FreeBSD.org>, freebsd-arch@FreeBSD.org
Subject:   Re: Fast sigblock (AKA rtld speedup)
Message-ID:  <D6772A0E-FBA4-4168-B152-7E7694720A16@FreeBSD.org>
In-Reply-To: <20130114174703.GB88220@stack.nl>
References:  <20130107182235.GA65279@kib.kiev.ua> <20130112053147.GH2561@kib.kiev.ua> <20130112162547.GA54954@stack.nl> <201301141106.07976.jhb@freebsd.org> <20130114174703.GB88220@stack.nl>

On 14 Jan 2013, at 17:47, Jilles Tjoelker wrote:

> The code which does that check is actually under contrib/gcc. Problem
> is, they designed __gthread_active_p() to distinguish threaded and
> unthreaded programming environments -- it must be known in advance and
> cannot be changed later. The code for the unthreaded environment then
> takes advantage of this by not even allocating memory for mutexes in
> some cases.

It's worth taking a step back and asking why this code exists at all.
The main reason is that acquiring a mutex used to be really expensive.
It still is on some fruit-flavoured operating systems, but elsewhere it's
a single atomic operation in the uncontended case, and in single-threaded
code the cache line will already be exclusively owned by the calling
core, so that operation is cheap.

I would much rather we followed the example of Solaris and made the
multithreaded case both fast and the default than keep piling on hacks
that let code shave a few clock cycles off the single-threaded case.  In
particular, the popularity of multicore systems means that it is
increasingly rare for code to be both single-threaded and
performance-critical, so this seems like misplaced optimisation.

I strongly suspect that making it possible to inline the uncontended
lock case for a pthread mutex, and eliminating all of the branches on
__isthreaded, would give us a net speedup in both the single- and
multithreaded cases.

> This __gthread_active_p() thing is another barrier to bringing in a
> threaded plugin in an unthreaded application. Ports people spend a fair
> amount of time adding -pthread flags to things (such as perl) to work
> around this.


This check and the similar ones in libc cause a lot of pain, and it
seems that the correct fix is to make the performance penalty for
linking libthr so small that there is no point in avoiding it.

David

