Date: Mon, 14 Jan 2013 18:24:04 +0000
From: David Chisnall <theraven@FreeBSD.org>
To: Jilles Tjoelker <jilles@stack.nl>
Cc: toolchain@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: Fast sigblock (AKA rtld speedup)
Message-ID: <D6772A0E-FBA4-4168-B152-7E7694720A16@FreeBSD.org>
In-Reply-To: <20130114174703.GB88220@stack.nl>
References: <20130107182235.GA65279@kib.kiev.ua> <20130112053147.GH2561@kib.kiev.ua> <20130112162547.GA54954@stack.nl> <201301141106.07976.jhb@freebsd.org> <20130114174703.GB88220@stack.nl>
On 14 Jan 2013, at 17:47, Jilles Tjoelker wrote:

> The code which does that check is actually under contrib/gcc. Problem
> is, they designed __gthread_active_p() to distinguish threaded and
> unthreaded programming environments -- it must be known in advance and
> cannot be changed later. The code for the unthreaded environment then
> takes advantage of this by not even allocating memory for mutexes in
> some cases.

It's worth taking a step back and asking why this code exists at all, and the main reason is that acquiring a mutex used to be really expensive. It still is on some fruit-flavoured operating systems, but elsewhere it's a single atomic operation in the uncontended case, and in that case the cache line will already be exclusively owned by the calling core in single-threaded code.

I would much rather that we followed the example of Solaris and made the multithreaded case fast and the default than keep piling on hacks that allow code to shave off a few clock cycles in the single-threaded case. In particular, the popularity of multicore systems means that it is increasingly rare for code to be both single threaded and performance critical, so this seems like misplaced optimisation.

I strongly suspect that making it possible to inline the uncontended lock case for a pthread mutex and eliminating all of the branches on __isthreaded would give us a net speedup in both single and multithreaded cases.

> This __gthread_active_p() thing is another barrier to bringing in a
> threaded plugin in an unthreaded application. Ports people spend a fair
> amount of time adding -pthread flags to things (such as perl) to work
> around this.
This and the similar checks in libc cause a lot of pain, and it seems that the correct fix is ensuring that the performance penalty for linking libthr is so small that there is no point in avoiding it.

David
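The inlined uncontended lock path proposed above can be sketched in C as follows, assuming C11 <stdatomic.h>. The acquire is a single compare-and-swap in a static inline function, so the compiler can inline it at each call site, and only contention falls through to an out-of-line slow path. The type fast_mutex, its functions, and the flag hypothetical_isthreaded are illustrative inventions, not libthr's pthread_mutex_t implementation or libc's real __isthreaded handling. A real slow path would block in the kernel (on FreeBSD, via _umtx_op(2)) rather than yield and retry as this one does.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word, not libthr's pthread_mutex_t layout:
 * 0 = unlocked, 1 = locked. */
typedef struct {
    atomic_int state;
} fast_mutex;

#define FAST_MUTEX_INITIALIZER { 0 }

/* Out-of-line slow path, taken only under contention.  Assumption: a real
 * implementation would sleep in the kernel (on FreeBSD, via _umtx_op(2));
 * this sketch simply yields the CPU and retries. */
static void fast_mutex_lock_slow(fast_mutex *m)
{
    int expected;

    do {
        sched_yield();
        expected = 0;
    } while (!atomic_compare_exchange_weak_explicit(&m->state, &expected, 1,
                 memory_order_acquire, memory_order_relaxed));
}

/* Inlinable fast path: one compare-and-swap when the lock is free, and no
 * test of any process-wide "are we threaded?" flag. */
static inline void fast_mutex_lock(fast_mutex *m)
{
    int expected = 0;

    if (!atomic_compare_exchange_strong_explicit(&m->state, &expected, 1,
            memory_order_acquire, memory_order_relaxed))
        fast_mutex_lock_slow(m);
}

/* Same single compare-and-swap, but report failure instead of waiting. */
static inline bool fast_mutex_trylock(fast_mutex *m)
{
    int expected = 0;

    return atomic_compare_exchange_strong_explicit(&m->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Release with a single store.  A sleeping-waiter design would also have to
 * wake a blocked thread here; this spin-and-yield sketch does not need to. */
static inline void fast_mutex_unlock(fast_mutex *m)
{
    assert(atomic_load_explicit(&m->state, memory_order_relaxed) == 1);
    atomic_store_explicit(&m->state, 0, memory_order_release);
}

/* For contrast, the pattern argued against: each call site branches on a
 * global flag (hypothetical name, standing in for libc's __isthreaded) and
 * skips locking entirely in single-threaded programs. */
static int hypothetical_isthreaded;

static inline void gated_lock(fast_mutex *m)
{
    if (hypothetical_isthreaded)
        fast_mutex_lock(m);
}

static inline void gated_unlock(fast_mutex *m)
{
    if (hypothetical_isthreaded)
        fast_mutex_unlock(m);
}
```

In a single-threaded program, fast_mutex_lock costs one atomic operation on a lock word that is already in the calling core's cache, which is the overhead the message argues is small enough to pay unconditionally. gated_lock instead adds a load and branch at every call site and bakes in the threaded/unthreaded split that the thread describes as a barrier to loading threaded plugins.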