From: Konstantin Belousov
Date: Fri, 27 Mar 2015 23:44:52 +0200
To: Rui Paulo
Cc: Eric van Gyzen, current@FreeBSD.org
Subject: Re: SSE in libthr
Message-ID: <20150327214452.GR2379@kib.kiev.ua>
In-Reply-To: <3A96AAEC-9C1C-444E-9A73-3CD2AED33116@me.com>
References: <5515AED9.8040408@FreeBSD.org> <3A96AAEC-9C1C-444E-9A73-3CD2AED33116@me.com>

On Fri, Mar 27, 2015 at 01:49:03PM -0700, Rui Paulo wrote:
> On Mar 27, 2015, at 12:26, Eric van Gyzen wrote:
> >
> > In a nutshell:
> >
> > Clang emits SSE instructions on amd64 in the common path of
> > pthread_mutex_unlock. This reduces performance by a non-trivial amount. I'd
> > like to disable SSE in libthr.
> >
> > In more detail:
> >
> > In libthr/thread/thr_mutex.c, we find the following:
> >
> >     #define MUTEX_INIT_LINK(m) do {         \
> >         (m)->m_qe.tqe_prev = NULL;          \
> >         (m)->m_qe.tqe_next = NULL;          \
> >     } while (0)
> >
> > In 9.1, clang 3.1 emits two ordinary mov instructions:
> >
> >     movq   $0x0,0x8(%rax)
> >     movq   $0x0,(%rax)
> >
> > Since 10.0 and clang 3.3, clang emits these SSE instructions:
> >
> >     xorps  %xmm0,%xmm0
> >     movups %xmm0,(%rax)
> >
> > Although these look harmless enough, using the FPU can reduce performance by
> > incurring extra overhead due to context-switching the FPU state.
> >
> > As I mentioned, this code is used in the common path of pthread_mutex_unlock. I
> > have a simple test program that creates four threads, all contending for a
> > single mutex, and measures the total number of lock acquisitions over several
> > seconds. When libthr is built with SSE, as is current, I get around 53 million
> > locks in 5 seconds. Without SSE, I get around 60 million (13% more). DTrace
> > shows around 790,000 calls to fpudna versus 10 calls. There could be other
> > factors involved, but I presume that the FPU context switches account for most
> > of the change in performance.
> >
> > Even when I add some SSE usage in the application--incidentally, these same
> > instructions--building libthr without SSE improves performance from 53.5 million
> > to 55.8 million (4.3%).
> >
> > In the real-world application where I first noticed this, performance improves
> > by 3-5%.
> >
> > I would appreciate your thoughts and feedback. The proposed patch is below.
> >
> > Eric
> >
> >
> >
> > Index: base/head/lib/libthr/arch/amd64/Makefile.inc
> > ===================================================================
> > --- base/head/lib/libthr/arch/amd64/Makefile.inc (revision 280703)
> > +++ base/head/lib/libthr/arch/amd64/Makefile.inc (working copy)
> > @@ -1,3 +1,8 @@
> >  #$FreeBSD$
> >
> >  SRCS+= _umtx_op_err.S
> > +
> > +# Using SSE incurs extra overhead per context switch,
> > +# which measurably impacts performance when the application
> > +# does not otherwise use FP/SSE.
> > +CFLAGS+=-mno-sse
>
> Good catch!
>
> Regarding your patch, I think we should disable even more, if possible. How about:
>
> CFLAGS+= -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3

I think so.

Also, this should be done for libc as well, both on i386 and amd64.
I am not sure whether compiler-rt should be included in the set.
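
For anyone who wants to reproduce the measurement quoted above: the test program itself was not posted to the list, so what follows is only a minimal sketch of the kind of benchmark described (several threads contending for one mutex, counting total acquisitions over a fixed interval). The names run_contention, worker, MAX_THREADS and STANDALONE are made up for illustration and do not come from the original program.

```c
/* Hypothetical reconstruction; the original test program was not posted. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_THREADS 64

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int stop;		/* set to 1 to tell the workers to exit */

/* Each worker counts how many times it acquired the shared mutex. */
static void *
worker(void *arg)
{
	unsigned long *count = arg;

	while (!atomic_load(&stop)) {
		pthread_mutex_lock(&lock);
		(*count)++;
		pthread_mutex_unlock(&lock);
	}
	return (NULL);
}

/* Run nthreads contending workers for `seconds`; return total acquisitions. */
unsigned long
run_contention(int nthreads, unsigned int seconds)
{
	pthread_t tids[MAX_THREADS];
	unsigned long counts[MAX_THREADS] = { 0 };
	unsigned long total = 0;
	int i, started = 0;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	atomic_store(&stop, 0);
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, worker, &counts[i]) != 0)
			break;		/* join only the threads that started */
		started++;
	}
	sleep(seconds);
	atomic_store(&stop, 1);
	for (i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		total += counts[i];
	}
	return (total);
}

#ifdef STANDALONE
int
main(void)
{
	/* Four threads, five seconds, as in the quoted description. */
	printf("%lu locks in 5 seconds\n", run_contention(4, 5));
	return (0);
}
#endif
```

Built with something like "cc -DSTANDALONE -pthread bench.c" (the file name is arbitrary), the program can be run once against the stock libthr and once against a libthr built with -mno-sse, and the two totals compared. Absolute numbers will depend on the machine and scheduler, so only the relative difference is meaningful.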