From: Rui Paulo
To: Eric van Gyzen
Cc: current@FreeBSD.org
Subject: Re: SSE in libthr
Date: Fri, 27 Mar 2015 13:49:03 -0700
Message-id: <3A96AAEC-9C1C-444E-9A73-3CD2AED33116@me.com>
In-reply-to: <5515AED9.8040408@FreeBSD.org>
References: <5515AED9.8040408@FreeBSD.org>

On Mar 27, 2015, at 12:26, Eric van Gyzen wrote:
>
> In a nutshell:
>
> Clang emits SSE instructions on amd64 in the common path of
> pthread_mutex_unlock.  This reduces performance by a non-trivial amount.  I'd
> like to disable SSE in libthr.
>
> In more detail:
>
> In libthr/thread/thr_mutex.c, we find the following:
>
> #define MUTEX_INIT_LINK(m)      do {            \
>         (m)->m_qe.tqe_prev = NULL;      \
>         (m)->m_qe.tqe_next = NULL;      \
> } while (0)
>
> In 9.1, clang 3.1 emits two ordinary mov instructions:
>
>     movq   $0x0,0x8(%rax)
>     movq   $0x0,(%rax)
>
> Since 10.0 and clang 3.3, clang emits these SSE instructions:
>
>     xorps  %xmm0,%xmm0
>     movups %xmm0,(%rax)
>
> Although these look harmless enough, using the FPU can reduce performance by
> incurring extra overhead due to context-switching the FPU state.
>
> As I mentioned, this code is used in the common path of pthread_mutex_unlock.  I
> have a simple test program that creates four threads, all contending for a
> single mutex, and measures the total number of lock acquisitions over several
> seconds.  When libthr is built with SSE, as is current, I get around 53 million
> locks in 5 seconds.  Without SSE, I get around 60 million (13% more).  DTrace
> shows around 790,000 calls to fpudna versus 10 calls.  There could be other
> factors involved, but I presume that the FPU context switches account for most
> of the change in performance.
>
> Even when I add some SSE usage in the application--incidentally, these same
> instructions--building libthr without SSE improves performance from 53.5 million
> to 55.8 million (4.3%).
>
> In the real-world application where I first noticed this, performance improves
> by 3-5%.
>
> I would appreciate your thoughts and feedback.  The proposed patch is below.
>
> Eric
>
>
>
> Index: base/head/lib/libthr/arch/amd64/Makefile.inc
> ===================================================================
> --- base/head/lib/libthr/arch/amd64/Makefile.inc        (revision 280703)
> +++ base/head/lib/libthr/arch/amd64/Makefile.inc        (working copy)
> @@ -1,3 +1,8 @@
>  #$FreeBSD$
>
>  SRCS+= _umtx_op_err.S
> +
> +# Using SSE incurs extra overhead per context switch,
> +# which measurably impacts performance when the application
> +# does not otherwise use FP/SSE.
> +CFLAGS+=-mno-sse

Good catch!

Regarding your patch, I think we should disable even more, if possible.  How about:

CFLAGS+= -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3

--
Rui Paulo
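
For anyone who wants to reproduce the measurement, the test described in the quoted message (several threads contending for one mutex, counting lock acquisitions over a fixed interval) might look roughly like the sketch below. This is not Eric's actual program, which was not posted. The names count_lock_acquisitions, worker and MAX_THREADS are made up for illustration, and the millisecond interval parameter is an assumption.

```c
#define _POSIX_C_SOURCE 200809L

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical names throughout; this is not the original test program. */
#define MAX_THREADS 64

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool stop;

/* Each worker takes and releases the shared mutex until told to stop. */
static void *
worker(void *arg)
{
	uint64_t *count = arg;

	while (!atomic_load_explicit(&stop, memory_order_relaxed)) {
		pthread_mutex_lock(&lock);
		(*count)++;		/* one more successful acquisition */
		pthread_mutex_unlock(&lock);
	}
	return (NULL);
}

/*
 * Run nthreads workers against one mutex for roughly millis milliseconds
 * and return the total number of lock acquisitions.  Returns 0 if
 * nthreads is out of range.
 */
uint64_t
count_lock_acquisitions(int nthreads, long millis)
{
	pthread_t tids[MAX_THREADS];
	uint64_t counts[MAX_THREADS] = { 0 };
	struct timespec ts = { millis / 1000, (millis % 1000) * 1000000L };
	uint64_t total = 0;
	int i, started = 0;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		return (0);

	atomic_store(&stop, false);
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, worker, &counts[i]) != 0)
			break;
		started++;
	}

	nanosleep(&ts, NULL);		/* let the workers contend */
	atomic_store(&stop, true);

	/* Counts are read only after join, so no extra locking is needed. */
	for (i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		total += counts[i];
	}
	return (total);
}
```

Calling count_lock_acquisitions(4, 5000) from a small main() and printing the result would correspond to the four-thread, five-second run quoted above. Running the same binary against a libthr built with and without the proposed CFLAGS is what would make the two totals comparable.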