From: Jilles Tjoelker
To: Eric van Gyzen
Cc: current@FreeBSD.org
Date: Fri, 27 Mar 2015 22:40:57 +0100
Subject: Re: SSE in libthr

On Fri, Mar 27, 2015 at 03:26:17PM -0400, Eric van Gyzen wrote:
> In a nutshell:
> Clang emits SSE instructions on amd64 in the common path of
> pthread_mutex_unlock. This reduces performance by a non-trivial
> amount. I'd like to disable SSE in libthr.

How about saving and restoring the FPU/SSE state eagerly instead of the
current CR0.TS-based lazy method?
Handling the #NM (device-not-available) exception in fpudna has an
overhead that is not worth paying if FPU/SSE are used often. This would
apply to userland threads only; kernel threads normally do not use
FPU/SSE, and if they do, they handle the FPU/SSE state manually.

There is also potential for performance improvement in using SSE to
optimize string functions, for example. Even a simple SSE2 strlen easily
outperforms the already optimized lib/libc/string/strlen.c in a
microbenchmark, and many other string functions are slow
byte-at-a-time implementations.

-- 
Jilles Tjoelker
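The email does not include code for the "simple SSE2 strlen" it mentions. The following is a minimal sketch of what such a function could look like, assuming an amd64 target with SSE2 and a GCC- or Clang-compatible compiler (for `__builtin_ctz`). The name `sse2_strlen` is hypothetical; this is not FreeBSD's libc implementation. It compares 16 bytes at a time against zero instead of scanning byte by byte:

```c
#include <emmintrin.h>	/* SSE2 intrinsics (amd64 assumed) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SSE2 strlen sketch; not the libc implementation. */
size_t
sse2_strlen(const char *s)
{
	const __m128i zero = _mm_setzero_si128();
	/*
	 * Round down to a 16-byte boundary. Aligned 16-byte loads never
	 * cross a page boundary, so reading a few bytes before s or past
	 * the terminating NUL cannot fault.
	 */
	const char *p = (const char *)((uintptr_t)s & ~(uintptr_t)15);
	unsigned int off = (unsigned int)((uintptr_t)s & 15);
	__m128i chunk = _mm_load_si128((const __m128i *)p);
	/* Bit i of mask is set iff byte i of the chunk is NUL. */
	unsigned int mask =
	    (unsigned int)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));

	/* Discard the bytes that precede the start of the string. */
	mask >>= off;
	if (mask != 0)
		return ((size_t)__builtin_ctz(mask));

	/* Scan the remaining string one aligned 16-byte block at a time. */
	for (;;) {
		p += 16;
		chunk = _mm_load_si128((const __m128i *)p);
		mask = (unsigned int)_mm_movemask_epi8(
		    _mm_cmpeq_epi8(chunk, zero));
		if (mask != 0)
			return ((size_t)(p - s) + (size_t)__builtin_ctz(mask));
	}
}
```

Reading outside the string within an aligned block is safe on real hardware, but memory checkers such as AddressSanitizer may still report it. A production version would also need to be measured against the existing word-at-a-time code before replacing it.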