From: Konstantin Belousov
To: Jilles Tjoelker
Cc: Eric van Gyzen, current@FreeBSD.org
Date: Sat, 28 Mar 2015 09:34:10 +0200
Subject: Re: SSE in libthr

On Fri, Mar 27, 2015 at 10:40:57PM +0100, Jilles Tjoelker wrote:
> On Fri, Mar 27, 2015 at 03:26:17PM -0400, Eric van Gyzen wrote:
> > In a nutshell:
> >
> > Clang emits SSE instructions on amd64 in the common path of
> > pthread_mutex_unlock. This reduces performance by a non-trivial
> > amount. I'd like to disable SSE in libthr.
>
> How about saving and restoring the FPU/SSE state eagerly instead of the
> current CR0.TS-based lazy method? There is overhead associated with #NM
> exception handling (fpudna) which is not worth it if FPU/SSE are used
> often. This would apply to userland threads only; kernel threads
> normally do not use FPU/SSE and handle the FPU/SSE state manually if
> they do.

First, we have no choice but to save the FPU context when a thread is
switched out. It is not practical to try to keep the state in the
hardware, since fetching it to another core is too troublesome.

Second, the biggest overhead of #NM is the reading of the FPU context
from memory (or cache), not the handler itself. The save area for
SSE-capable machines, i.e. all amd64, is ~400 bytes, and XSAVEOPT does
not help much when reading the legacy FPU + XMM state. It does help
for YMM.

That said, your proposal would force all threads to pay a higher cost
at context switch time, increasing latency.

>
> There is performance improvement potential in using SSE for optimizing
> string functions, for example. Even a simple SSE2 strlen easily
> outperforms the already optimized lib/libc/string/strlen.c in a
> microbenchmark, and many other string functions are slow byte-at-a-time
> implementations.

If the program does a lot of work with the FPU between switches, the
cost is obviously mitigated. Note that even for the worst case of the
reported microbenchmark, the measured overhead is ~10-15%.
So if string ops indeed take a significant share of the program time, the FPU #NM handling cost should be very low even with the current scheme.
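
For readers unfamiliar with what a "simple SSE2 strlen" of the kind mentioned in the quoted text might look like, here is a minimal sketch. It is not code from this thread or from FreeBSD libc; the name sketch_sse2_strlen is hypothetical, and the sketch assumes a GCC- or Clang-compatible compiler (for __builtin_ctz). It uses the common trick of rounding the pointer down to a 16-byte boundary so that aligned loads never cross a page, which means it reads a few bytes outside the string; that is outside strict ISO C and may be flagged by memory checkers. On targets without SSE2 it falls back to a byte loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Hypothetical name; illustration only, not the libc implementation. */
size_t
sketch_sse2_strlen(const char *s)
{
	const __m128i zero = _mm_setzero_si128();
	/* Round down to a 16-byte boundary: aligned loads stay in one page. */
	const char *p = (const char *)((uintptr_t)s & ~(uintptr_t)15);
	unsigned int skip = (unsigned int)((uintptr_t)s & 15);
	unsigned int mask;

	/* One bit per byte of the chunk; a set bit marks a NUL byte. */
	mask = (unsigned int)_mm_movemask_epi8(
	    _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)p), zero));
	/* Discard matches in the bytes that precede the start of the string. */
	mask &= ~0u << skip;

	while (mask == 0) {
		p += 16;
		mask = (unsigned int)_mm_movemask_epi8(
		    _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)p), zero));
	}
	/* The lowest set bit is the offset of the first NUL in this chunk. */
	return ((size_t)((p + __builtin_ctz(mask)) - s));
}
#else
/* Fallback for targets without SSE2: plain byte-at-a-time loop. */
size_t
sketch_sse2_strlen(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		p++;
	return ((size_t)(p - s));
}
#endif
```

Every call to a function like this touches XMM registers, so under the lazy CR0.TS scheme discussed above, the first call after a context switch would take the #NM trap; subsequent calls until the next switch would not.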