From: "O. Hartmann"
To: grarpamp
Cc: freebsd-performance@freebsd.org, freebsd-smp@freebsd.org, freebsd-questions@freebsd.org, freebsd-hardware@freebsd.org
Date: Mon, 8 Dec 2014 15:39:25 +0100
Subject: Re: HyperThreading on Intel Xeon Haswell, a benefit?

On Mon, 8 Dec 2014 04:43:05 -0500
grarpamp wrote:

> HyperThreading on Intel Xeon Haswell, a benefit?
>
> What bits of FreeBSD are aware and can take proper advantage of
> Intel HTT, such as its thread/process schedulers (sched-BSD/ULE/...),
> etc?
>
> What system/app loads are, or are not, likely to benefit with today's
> HyperThreading CPU's? Kernel (ZFS/crypto/net/...) vs. Userland
> (apps)?
>
> Does anyone have performance stats for this current class of CPU
> to post comparing HT (enabled and disabled) while using more than
> four processes/threads in parallel?
>
> For instance, these two Intel Xeon Haswell four core CPU's are
> identical except for HT [1] (e3-1226v3 and e3-1246v3), and you
> can always turn HT off for testing.
> http://ark.intel.com/compare/80917,80916
>
> There are some Core i3/i5/i7 Haswell parts with HT as well.
> http://ark.intel.com/Search/Advanced?s=t&ECCMemory=true&VTD=true&AESTech=true
>
> There don't seem to be many reviews of Xeon processors, let alone
> HT. And most Unix talk of HT seems dated by at least a few years
> and a couple processor generations.
>
> Also, was the HT cache leak security issue from a decade ago ever
> fixed in hardware?
> "Cache missing for fun and profit"
> http://www.daemonology.net/papers/
>
> Being unsure of the best list, please direct replies to whichever
> is good. Thanks.
>
> [1] Plus 200MHz/6% clock per core and $59/27% market price bumps,
> but this thread is about whether or not there is any benefit to HT
> in current Intel CPU's such as Haswell, how much of one, and where.
> Once that is determined, then you can factor in other parameters
> like these to see if it's an overall value.

Hello.

My experience here is rather narrow and somewhat naive, so be warned.
From my experience, mostly compiling FreeBSD sources from scratch (/usr/obj deleted, no sophisticated caching subsystems used) and building world and kernel with as many parallel jobs as possible (taking the number of logical CPUs via PARA=`sysctl -n hw.ncpu` and passing it as "make -j${PARA} ..."), a dual-core, 4-thread CPU at 3.3 GHz takes about 60 minutes to build world, the same as a cut-down 4-core i3 with SMT disabled. Switching off SMT on the dual core results in roughly 90 to 100 minutes of compile time in my case, depending on the average load of the box while compiling. So, for INTEGER performance, I see a real benefit from SMT.

The picture is somewhat different for floating-point performance. Using SMT in some FPU-heavy calculations on Sandy Bridge and Ivy Bridge CPUs (Haswell is not available to me as a Xeon at the moment), I see a gain of only 10% to at most 25% (roughly estimated from some crude, manually timed calculations!). The benefit is slight, somewhat better on the more recent Ivy Bridge than on Sandy Bridge, and both of the latter seem superior in this respect to some Westmere 6-core Xeons we used a couple of years ago. (This may be related to architectural improvements other than SMT, like the ring bus introduced in Sandy Bridge and improved in Ivy Bridge and maybe Haswell.)

In earlier times (the pre-Sandy Bridge era) there were cases where it was beneficial to switch off SMT for heavy FPU loads in some BLAS/LAPACK-based benchmark scenarios, but that knowledge dates back years, to older P4 designs and early Core i7s. I lost track of that.

In short: I would highly recommend using/purchasing SMT-capable CPUs, since there is a performance benefit. But in the end, the performance gain has to justify the cost of an SMT-capable Xeon. As far as I know, most of the "value" Xeons do have SMT by default.
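The build procedure described above can be sketched as a small sh script. This is a sketch under stated assumptions, not the exact commands behind the timings: the `getconf _NPROCESSORS_ONLN` fallback and the `DRY_RUN` switch are illustrative additions, and a real run assumes FreeBSD sources in /usr/src with /usr/obj emptied beforehand. Note that hw.ncpu counts logical CPUs, so the dual-core, 4-thread part above would typically get -j4 with SMT on and -j2 with SMT off.

```shell
#!/bin/sh
# Sketch: rebuild world and kernel with one make job per logical CPU.
# hw.ncpu counts logical CPUs (SMT siblings included when SMT is on).
# getconf is an assumed fallback for systems without that sysctl.
PARA=$(sysctl -n hw.ncpu 2>/dev/null || getconf _NPROCESSORS_ONLN)

# DRY_RUN=1 (the default here) only prints the command. Set DRY_RUN=0
# on a FreeBSD box with sources in /usr/src (and an empty /usr/obj,
# as in the measurements) to actually build and time it.
DRY_RUN=${DRY_RUN:-1}

CMD="make -j${PARA} buildworld buildkernel"
if [ "$DRY_RUN" = 1 ]; then
    echo "$CMD"
else
    cd /usr/src || exit 1
    time $CMD
fi
```

Running it once with SMT enabled and once with it disabled (in the BIOS, for example) gives the two wall-clock times to compare.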
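To put the build times into numbers: the speedup from SMT is the build time without SMT divided by the build time with SMT. A minimal sketch follows; `speedup` is a hypothetical helper name, awk is used because POSIX sh arithmetic is integer-only, and the inputs are the rough timings reported above, not new measurements.

```shell
#!/bin/sh
# Speedup factor from two wall-clock times in minutes:
#   speedup = time_without_smt / time_with_smt
speedup() {
    awk -v off="$1" -v on="$2" 'BEGIN { printf "%.2f\n", off / on }'
}

# Rough figures from the dual-core, 3.3 GHz build above:
# about 60 min with SMT, 90 to 100 min with SMT switched off.
speedup 90 60    # prints 1.50
speedup 100 60   # prints 1.67
```

So the integer-heavy build gained roughly 50 to 67 percent throughput from SMT on that dual-core part, compared with the 10 to 25 percent estimated for the FPU-heavy calculations.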
There are some disadvantages regarding the amount of memory the kernel has to allocate for each CPU (logical and/or physical) it finds, so systems with small amounts of physical RAM (< 8 GiB) could run into trouble, if I'm not mistaken. But for any FreeBSD user considering ZFS for professional or semi-professional use, at least 8 GiB is a must anyway; otherwise it is ZFS, not SMT, that cripples performance.

oh