From: Wojciech Puchar
To: Yuri
Cc: freebsd-questions@freebsd.org
Date: Sat, 23 May 2009 22:39:26 +0200 (CEST)
Subject: Re: Why build's user CPU on 4-CPU machine with hyper-threading always higher with -j 8 compared to with -j 4?
In-Reply-To: <4A1853AA.9060407@rawbw.com>

> I noticed that the same exact build on i7-920 (4 CPUs) consumes ~15% more
> user CPU when run with -j 8 compared to -j 4.
>
> Hyper-threading is enabled so top shows 8 CPUs.
>
> Why would user time be higher in a hyper-threaded run?

Because it doesn't count the instructions actually executed but, as the name suggests, time. With -j 8 it sums the time spent on 8 pseudo-processors.

A single pseudo-processor ("half" of a single core) is slower than a full processor. The whole trick with hyper-threading is that it runs at more than half the speed of a full core, because one pseudo-processor becomes a full processor every time the second pseudo-processor is stalled on a memory access.

Today a memory access to DRAM costs over a hundred CPU cycles. Out-of-order execution can run other instructions in the meantime to some extent, but only "some". Even an L3 cache access costs >>10 CPU cycles.
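
To make the accounting concrete, here is a rough sketch in Python of a toy model. It assumes every job keeps its logical CPU busy for the whole build, which a real build does not. The function name, the amount of work and the relative speed of 0.6 are all made up for illustration; they are not measurements from the i7-920.

```python
def build_times(work, n_jobs, relative_speed):
    """Return (user_time, wall_time) for `work` full-core seconds of computation.

    relative_speed is a hypothetical parameter: the speed of one busy
    logical CPU relative to a core with no active sibling thread (1.0).
    """
    # Each job gets an equal share of the work, stretched by its slower CPU.
    per_job_time = (work / n_jobs) / relative_speed
    # User time is summed over all CPUs that were busy.
    user_time = per_job_time * n_jobs
    # The jobs run concurrently, so wall time is the time of one job.
    wall_time = per_job_time
    return user_time, wall_time


work = 400.0  # made-up amount of work, in full-core seconds

# -j 4: one job per core, each core at full speed.
user4, wall4 = build_times(work, n_jobs=4, relative_speed=1.0)
# -j 8: two jobs per core, each pseudo-processor assumed to run at 0.6.
user8, wall8 = build_times(work, n_jobs=8, relative_speed=0.6)

print(f"-j 4: user {user4:.1f}s  wall {wall4:.1f}s")
print(f"-j 8: user {user8:.1f}s  wall {wall8:.1f}s")
```

In this model, whenever relative_speed is between 0.5 and 1.0, the -j 8 run reports more user time yet finishes sooner on the wall clock: the same work is done, it is just counted on slower CPUs.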