From: Maho NAKATA
Date: Thu, 15 Apr 2010 10:20:00 +0900 (JST)
Message-Id: <20100415.102000.645538350615365151.chat95@mac.com>
To: amvandemore@gmail.com
Cc: alc@freebsd.org, alan.l.cox@gmail.com, freebsd-stable@freebsd.org, als@modulus.org, avg@freebsd.org
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
References: <4BC5F289.7020408@freebsd.org>
List-Id: Production branch of FreeBSD source code

Hi Andriy and Adam,

I also tried what was suggested.

My conclusion: on a Core i7 920 (2.66GHz, TurboBoost on, HyperThreading off),
the dgemm performance of GotoBLAS was as follows.

* Summary of results
  36-39 GFlops, 81-87% of peak performance, without pinning
  35-40 GFlops, 78-89% of peak performance, with pinning

My observations:
* Performance is somewhat unstable: one calculation runs at 35 GFlops, the
  next at 40 GFlops, and it keeps flipping back and forth. Jitter is observed.
* Pinning makes performance somewhat more stable, but it does not gain us
  any additional performance.

Details.
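As a rough cross-check of the percentages in the summary: the peak figure itself is not stated in this message. The sketch below assumes 4 cores x 4 double-precision flops per cycle x 2.8 GHz (the TurboBoost clock with all cores busy), i.e. 44.8 GFlops. All three factors are assumptions on my part, and the function name pct_of_peak is made up for illustration.

```shell
#!/bin/sh
# Assumed peak in GFlops: cores * flops per cycle * clock in GHz.
# All three factors are assumptions, not values taken from the message.
PEAK_GFLOPS=$(awk 'BEGIN { printf "%.1f", 4 * 4 * 2.8 }')

# pct_of_peak GFLOPS: print the measured rate as a whole-number
# percentage of the assumed peak.
pct_of_peak() {
    awk -v g="$1" -v peak="$PEAK_GFLOPS" 'BEGIN { printf "%.0f\n", 100 * g / peak }'
}

# The four endpoints quoted in the summary.
for g in 35 36 39 40; do
    echo "$g GFlops = $(pct_of_peak "$g")% of $PEAK_GFLOPS GFlops"
done
```

Under that assumption, 35 and 40 GFlops come out at 78% and 89%, and 36 and 39 GFlops at 80% and 87%, which is close to the ranges quoted in the summary.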
First I ran

% ./dgemm
n: 3500
time : 84.431008 or 22.428125
Mflops : 38244.168629
n: 3600
time : 90.162220 or 23.440381
Mflops : 39819.284422
n: 3700
time : 101.427504 or 27.404345
Mflops : 36977.121646

Note: 36-39 GFlops, 81-87% of peak performance

Then I pinned the threads to the cores as follows:

% procstat -t 1408
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
 1408 100160 dgemm            -                  3  190 run     -
 1408 100161 dgemm            -                  2  190 run     -
 1408 100162 dgemm            -                  2  190 run     -
 1408 100163 dgemm            -                  1  189 run     -
 1408 100164 dgemm            -                  0  190 run     -
 1408 100165 dgemm            -                  3  189 run     -
 1408 100166 dgemm            -                  1  190 run     -
 1408 100167 dgemm            initial thread     0  190 run     -

% cpuset -t 100160 -l 0
% cpuset -t 100161 -l 0
% cpuset -t 100162 -l 1
% cpuset -t 100163 -l 1
% cpuset -t 100164 -l 2
% cpuset -t 100165 -l 2
% cpuset -t 100166 -l 3
% cpuset -t 100167 -l 3

After that:

% procstat -t 1408
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
 1408 100160 dgemm            -                  0  191 run     -
 1408 100161 dgemm            -                  0  191 run     -
 1408 100162 dgemm            -                  1  190 run     -
 1408 100163 dgemm            -                  1  190 run     -
 1408 100164 dgemm            -                  2  190 run     -
 1408 100165 dgemm            -                  2  190 run     -
 1408 100166 dgemm            -                  3  190 run     -
 1408 100167 dgemm            initial thread     3  190 run     -

n: 4000
time : 121.907696 or 31.475052
Mflops : 40677.295630
n: 4100
time : 139.842701 or 38.702532
Mflops : 35624.444587
n: 4200
time : 143.622179 or 36.725949
Mflops : 40356.011158
n: 4300
time : 153.742976 or 39.465752
Mflops : 40301.013511
n: 4400
time : 164.919566 or 42.380653
Mflops : 40208.611317
n: 4500
time : 175.930335 or 45.422572
Mflops : 40132.139469

Thanks

From: Adam Vande More
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 12:47:31 -0500

> On Wed, Apr 14, 2010 at 11:51 AM, Andriy Gapon wrote:
>
>> on 14/04/2010 19:45 Adam Vande More said the following:
>> >
>> > also if I run cpuset on the dgemm then the utilization is basically at
>> > the theoretical max for one core so at least that part is working.
>>
>> You can also try procstat -t to find out thread IDs and cpuset -t to pin the
>> threads to the cores.
>>
>
> it gets to around 90% doing that.
>
> time : 103.617271 or 27.140992
> Mflops : 47172.925449
> n: 4100
> time : 113.910669 or 30.520677
> Mflops : 45174.496186
> n: 4200
> time : 121.880695 or 32.068070
> Mflops : 46217.711013
> n: 4300
>
> tried a couple of different thread orders but didn't seem to make a
> difference.
>
> galacticdominator% procstat -t 1922
>   PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
>  1922 100092 dgemm            initial thread     0  190 run     -
>  1922 100268 dgemm            -                  1  190 run     -
>  1922 100270 dgemm            -                  1  191 run     -
>  1922 100272 dgemm            -                  3  190 run     -
>  1922 100273 dgemm            -                  2  191 run     -
>  1922 100274 dgemm            -                  2  191 run     -
>  1922 100282 dgemm            -                  0  190 run     -
>  1922 100283 dgemm            -                  3  190 run     -
>
> galacticdominator% cpuset -t 100092 -l 0
> galacticdominator% cpuset -t 100268 -l 1
> galacticdominator% cpuset -t 100270 -l 2
> galacticdominator% cpuset -t 100272 -l 3
> galacticdominator% cpuset -t 100273 -l 0
> galacticdominator% cpuset -t 100274 -l 1
> galacticdominator% cpuset -t 100282 -l 2
> galacticdominator% cpuset -t 100283 -l 3
>
>
> galacticdominator% cpuset -t 100092 -l 0
> galacticdominator% cpuset -t 100268 -l 0
> galacticdominator% cpuset -t 100270 -l 1
> galacticdominator% cpuset -t 100272 -l 1
> galacticdominator% cpuset -t 100273 -l 2
> galacticdominator% cpuset -t 100274 -l 2
> galacticdominator% cpuset -t 100282 -l 3
> galacticdominator% cpuset -t 100283 -l 3
>
>
> This is from the second set:
>
> time : 150.348850 or 40.488350
> Mflops : 45022.951141
> n: 4600
> time : 161.968982 or 43.589618
> Mflops : 44669.884500
> n: 4700
>
> Since this is a full fledged desktop environment, 90% utilization seems
> pretty good. I'm no expert Andriy, but it seems like if gotoblas
> implemented some of the FreeBSD optimizations then we'd be in the same
> ballpark.
>
>
> --
> Adam Vande More
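The pinning done by hand in both messages (list the thread IDs with procstat -t, then issue one cpuset -t per thread) could be scripted. This is a minimal sketch, assuming the procstat -t layout shown in this thread (one header line, TID in the second column) and two consecutive threads per core, as in the layout used for the 4000-4500 runs. The function name pin_plan is hypothetical, and it only prints the cpuset commands instead of running them.

```shell
#!/bin/sh
# pin_plan NCPU: read `procstat -t PID` output on stdin and print one
# cpuset command per thread, placing two consecutive threads on each CPU
# (threads 1-2 on CPU 0, threads 3-4 on CPU 1, ...), wrapping after NCPU.
pin_plan() {
    awk -v ncpu="$1" '
        NR == 1 { next }                      # skip the header line
        {
            cpu = int((NR - 2) / 2) % ncpu    # NR - 2 is the 0-based thread index
            printf "cpuset -t %s -l %d\n", $2, cpu
        }'
}

# Example on three captured lines of output. On FreeBSD one would
# instead run something like:  procstat -t 1408 | pin_plan 4 | sh
printf '%s\n' \
    '  PID    TID COMM   TDNAME  CPU PRI STATE WCHAN' \
    ' 1408 100160 dgemm  -         3 190 run   -' \
    ' 1408 100161 dgemm  -         2 190 run   -' \
    ' 1408 100162 dgemm  -         2 190 run   -' | pin_plan 4
```

Printing the plan first makes it easy to review or reorder the assignments before applying them, which matters here since both posters experimented with different thread orders.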