Date:      Thu, 15 Apr 2010 10:20:00 +0900 (JST)
From:      Maho NAKATA <chat95@mac.com>
To:        amvandemore@gmail.com
Cc:        alc@freebsd.org, alan.l.cox@gmail.com, freebsd-stable@freebsd.org, als@modulus.org, avg@freebsd.org
Subject:   Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Message-ID:  <20100415.102000.645538350615365151.chat95@mac.com>
In-Reply-To: <n2o6201873e1004141047t97d89cb0o2688fae1875eae08@mail.gmail.com>
References:  <m2y6201873e1004140945n855c8800we9baced2e293f270@mail.gmail.com> <4BC5F289.7020408@freebsd.org> <n2o6201873e1004141047t97d89cb0o2688fae1875eae08@mail.gmail.com>

Hi Andriy and Adam,

I also did the same thing as suggested.

My conclusion: on a Core i7 920 (2.66 GHz, TurboBoost on, HyperThreading off),
my GotoBLAS dgemm results were as follows.

* Summary of results
36-39 GFlops (81-87% of peak) without pinning
35-40 GFlops (78-89% of peak) with pinning
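The percentages above are relative to a theoretical peak that the message does not spell out. A minimal sketch of that arithmetic, under two assumptions that are not stated in the thread: each Nehalem core completes 4 double-precision flops per cycle (one 2-wide SSE add plus one 2-wide SSE multiply), and TurboBoost holds all four cores at 2.80 GHz. That gives 44.8 GFlops, which is broadly consistent with the quoted ranges; at the nominal 2.66 GHz the peak would be 42.56 GFlops. The helper names are hypothetical.

```sh
#!/bin/sh
# Sketch: theoretical DP peak and efficiency for a 4-core Core i7 920.
# ASSUMPTIONS: 4 DP flops/cycle/core (SSE add + SSE mul per cycle) and a
# 2.80 GHz all-core TurboBoost clock; neither figure comes from the thread.

# peak_gflops CORES GHZ FLOPS_PER_CYCLE: print peak GFlops (2 decimals)
peak_gflops() {
    awk -v c="$1" -v f="$2" -v w="$3" 'BEGIN { printf "%.2f\n", c * f * w }'
}

# efficiency_pct MEASURED_GFLOPS PEAK_GFLOPS: print percent of peak (1 decimal)
efficiency_pct() {
    awk -v m="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * m / p }'
}

turbo=$(peak_gflops 4 2.80 4)
nominal=$(peak_gflops 4 2.66 4)
echo "peak: $turbo GFlops (turbo), $nominal GFlops (nominal clock)"
for g in 35 36 39 40; do
    echo "$g GFlops = $(efficiency_pct "$g" "$turbo")% of the turbo peak"
done
```

Under these assumptions, 35, 36, 39 and 40 GFlops come out at roughly 78%, 80%, 87% and 89% of peak.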

My observations:
* Performance is somewhat unstable: one run gives 35 GFlops, the next gives
40 GFlops, and it flips back and forth, so there is noticeable jitter.
* Pinning makes performance somewhat more stable, but we don't gain any speed.

Details.
First, I ran:
%./dgemm
n: 3500
time : 84.431008 or 22.428125 
Mflops : 38244.168629
n: 3600
time : 90.162220 or 23.440381 
Mflops : 39819.284422
n: 3700
time : 101.427504 or 27.404345 
Mflops : 36977.121646

Note: 36-39 GFlops, i.e. 81-87% of peak performance.
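The benchmark's source is not in the thread, so the formula below is an inference rather than its documented behaviour: the Mflops figures above (for instance n = 3500 and n = 3600) match 10 x 2n^2(n+1) flops divided by the second time value, which is what you would get if the program timed ten n x n DGEMM calls and counted 2n^2(n+1) flops per call. The first time value also looks like CPU time summed over all threads. A small sketch that re-derives the printed lines under that reading; the helper names are hypothetical.

```sh
#!/bin/sh
# Sketch: re-derive the dgemm output lines from n and the two times.
# INFERRED, not from the benchmark source: Mflops = 10 * 2*n^2*(n+1) / wall / 1e6,
# and the first "time" value is CPU time summed over all threads.

# mflops N WALL_SECONDS: print Mflops (rounded to an integer)
mflops() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.0f\n", 10 * 2 * n * n * (n + 1) / t / 1e6 }'
}

# busy_cores CPU_SECONDS WALL_SECONDS: print average number of busy cores
busy_cores() {
    awk -v c="$1" -v t="$2" 'BEGIN { printf "%.2f\n", c / t }'
}

# n, first time, second time, copied from the unpinned run above
for run in "3500 84.431008 22.428125" "3600 90.162220 23.440381" "3700 101.427504 27.404345"; do
    set -- $run
    echo "n=$1: $(mflops "$1" "$3") Mflops, $(busy_cores "$2" "$3") cores busy on average"
done
```

If that reading is right, the n = 3500 run kept about 3.76 of the 4 cores busy on average.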

Then I pinned the threads to the cores, two threads per core, as follows:

% procstat -t 1408
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN    
 1408 100160 dgemm            -                  3  190 run     -         
 1408 100161 dgemm            -                  2  190 run     -         
 1408 100162 dgemm            -                  2  190 run     -         
 1408 100163 dgemm            -                  1  189 run     -         
 1408 100164 dgemm            -                  0  190 run     -         
 1408 100165 dgemm            -                  3  189 run     -         
 1408 100166 dgemm            -                  1  190 run     -         
 1408 100167 dgemm            initial thread     0  190 run     -  

% cpuset -t 100160 -l 0
% cpuset -t 100161 -l 0
% cpuset -t 100162 -l 1
% cpuset -t 100163 -l 1
% cpuset -t 100164 -l 2
% cpuset -t 100165 -l 2
% cpuset -t 100166 -l 3
% cpuset -t 100167 -l 3
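The eight cpuset commands above follow a simple rule (thread i goes to CPU i / 2), and Adam's quoted message below also tries a round-robin order (thread i goes to CPU i mod 4). A sketch that generates either plan from procstat output, assuming the column layout shown above (one header line, TID in the second column); tids_from_procstat and pin_plan are hypothetical helpers, not FreeBSD tools.

```sh
#!/bin/sh
# Sketch (hypothetical helpers): build per-thread cpuset(1) commands.
# Assumes procstat -t output has one header line and the TID in column 2.

# tids_from_procstat: read "procstat -t PID" output on stdin, print TIDs.
tids_from_procstat() {
    awk 'NR > 1 && NF >= 2 { print $2 }'
}

# pin_plan NCPU MODE: read TIDs on stdin, print one cpuset command per TID.
#   MODE "paired": consecutive threads share a CPU (8 threads, 4 CPUs -> 0,0,1,1,2,2,3,3)
#   any other MODE, e.g. "roundrobin": thread i goes to CPU i mod NCPU
pin_plan() {
    awk -v ncpu="$1" -v mode="$2" '
        { tid[NR - 1] = $1 }
        END {
            # threads per CPU, rounded up, so paired mode never exceeds NCPU
            per = int((NR + ncpu - 1) / ncpu)
            for (i = 0; i < NR; i++) {
                cpu = (mode == "paired") ? int(i / per) : i % ncpu
                printf "cpuset -t %s -l %d\n", tid[i], cpu
            }
        }'
}
```

For example, `procstat -t 1408 | tids_from_procstat | pin_plan 4 paired` prints the eight commands shown above for review, and appending `| sh` runs them. Threads created after the plan is generated are not pinned.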
Then, checking the placement again:
% procstat -t 1408
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN    
 1408 100160 dgemm            -                  0  191 run     -         
 1408 100161 dgemm            -                  0  191 run     -         
 1408 100162 dgemm            -                  1  190 run     -         
 1408 100163 dgemm            -                  1  190 run     -         
 1408 100164 dgemm            -                  2  190 run     -         
 1408 100165 dgemm            -                  2  190 run     -         
 1408 100166 dgemm            -                  3  190 run     -         
 1408 100167 dgemm            initial thread     3  190 run     -   

n: 4000
time : 121.907696 or 31.475052 
Mflops : 40677.295630
n: 4100
time : 139.842701 or 38.702532 
Mflops : 35624.444587
n: 4200
time : 143.622179 or 36.725949 
Mflops : 40356.011158
n: 4300
time : 153.742976 or 39.465752 
Mflops : 40301.013511
n: 4400
time : 164.919566 or 42.380653 
Mflops : 40208.611317
n: 4500
time : 175.930335 or 45.422572 
Mflops : 40132.139469
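To put rough numbers on the jitter described in the observations, a sketch that computes min, max and mean for each run from the Mflops lines above. Three unpinned and six pinned samples at different n are a very small sample, so treat the result as illustrative only; summarize is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: summarize the Mflops values reported above, per run.
# The values are copied from this message; summarize is a hypothetical helper.

# summarize LABEL: read Mflops values on stdin, print min/max/mean in GFlops
summarize() {
    awk -v label="$1" '
        {
            v = $1 + 0
            sum += v
            if (NR == 1 || v < min) min = v
            if (NR == 1 || v > max) max = v
        }
        END {
            printf "%s: min %.1f max %.1f mean %.1f GFlops\n", label, min / 1000, max / 1000, sum / NR / 1000
        }'
}

printf '%s\n' 38244.168629 39819.284422 36977.121646 | summarize unpinned
printf '%s\n' 40677.295630 35624.444587 40356.011158 40301.013511 \
    40208.611317 40132.139469 | summarize pinned
```

Leaving out the single 35.6 GFlops pinned run at n = 4100, the other five pinned runs all fall between 40.1 and 40.7 GFlops.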

Thanks

From: Adam Vande More <amvandemore@gmail.com>
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 12:47:31 -0500

> On Wed, Apr 14, 2010 at 11:51 AM, Andriy Gapon <avg@freebsd.org> wrote:
> 
>> on 14/04/2010 19:45 Adam Vande More said the following:
>> >
>> > Also, if I run cpuset on the dgemm, then the utilization is basically at
>> > the theoretical max for one core, so at least that part is working.
>>
>> You can also try procstat -t <pid> to find out thread IDs and cpuset -t to
>> pin the threads to the cores.
>>
> 
> it gets to around 90% doing that.
> 
> time : 103.617271 or 27.140992
> Mflops : 47172.925449
> n: 4100
> time : 113.910669 or 30.520677
> Mflops : 45174.496186
> n: 4200
> time : 121.880695 or 32.068070
> Mflops : 46217.711013
> n: 4300
> 
> I tried a couple of different thread orders, but it didn't seem to make a
> difference.
> 
> galacticdominator% procstat -t 1922
>   PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
>  1922 100092 dgemm            initial thread     0  190 run     -
>  1922 100268 dgemm            -                  1  190 run     -
>  1922 100270 dgemm            -                  1  191 run     -
>  1922 100272 dgemm            -                  3  190 run     -
>  1922 100273 dgemm            -                  2  191 run     -
>  1922 100274 dgemm            -                  2  191 run     -
>  1922 100282 dgemm            -                  0  190 run     -
>  1922 100283 dgemm            -                  3  190 run     -
> 
> galacticdominator% cpuset -t 100092 -l 0
> galacticdominator% cpuset -t 100268 -l 1
> galacticdominator% cpuset -t 100270 -l 2
> galacticdominator% cpuset -t 100272 -l 3
> galacticdominator% cpuset -t 100273 -l 0
> galacticdominator% cpuset -t 100274 -l 1
> galacticdominator% cpuset -t 100282 -l 2
> galacticdominator% cpuset -t 100283 -l 3
> 
> 
> galacticdominator% cpuset -t 100092 -l 0
> galacticdominator% cpuset -t 100268 -l 0
> galacticdominator% cpuset -t 100270 -l 1
> galacticdominator% cpuset -t 100272 -l 1
> galacticdominator% cpuset -t 100273 -l 2
> galacticdominator% cpuset -t 100274 -l 2
> galacticdominator% cpuset -t 100282 -l 3
> galacticdominator% cpuset -t 100283 -l 3
> 
> 
> This is from the second set:
> 
> time : 150.348850 or 40.488350
> Mflops : 45022.951141
> n: 4600
> time : 161.968982 or 43.589618
> Mflops : 44669.884500
> n: 4700
> 
> Since this is a full-fledged desktop environment, 90% utilization seems
> pretty good.  I'm no expert, Andriy, but it seems that if GotoBLAS
> implemented some of the FreeBSD optimizations, we'd be in the same
> ballpark.
> 
> 
> -- 
> Adam Vande More


