Date:      Tue, 13 Apr 2010 08:21:53 +0900 (JST)
From:      Maho NAKATA <chat95@mac.com>
To:        bms@incunabulum.net
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Message-ID:  <20100413.082153.866357745773635148.chat95@mac.com>
In-Reply-To: <4BC2EC9A.2020207@incunabulum.net>
References:  <20100412.131213.4959786962516027.chat95@mac.com> <4BC2EC9A.2020207@incunabulum.net>

Hi Bruce,

From: Bruce Simpson <bms@incunabulum.net>
Subject: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Mon, 12 Apr 2010 10:49:14 +0100

> So, where's the profiling to discover why this is the case?
OK, I'll provide better documentation so that everyone can reproduce the test clearly.
(may take some time...)

> Also I'm not clear on what constitutes 'theoretical peak performance'
> here or how it is being calculated. So figures like these come across
> as unscientific.

The Core i7 920 (2.66GHz) has four cores. Each core can perform four floating point operations per cycle.
Thus: 2.66GHz x 4 cores x 4 flops/cycle = 42.56Gflops
cf. http://www.intel.com/support/processors/sb/cs-023143.htm
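
The peak calculation above can be sketched as a small script. This is only
an illustration of the arithmetic; the function name peak_gflops is made up
for this example, and the figure of four floating point operations per cycle
per core is the one stated above.

```python
def peak_gflops(clock_ghz: float, cores: int, flops_per_cycle: int) -> float:
    """Theoretical peak = clock (GHz) x cores x flops per cycle per core."""
    return clock_ghz * cores * flops_per_cycle


if __name__ == "__main__":
    # Core i7 920: 2.66 GHz, 4 cores, 4 floating point operations per cycle per core
    print(f"{peak_gflops(2.66, 4, 4):.2f} Gflops")
```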

> I'm sure this is something which can be resolved if someone sits down,
> profiles the app, and makes the necessary adjustments
> (e.g. pthread_setaffinity_np()) to configure CPU affinity, if the lack
> of it is pessimizing your friend's app.
Might be. We run on the same machine.

> The PMC framework is rapidly maturing, and you can use KCacheGrind
> with it to visualize context switch overhead.
> 
> But I think it's expecting a bit much to post informal results to
> -stable, in an expectation of something other than informal suggestions
> of what may help someone's maths-intensive application.

BLAS is a basic linear algebra package which is used in many applications.
It is also used for the top500 list http://www.top500.org/
cf. http://www.top500.org/project/introduction
via LINPACK. dgemm is a Level 3 BLAS routine, which is a very good test for
common PCs as the calculation is CPU intensive.
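
As a rough sketch of how a dgemm timing turns into a "percent of peak"
figure: a matrix multiply of an m x k matrix by a k x n matrix is
conventionally counted as 2*m*n*k floating point operations (one multiply
and one add per inner-product term). The function names below are
hypothetical helpers for illustration, not part of any BLAS library, and the
elapsed time has to come from your own measurement.

```python
def dgemm_flops(m: int, n: int, k: int) -> int:
    """Conventional operation count for C = A*B with A (m x k) and B (k x n)."""
    return 2 * m * n * k


def achieved_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Gflops reached by one dgemm call that took `seconds` of wall time."""
    return dgemm_flops(m, n, k) / seconds / 1e9


def efficiency(achieved: float, peak: float) -> float:
    """Fraction of the theoretical peak that was actually reached."""
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical measurement: a 5000 x 5000 dgemm that took 8.4 seconds.
    g = achieved_gflops(5000, 5000, 5000, 8.4)
    print(f"{g:.2f} Gflops, {100 * efficiency(g, 42.56):.1f}% of peak")
```

The peak value (42.56 here) is passed in explicitly, so the same helpers can
be used with a different CPU or a different estimate of the peak.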

> If there are performance issues, then reproducible results are needed,
> as well as some basic profiling effort of the system elements
> involved, before people could say anything either way, or offer
> further help.
Again, I'll provide better documentation so that everyone can reproduce the test clearly.
(may take some time...)

thanks,
-- Nakata Maho http://accc.riken.jp/maho/ , http://ja.openoffice.org/ 
   Nakata Maho's PGP public keys: http://accc.riken.jp/maho/maho.pgp.txt


