Date: Tue, 13 Apr 2010 08:28:56 +0900 (JST)
From: Maho NAKATA <chat95@mac.com>
To: mdpoole@troilus.org
Cc: adrian@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Message-ID: <20100413.082856.690091871650385955.chat95@mac.com>
In-Reply-To: <87tyrghiio.fsf@troilus.org>
References: <t2ud763ac661004120231q44e9a4f7z5c0f11a31725deb@mail.gmail.com> <h2yea2d4a5b1004120658xba353f17w894d33e08558f3ea@mail.gmail.com> <87tyrghiio.fsf@troilus.org>
From: Michael Poole <mdpoole@troilus.org>
Subject: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Mon, 12 Apr 2010 10:06:55 -0400

> Nakata-san's theoretical performance numbers assume 4 to 4.2 operations
> per core per cycle at the nominal (2.66 GHz, non-TurboBoost) clock rate.
> (DGEMM is double precision, but I am not familiar enough with scientific
> computing or with the Nehalem implementation of SSE to know why it is
> four operations per cycle rather than two -- is it because double
> precision counts as two FLOPs or is it because of multiple issue?)
> TurboBoost runs up to 2.93 GHz on this CPU, so it doesn't fit either the
> theoretical peak performance or the performance discrepancy very well.

Hi Michael,

I read http://www.intel.com/support/processors/sb/cs-023143.htm
and TurboBoost on the 920 is 2.80 GHz.

> why it is four operations per cycle rather than two

It's a bit strange to me as well, but I ran the dgemm operation with
m = k = n, and in this case the flop count is 2n^3 + 2n^2
(even 2n^3 is okay).

Thanks,
-- 
Nakata Maho http://accc.riken.jp/maho/ , http://ja.openoffice.org/
Nakata Maho's PGP public keys: http://accc.riken.jp/maho/maho.pgp.txt
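
The arithmetic discussed in this message can be sketched as follows. This is only an illustration: the function names are made up here, the flop count uses the convention quoted above (2mnk + 2mn, which becomes 2n^3 + 2n^2 when m = k = n), and the peak assumes 4 cores, the nominal 2.66 GHz clock, and 4 flops per core per cycle as stated in the thread. It does not reproduce the original benchmark.

```python
def dgemm_flops(m, n, k):
    """Flop count for C = alpha*A*B + beta*C, with A (m x k) and B (k x n).

    Convention assumed here: 2*m*n*k + 2*m*n, i.e. 2n^3 + 2n^2 for m = n = k.
    """
    return 2 * m * n * k + 2 * m * n


def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS = cores * clock (GHz) * flops per core per cycle."""
    return cores * clock_ghz * flops_per_cycle


def achieved_gflops(m, n, k, seconds):
    """GFLOPS achieved by one dgemm call that took `seconds` of wall time."""
    return dgemm_flops(m, n, k) / seconds / 1e9


if __name__ == "__main__":
    # Values assumed from the thread: 4 cores, 2.66 GHz nominal, 4 flops/cycle.
    peak = peak_gflops(4, 2.66, 4)
    print(f"theoretical peak: {peak:.2f} GFLOPS")

    # Hypothetical timing, for illustration only (not a measured result).
    n, seconds = 4000, 4.0
    rate = achieved_gflops(n, n, n, seconds)
    print(f"achieved: {rate:.2f} GFLOPS ({100 * rate / peak:.1f}% of peak)")
```

For large n the 2n^2 term is negligible next to 2n^3, which is why either count gives nearly the same efficiency figure.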