From: Simeon Nifos
To: freebsd-performance@freebsd.org
Date: Mon, 24 Mar 2008 07:33:01 -0700 (PDT)
Subject: run-time performance of regression of sparse matrix vector multiplication
Message-ID: <147812.96521.qm@web56509.mail.re3.yahoo.com>

I have found a problem with FreeBSD AMD64 (maybe i386 too): a performance decrease relative to Linux. I am attaching the results and the piece of code I used.
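The attachment is not reproduced in this message. For readers without it, here is a minimal sketch of the kind of kernel being timed, assuming the matrix is stored in CSR (compressed sparse row) form and the rows are split across threads with an OpenMP parallel for. The names CsrMatrix, make_tridiagonal and multiply are hypothetical and this is not the attached code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container, for illustration only (not the attached code).
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_ptr;  // rows + 1 offsets into col_idx/values
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<double> values;        // the nonzero values, row by row
};

// "create" step: build an n x n tridiagonal test matrix
// (2 on the diagonal, -1 on the two neighbouring diagonals).
CsrMatrix make_tridiagonal(std::size_t n) {
    CsrMatrix m;
    m.rows = n;
    m.cols = n;
    m.row_ptr.push_back(0);
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0) {
            m.col_idx.push_back(i - 1);
            m.values.push_back(-1.0);
        }
        m.col_idx.push_back(i);
        m.values.push_back(2.0);
        if (i + 1 < n) {
            m.col_idx.push_back(i + 1);
            m.values.push_back(-1.0);
        }
        m.row_ptr.push_back(m.col_idx.size());
    }
    return m;
}

// "multiply" step: y = A * x. Each row is independent, so the outer loop
// can be shared among OpenMP threads without any locking.
std::vector<double> multiply(const CsrMatrix& a, const std::vector<double>& x) {
    assert(x.size() == a.cols);
    std::vector<double> y(a.rows, 0.0);
    const long n = static_cast<long>(a.rows);  // signed index for older OpenMP
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

With g++ 4.2 or later the pragma takes effect when the file is compiled with -fopenmp; without that flag the pragma is ignored and the loop runs on one thread. The flags used for the measurements in this message are not shown, so treat this only as a starting point for reproducing them.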
You have to install g++42 on FreeBSD first. Here are the results of the benchmark:

===============
==== LINUX ====
===============

Intel Core 2
============
number of threads: 1/ 2

Sun CC
create  :  808/443
multiply: 5063/4488

g++-4.2.2
create  :  881/479
multiply: 5245/4691

intel icpc
create  :  724/404
multiply: 4903/4594

We see that although the allocation (the create step) parallelizes well, the multiplication has a really hard time doing so. Are there any problems with this approach that I cannot see?

[archwn@home /usr/home/archwn/sparsematrixvector]$ sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1654

=====================
==== FreeBSD 7.0 ====
=====================

Intel Core 2
============
number of threads: 1/ 2

g++-4.2.2
create  : 1750/1288
multiply: 7098/5271

The same optimization flags were used in both cases with g++-4.2.2.

I have also written a pthreads version of the above code, which does not need an OpenMP-capable compiler at all. This allows us to try the gcc-3.4.6 compiler, which is unlikely to have problems of its own.

Is there anything you would like me to try out? Is anybody interested in having the code in order to run their own tests?

Thanks in advance,
Archwn.