From owner-freebsd-threads@FreeBSD.ORG Sun Oct 7 20:09:51 2007
To: freebsd-threads@freebsd.org
From: Ivan Voras
Date: Sun, 07 Oct 2007 22:09:28 +0200
References: <200710071805.39399.tijl@ulyssis.org>
In-Reply-To: <200710071805.39399.tijl@ulyssis.org>
Subject: Re: Unexpected threading performance result
List-Id: Threading on FreeBSD

Tijl Coosemans wrote:
> On Sunday 07 October 2007 16:52:03 Ivan Voras wrote:
>> For an unrelated purpose, I'm benchmarking performance of tree
>> algorithms in SMP environments and
>> my preliminary run has an unexpected
>> result. Here's the typical output from the (small) benchmark program,
>> run on a dual-core Athlon64 (i386 mode):
>>
>> Running benchmarks on small_nonuniform, 1000000 samples
>> Step 1: Running 100 loops
>> ** Step 1 benchmark completed 100 loops in 84.44 seconds.
>> Step 2: Running 2 threads with 100 loops each
>> ** Step 2 benchmark completed 100 loops in 2 threads in 167.46 seconds.
>
> My guess is, that in the beginning of step1() and step2() you have to
> add a line "time_start = gettime();".

Of course I do. I was so focused on the low-level stuff that I made
exactly the stupid mistake your suggestion points at. Thanks for the help!

The results make sense now, and if anyone's interested, I'm pasting them
below. I made the additional effort of running it under both the 4BSD and
ULE schedulers on 7-CURRENT (SMP, dual-core).

-- 4BSD, nonuniform samples --

Running benchmarks on small_nonuniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 86.33 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 82.79 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 124.67 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 166.32 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 210.67 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 251.83 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 291.25 seconds.

-- ULE nonuniform samples --

Running benchmarks on small_nonuniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 84.09 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 83.43 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 126.21 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 166.66 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 209.40 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 250.36 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 291.92 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 333.42 seconds.

-- 4BSD uniform samples --

Running benchmarks on small_uniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 93.33 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 89.33 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 135.20 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 179.96 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 226.40 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 269.57 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 314.06 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 358.67 seconds.

-- ULE uniform samples --

Running benchmarks on small_uniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 89.76 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 89.90 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 135.75 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 179.72 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 226.10 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 269.63 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 314.76 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 359.44 seconds.

"uniform" / "nonuniform" describes the distribution of the random number
function.
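
For anyone reading the archive later, here is a minimal sketch of the fix
discussed in this thread, plus a rough sanity check for the numbers. It is
not the actual benchmark source, which was not posted. Only the names
time_start and gettime() come from the thread; the body of gettime() (a
clock_gettime(CLOCK_MONOTONIC) wrapper), time_step(), spin() and
expected_seconds() are assumptions made up for illustration. The
estimate assumes CPU-bound threads that share no locks, so that wall time
grows by threads/cores once there are more threads than cores.

```c
#define _POSIX_C_SOURCE 200112L
#include <time.h>

/* Assumed helper: the thread only names gettime(); this body is a guess. */
static double gettime(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical stand-in for the real tree benchmark: a simple busy loop. */
static void spin(void *arg)
{
    volatile unsigned long sink = 0;
    unsigned long i, n = *(unsigned long *)arg;

    for (i = 0; i < n; i++)
        sink += i;
}

/*
 * Time one step on its own. time_start is taken at the start of every
 * step, so a later step never includes the time of an earlier one.
 */
static double time_step(void (*work)(void *), void *arg)
{
    double time_start = gettime();

    work(arg);
    return gettime() - time_start;
}

/*
 * Rough estimate (assumption: CPU-bound, no contention): each thread does
 * the full single-thread workload, so up to `cores` threads finish in the
 * single-thread time and beyond that time scales by threads / cores.
 */
static double expected_seconds(double t_single, int threads, int cores)
{
    if (threads <= cores)
        return t_single;
    return t_single * threads / cores;
}
```

Calling time_step(spin, &n) twice in a row returns two independent
durations. With the Step 1 figure from the 4BSD nonuniform run,
expected_seconds(86.33, 7, 2) gives about 302 seconds, against the 291.25
seconds measured, so the corrected results are in line with simple
two-core scaling.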